Wim Leers: High concurrency Composer

On behalf of Acquia I’m currently working on Drupal’s next big leap: Automatic Updates & Project Browser — both are “strategic initiatives”.

In November, I started helping out the team led by Ted Bowman that’s been working on it non-stop for well over 1.5 years (!): see d.o/project/automatic_updates. It’s an enormous undertaking, with many entirely new challenges — as this post will show.

For a sense of scale: more people of Acquia’s “DAT” Drupal Acceleration Team have been working on this project than the entire original DAT/OCTO team back in 2012!

The foundation for both will be the (API-only, no UI!) package_manager module, which builds on top of the php-tuf/composer-stager library. We’re currently working hard to get that module committed to Drupal core before 10.1.0-alpha1.

Over the last few weeks, we managed to solve almost all of the remaining alpha blockers (which block the core issue that will add package_manager to Drupal core, as an alpha-experimental module. One of those was a random test failure on DrupalCI, whose failure frequency was increasing over time!

A rare random failure may be acceptable, but at this point, ~90% of test runs were failing on one or more of the dozens of Kernel tests … but always a different combination. Repeated investigations over the course of a month had not led us to the root cause. But now that the failure rate had reached new heights, we had to solve this. It brought the team’s productivity to a halt — imagine what damage this would have done to Drupal core’s progress!

A combination of prior research combined with the fact that suddenly the failure rate had gone up meant that there really could only be one explanation: this had to be a bug/race condition in Composer itself, because we were now invoking many more composer commands during test execution.

Once we changed focus to composer itself, the root cause became obvious: Composer tries to ensure the temporary directory is writable and avoids conflicts by using microtime(). That function confusingly can return the time at microsecond resolution, but defaults to mere millisecondssee for yourself.

With sufficiently high concurrency (up to 32 concurrent invocations on DrupalCI!), two composer commands could be executed on the exact same millisecond:

// Check system temp folder for usability as it can cause weird runtime issues otherwise Silencer::call(static function () use ($io): void { $tempfile = sys_get_temp_dir() . '/temp-' . md5(microtime()); if (!(file_put_contents($tempfile, __FILE__) && (file_get_contents($tempfile) === __FILE__) && unlink($tempfile) && !file_exists($tempfile))) { $io->writeError(sprintf('PHP temp directory (%s) does not exist or is not writable to Composer. Set sys_temp_dir in your php.ini', sys_get_temp_dir())); } }); src/Composer/Console/Application.php in Composer 2.5.4

We could switch to microtime(TRUE) for microseconds (reduce collision probability 1000-fold) or hrtime() (reduce collision probability by a factor of a million). But more effective would be to avoid collisions altogether. And that’s possible: composer always runs in its own process.

Simply changing sys_get_temp_dir() . '/temp-' . md5(microtime()); to sys_get_temp_dir() . '/temp-' . getmypid() . '-' . md5(microtime()); is sufficient to safeguard against collisions when using Composer in high concurrency contexts.

So that single line change is what I proposed in a Composer PR a few days ago. Earlier today it was merged into the 2.5 branch — meaning it should ship in the next version!

Eventually we’ll be able to remove our work-around. But for now, this was one of the most interesting challenges along the way :)

PubDate

Tags