From May 13th to May 19th, the Helix Publish service failed to create new log configurations for Fastly service configurations that had not been used with Helix before. This failure led the Helix CLI to abort the publish attempt, making it impossible to publish new sites. If you tried and failed to launch a site on Helix last week, we apologize for making it impossible to start new sites with Helix.
Helix Publish uses Google Cloud Platform IAM to create service accounts and service account keys on behalf of each Fastly service configuration that is published through Helix. Google Cloud Platform has a limit of 100 service accounts per project, and deleted service accounts count against this quota for up to 30 days after deletion.
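To illustrate why deleting accounts does not immediately free up quota, here is a minimal sketch of the counting rule described above. It is not Google's implementation: the function name `quota_in_use`, the date, and the account counts are hypothetical and chosen only for illustration, and the sketch assumes the worst case in which every deleted account counts for the full 30 days.

```python
from datetime import datetime, timedelta

QUOTA_LIMIT = 100               # service account limit stated in this post
RETENTION = timedelta(days=30)  # deleted accounts can count for up to 30 days


def quota_in_use(active_count, deleted_at, now):
    """Worst-case quota usage: active accounts plus recently deleted ones."""
    still_counted = sum(1 for d in deleted_at if now - d < RETENTION)
    return active_count + still_counted


# Illustrative numbers only (not taken from the incident).
now = datetime(2024, 1, 31)  # arbitrary date
recent = [now - timedelta(days=i % 10 + 1) for i in range(65)]  # e.g. test runs
old = [now - timedelta(days=45) for _ in range(5)]              # past retention

used = quota_in_use(40, recent + old, now)
print(used, "of", QUOTA_LIMIT, "-> exceeded" if used > QUOTA_LIMIT else "-> ok")
```

In this sketch, only 40 accounts are active, yet the 65 recently deleted ones still push usage past the limit. Short-lived accounts created and deleted by tests can exhaust the quota in the same way.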
On May 13th, our integration tests started failing, indicating that service accounts could no longer be created because the quota had been exceeded. We contacted Google Cloud Platform Support to understand the issue (the quota counts deleted accounts, too) and to obtain a resolution (an increased service account quota). Once the underlying issue was resolved, the service resumed operation and the tests passed again.
1. establish automated monitoring of the Helix Publish Service
2. give more team members access to the Google Cloud Platform support account
3. use separate accounts for integration tests and production
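
As one possible building block for the monitoring item above, a check could compare the number of service accounts in use against the limit and raise a warning before the limit is reached. This is a sketch under assumptions, not the monitoring we have built: the function name `quota_status` and the 80% warning threshold are hypothetical, and obtaining the `used` count from Google Cloud Platform is left out.

```python
def quota_status(used, limit=100, warn_ratio=0.8):
    """Classify quota usage so that an alert can fire before the limit is hit."""
    if used >= limit:
        return "critical"  # no further service accounts can be created
    if used >= warn_ratio * limit:
        return "warning"   # approaching the limit, time to act
    return "ok"


# Hypothetical usage values, for illustration only.
for used in (40, 85, 100):
    print(used, quota_status(used))
```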