This week, Microsoft’s Linux package repositories suffered an hours-long outage, followed by performance issues lasting more than a day.
Users relying on the packages.microsoft.com repository to pull Linux packages for distributions including Ubuntu, Debian, CentOS, openSUSE, and Fedora received errors.
Microsoft engineers have acknowledged the issue and are working towards a resolution.
Microsoft’s Linux repos go down in an outage
The packages.microsoft.com repository went down this week in a prolonged outage.
Linux and Solaris specialist Štefan Jarina first reported the issue on June 16th, after receiving a series of “404 Not Found” errors when downloading “.deb” files from the repository.
Jarina’s report was then confirmed by other engineers experiencing the issue, with some seeing “500 Internal Server Error” messages when trying to pull Debian packages.
Microsoft engineer Rahul Bhandari stepped in on the same GitHub thread to confirm:
“Our infra team is working on this. There is an issue with some of the mirrors on packages.microsoft.com so as per them, the current ETA to resolve this issue is in next two hours or so,” said Bhandari.
Bhandari later confirmed that storage issues were the root cause of these problems.
While the issue was being investigated, several users asked for an “incident response report” explaining why the mirror sites had also gone down in this outage, and why this was a recurring issue.
“Will there be an incident report in response to this? I’d be particularly interested in why mirror sites were not available or if available, why there is a single point of failure affecting them all.”
“We’ve faced issues in the past where packages would fail when a deployment was running but a cataclysmic failure of this nature would have affected many production workloads today.”
“Package managers are the backbone of our industry and we need to be able to rely on them.”
“I’ve been forced to remove reliance on Microsoft package repos in favour of self hosted ones for the time being which is unnecessary manual maintenance I’d like to avoid if possible,” stated engineer Michael Armitage.
Repos up, but users experience degraded performance
Although Microsoft’s initial ETA to resolve the issue was “two hours or so,” the problem lasted well over 14 hours, and users continued to experience degraded performance afterward.
Microsoft principal engineering manager Ravindra Bhartiya said:
“We had an incident with packages.microsoft.com that resulted in packages being unavailable.”
“Our engineering team has mitigated the issue and our internal data shows improvement in the availability.”
“If you still have problems, please provide us more information (output of ‘apt-get update|install’) and we can investigate it further,” said Bhartiya.
Even today, at the time of writing, users are still complaining about slow download speeds when retrieving packages from Microsoft’s repos.
Some downloads reportedly took over two or three to complete, prompting users to look into alternative solutions. However, performance and availability appear to be slowly improving and returning to normal.
Large-scale outages of critical systems and CDNs have become a common occurrence lately.
Interestingly, the timing of this outage coincides with yesterday’s Akamai outage, which impacted major Australian banks and other organizations, although the two incidents appear to be unrelated.