The HTTP Archive is an open source project that tracks how the web is built. Twice a month it crawls 1.3 million web pages on desktop and emulated mobile devices, and collects technical information about each of the web pages. That information is then aggregated and made available in curated reports. The raw data is also made available via Google BigQuery, which makes answering interesting questions about the web accessible to anyone with some knowledge of SQL as well as the curiosity to dig in.
When Steve Souders created the project back in 2010, it included far fewer pages - but it was immensely valuable to the community. As sponsorship increased, so did the infrastructure and the ability to do more with it. Over time, more and more information was added to the archive - including HAR files, Lighthouse reports and even response bodies.
In 2017 Ilya Grigorik, Patrick Meenan and Rick Viscomi started maintaining the project. They have done some amazing work overhauling the website, creating new and useful reports and continuing to push the envelope on what the HTTP Archive is capable of providing to the web community. As of last week I've joined Ilya, Pat and Rick as a co-maintainer of the HTTP Archive, and I couldn't be more excited!
To learn more about how I have been using the HTTP Archive, you can read my full blog post here.
If you are interested in reading more about the type of analysis being done with the HTTP Archive, you can find numerous examples in the HTTP Archive discussion forums - https://discuss.httparchive.org/.