Some time ago I started collecting page performance data using boomerang.js. For data storage I simply used the access logs. The result is quite predictable: most of the load time is spent on the frontend (except for page 10.html, which involves some computation).
First, I wrote a Cascading job.
Later I rewrote it in Scalding. As you can see, it is much shorter and more concise.
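At its core, this analysis is a per-page average of backend and frontend time. Below is a minimal sketch of that aggregation in plain Scala collections. It is not Scalding, and it is not the original job. It assumes each beacon request appears in the access log as a `GET /beacon?...` line carrying boomerang's `u`, `t_resp` and `t_page` parameters, and it treats `t_resp` as a rough backend time and `t_page` as frontend time. The `/beacon` path, the log line layout and the `BeaconStats` name are hypothetical.

```scala
import java.net.URLDecoder
import scala.util.Try

object BeaconStats {
  final case class Timing(page: String, backendMs: Long, frontendMs: Long)

  // HYPOTHETICAL log layout: "GET /beacon?<query> HTTP/1.1"; adapt the regex to your logs.
  private val QueryRe = """GET /beacon\?(\S+) HTTP""".r.unanchored

  // Parses one access-log line into a Timing, or None if it is not a usable beacon.
  def parse(line: String): Option[Timing] = line match {
    case QueryRe(query) =>
      val params: Map[String, String] = query.split('&').iterator.flatMap { kv =>
        kv.split("=", 2) match {
          case Array(k, v) => Some(k -> URLDecoder.decode(v, "UTF-8"))
          case _           => None
        }
      }.toMap
      for {
        url  <- params.get("u")
        resp <- params.get("t_resp").flatMap(v => Try(v.toLong).toOption)
        page <- params.get("t_page").flatMap(v => Try(v.toLong).toOption)
      } yield Timing(url.split('/').last, resp, page)
    case _ => None
  }

  // Groups timings by page name and averages backend and frontend time per page.
  def averages(lines: Seq[String]): Map[String, (Double, Double)] =
    lines.flatMap(parse).groupBy(_.page).map { case (page, ts) =>
      val n = ts.size.toDouble
      (page, (ts.map(_.backendMs).sum / n, ts.map(_.frontendMs).sum / n))
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "GET /beacon?u=http%3A%2F%2Fexample.com%2F1.html&t_resp=100&t_page=800 HTTP/1.1",
      "GET /beacon?u=http%3A%2F%2Fexample.com%2F10.html&t_resp=1000&t_page=500 HTTP/1.1"
    )
    averages(sample).toSeq.sortBy(_._1).foreach { case (page, (be, fe)) =>
      println(f"$page%-10s backend=$be%.0f ms frontend=$fe%.0f ms")
    }
  }
}
```

The group-by-then-average shape is the same one a Scalding pipeline expresses as a `groupBy` with an averaging step. That shared structure is a large part of why the Scalding version ends up so much shorter than the hand-wired Cascading one.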