29/04/2006 | jiangws2002
Scott’s “SiteExperts” Place: What we learned building Live.com (or why are we slow)?

Over the last 18 months, we have explored how to build a highly interactive, customizable, and extensible portal. The first iteration was a simple portal on Start.com. We created various iterations (start.com/1, start.com/2, start.com/3), which have evolved into the first fully extensible portal on Live.com. Today, from Live.com to the Kahuna (Hotmail) beta to MSN Spaces and so on, we are investing heavily in building very rich, interactive experiences. With these investments, we are learning a great deal about the right and wrong ways to engineer rich, interactive web-sites.

Underneath all our Windows Live properties, we share a common framework for how we engineer our client experiences. The framework is very client-centric: we composite most of the page client-side. For example, if you were to view Live.com’s HTML, you would notice that it serves a web-page “shell” and meta-data that describes the content. This meta-data is interpreted by the Live.com application and then rendered. This approach is extraordinarily flexible, as we can quickly enhance and extend the application without any heavy server lifting. However, as we have learned and as our customers’ experience shows, without care, performance degradation can quickly outweigh all other benefits.

As you examine Live.com and, for the technically savvy, explore the underlying browser technology, it is easy to question whether we as Microsoft and the industry as a whole are pushing the browser too far. The current crop of rich applications, while cool and interactive, are starting to fare poorly performance-wise against their traditional brethren. So much so that an often-posed question is: where do we go from here? And is it time to reexamine building rich applications?

First, let’s step back and very briefly look at the web versus traditional software.
With traditional software, you would go through various design phases, from specifications, to architecture documentation, to development, to usability testing, to testing, and eventually to ship. The entire approach had a fairly long lead time (up to years). Once released, updating the software was difficult and often cost-prohibitive. This created a very high bar. The web has removed almost all those barriers. On the web, we can now experiment and develop software with near real-time feedback and very fast release cycles.

I view the Web 2.0 phenomenon as being very early in the development lifecycle. I am not prepared to dismiss any approach, pattern, or methodology, as we are still in the learning phases. In the case of Windows Live, as we push the browser, we are also learning a great deal.

Looking at performance specifically: when I look at Live.com today, I see incredible innovation. We are pushing the limits of extensibility (gadgets) and reuse (shared frameworks across all our properties), and are taking chances to drive new user-experience standards (look at how we present search results). On the other hand, I also see an application whose performance is starting to make it painful to use. The page currently takes a long time to load, especially on the first visit. Beyond our users’ feedback (we do read all messages), broader industry pundits are quick to throw in the towel on the entire technology.

We are taking a different approach to this problem. We are challenging ourselves to prove that we can architect a performance-driven, rich, extensible experience. We are using the knowledge gained from shipping the many iterations of Start.com, and from all the beta products we are developing, to improve our shared architecture and drive best patterns.
Using the current Live.com as a simple case study, below I illustrate a few of the performance-oriented technical issues that we are working to address quickly:

Manage your Connections Carefully

If you were to examine Live.com at the network level, you would quickly conclude that we are making too many connections. We decomposed this issue as follows:

First, we are hitting an IE 6 issue that causes un-cached images applied dynamically via script to download on each reference. This issue typically manifests itself on slower connections, the time when bandwidth is most at a premium. We are baking a solution to this problem directly into our frameworks so that we pre-cache dynamically applied images before reuse.

Next, every RSS feed and Gadget manifest is a unique request. This creates a web-page that is very “chatty”, an AJAX characteristic that you should work to avoid. We will solve this by intelligently batching multiple requests into a single request (and are exploring even more efficient means for the longer term).

Parsing XML is Slow

We have also learned that merely parsing the RSS XML can be expensive in the browser. When we parse RSS, we are merely translating it into JavaScript structures to be further manipulated. Since our servers are already normalizing feeds to a standard format, instead of serving the RSS feed directly, we are going to translate the feed directly to JSON (JavaScript structures). As a simple benchmark, on my fast developer machine we went from 400 ms to parse 150K of RSS down to 15 ms to “execute” the JSON response.

Caching and Connection Management is Essential

Network bandwidth and connections are scarce resources. Managing them is essential to providing fast experiences (especially for subsequent loads). We are evaluating the optimal approaches for splitting resources across multiple servers so that we can use as many simultaneous connections as possible (the browser is limited to 2 active connections per domain).
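As a rough illustration of splitting resources across several servers, here is a minimal server-side sketch in Python. It is not the Live.com implementation: the host names and the helper functions `host_for` and `asset_url` are hypothetical, invented for this example. The one property it tries to show is that each resource path always maps to the same host.

```python
import zlib

# Hypothetical asset hosts; the post does not name the servers Live.com uses.
ASSET_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def host_for(path, hosts=ASSET_HOSTS):
    """Pick a host for a resource path, always the same host for the same path."""
    # crc32 is stable across processes and restarts, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return hosts[index]


def asset_url(path):
    """Build the full URL the page should reference for a resource path."""
    return "http://" + host_for(path) + path


if __name__ == "__main__":
    for p in ["/img/logo.gif", "/img/gadget.png", "/js/core.js"]:
        print(asset_url(p))
```

The mapping is deterministic on purpose. If the same image were referenced from a different host on each visit, the browser would treat it as a different URL and download it again, which would cancel out the benefit of the extra connections.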
We are also reviewing all resources to make sure they are set with appropriate expirations. In general, almost all content should have an expiration, even a very short one (e.g., if the user leaves the page and hits the back button, the page should be re-rendered entirely from the cache).

Staging the Application

One of the biggest challenges with a very rich web-application is deploying the code. The richer the site, the more code is needed. In the case of Live.com today, the entire application deploys before anything renders. However, our underlying framework supports dynamic and prioritized deployment; we just were not properly leveraging it. We are now focusing on leveraging this pattern so that we can “stage” the application. We will be able to deploy the minimal code necessary to retrieve content and render the page, and subsequently download features in priority order (e.g., render, then get drag-drop code, then get the RSS image rotator code, etc.). Features not yet in use or visible can be deployed last or even on demand. Staging an application is fundamental to maintaining high degrees of perceived performance.

Server versus Client Rendering

Traditional web-pages are generated on the server. Live.com and many of our properties are very client-centric: the client constructs the web-page from the user’s meta-data. In general, the first time a web-page loads, a server-generated page will almost always be faster. However, with a properly architected web application, we are discovering that subsequent loads of a client-side generated page (especially when we stage the application) can be much faster than a server-oriented page. This occurs because our client-oriented approach is highly cacheable and loads asynchronously, even from the cache. The only content we need to download is the user’s meta-data. The rest of the page, scripts, and behaviors are cached indefinitely.
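To make the expiration policy concrete, here is a small Python sketch of how a server might choose caching headers for the two kinds of content described above. The lifetimes, the category names, and the `cache_headers` helper are assumptions made for illustration; the post gives no concrete values and does not describe how Live.com sets its headers.

```python
import time
from email.utils import formatdate

# Hypothetical lifetimes in seconds; the post gives no concrete numbers.
LIFETIMES = {
    # Page shell, scripts, images: cached for a year, i.e. effectively indefinitely.
    "static": 365 * 24 * 3600,
    # Per-user meta-data: very short, just enough to serve a back-button visit.
    "metadata": 60,
}


def cache_headers(kind, now=None):
    """Return Cache-Control and Expires headers for a content category."""
    max_age = LIFETIMES[kind]
    if now is None:
        now = time.time()
    return {
        "Cache-Control": "max-age=%d" % max_age,
        # HTTP dates are expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


if __name__ == "__main__":
    print(cache_headers("static"))
    print(cache_headers("metadata"))
```

Under this assumed policy, a repeat visit fetches only the user’s meta-data over the network, while the shell and scripts come straight from the browser cache.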
This summarizes a few of the performance-driven challenges we are addressing. Internally, we have developed a complete prototype that validates that we can build a high-performance, scalable version of Live.com that loads and runs anywhere from near-instantly to within seconds (on broadband). We are working on integrating these improvements into the shipping site as quickly as possible. Over the coming weeks and months, expect to see continuous, noticeable improvements.

Update: Beyond performance, the Live.com team just posted the goals for the Live.com page.