Thursday, September 29, 2011

Amazon Silk: split browser architecture

Content delivery networks and WAN optimization provided a generic acceleration solution that moves common content closer to the client device, but on mobile devices the delivery performance of the last mile was still a problem. Many websites still do not have mobile-optimized content, and sucking down a 3-megapixel JPG and rendering it on a 320x240 pixel display is just plain wrong. With the introduction of Amazon Silk, which uses the cloud to aggregate, cache, precompile, and predict, the client-side experience can now be optimized for the device that everybody clamors for: the tablet.
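To put a number on the mismatch above, here is a back-of-the-envelope sketch in Python. It only compares pixel counts; it does not claim to describe how Silk actually processes images, and the function name is made up for illustration.

```python
def oversupply_ratio(source_pixels: int, display_width: int, display_height: int) -> float:
    """Return how many source pixels arrive for every pixel the display can show."""
    display_pixels = display_width * display_height
    return source_pixels / display_pixels


# Figures from the text: a 3-megapixel JPG on a 320x240 display.
ratio = oversupply_ratio(3_000_000, 320, 240)
print(f"The image carries about {ratio:.0f}x more pixels than the display can show")
```

Under these assumptions, roughly 39 out of every 40 pixels pulled over the last mile are thrown away on the device, which is the kind of waste a cloud-side resize could avoid.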

This is going to create an even bigger disconnect between the consumer IT experience and the enterprise IT experience. On the Amazon Kindle Fire you will be able to pull up common TV video clips nearly instantaneously and connect to millions of books. But most enterprises will find it difficult to invest in the WAN optimization gear that would replicate that experience on the corporate network for your day-to-day work.

Amazon Silk is another example of the power the cloud provides for heavy computation and caching, which enables low-capability devices to roam.

Wednesday, September 14, 2011

Trillion Triple Semantic Database

The Semantic Web captures the semantics, or meaning, of data, and enables machines to interact with that metadata. It is an idea of WWW pioneer Tim Berners-Lee, who observed that although search engines index much of the Web's content, keywords can only provide an indirect association to the meaning of an article's content. He foresees a number of ways in which developers and authors can create and use the Semantic Web to help context-understanding programs better serve knowledge discovery.

Tim Berners-Lee originally expressed the vision of the Semantic Web as follows:
"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialize."

The world of semantic databases just got a little bit more interesting with the announcement by Franz, Inc. and Stillwater SC that they have reached a trillion-triple semantic data store for telecommunication data.

http://www.franz.com/about/press_room/trillion-triples.lhtml
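For readers new to the term, a "triple" is a (subject, predicate, object) statement, and a semantic database answers queries by matching patterns over such statements. The Python sketch below is a toy in-memory illustration of that idea only. It is not the Franz API, and the class name, the identifiers, and the telecom-style sample data are all made up for this example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ToyTripleStore:
    """A toy in-memory triple store; real stores index and persist their data."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical telecom-flavored facts, invented for illustration.
store = ToyTripleStore()
store.add("subscriber:alice", "placedCallTo", "subscriber:bob")
store.add("subscriber:alice", "hasPlan", "plan:unlimited")
store.add("subscriber:bob", "hasPlan", "plan:prepaid")

# Who is on which plan? Leave subject and object as wildcards.
for triple in store.match(predicate="hasPlan"):
    print(triple)
```

A linear scan like this obviously does not scale to a trillion statements; a production store relies on indexes and distributed storage, which is why the hardware behind the result matters.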

The database was constructed with an HPC on-demand cloud service and occupied 8 compute servers and 8 storage servers. The compute servers contained dual-socket Xeons with 64GB of memory, connected through a QDR IB network to a 300TB SAN. The trillion-triple data set spanned roughly 100TB of storage. It took roughly two weeks to load the data, but after that the database provided interactive query rates for knowledge discovery and data mining.
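The figures above imply some useful rules of thumb. The Python sketch below derives them, assuming decimal terabytes (10^12 bytes), exactly 14 days of continuous loading, and an even split of the work across the 8 compute servers. All three are simplifying assumptions, since the text only gives rough numbers.

```python
TRIPLES = 10**12                 # one trillion triples
STORAGE_BYTES = 100 * 10**12     # roughly 100TB, assuming decimal terabytes
LOAD_SECONDS = 14 * 24 * 3600    # roughly two weeks, assumed continuous
COMPUTE_SERVERS = 8

# Average on-disk footprint per triple, including whatever overhead
# the store adds on top of the raw data.
bytes_per_triple = STORAGE_BYTES / TRIPLES

# Sustained load rate across the whole cluster and per compute server.
triples_per_second = TRIPLES / LOAD_SECONDS
triples_per_second_per_server = triples_per_second / COMPUTE_SERVERS

print(f"{bytes_per_triple:.0f} bytes per triple")
print(f"{triples_per_second:,.0f} triples/s across the cluster")
print(f"{triples_per_second_per_server:,.0f} triples/s per compute server")
```

Under these assumptions the store averages about 100 bytes per triple and sustained a load rate on the order of 800,000 triples per second.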

The gear on which this result was produced is traditional HPC gear that emphasizes scalability and low-latency interconnects. As a comparison, a billion-triple version of the database was created on Amazon Web Services, but the performance was roughly 3-5x slower. Creating a trillion-triple semantic database on AWS would have cost $75k and taken 6 weeks to complete.