Thursday, September 29, 2011

Amazon Silk: split browser architecture

Content delivery networks and WAN optimization provided a generic acceleration solution that moves common content closer to the client device, but on mobile devices the delivery performance of the last mile was still a problem. Many websites still do not have mobile-optimized content, and pulling down a 3-megapixel JPEG only to render it on a 320x240 pixel display is just plain wrong. With the introduction of Amazon Silk, which uses the cloud to aggregate, cache, precompile, and predict, the client-side experience can now be optimized for the device that everybody clamors for: the tablet.

This is going to create an even bigger disconnect between the consumer IT experience and the enterprise IT experience. On the Amazon Fire you will be able to pull up, nearly instantaneously, common TV video clips and connect to millions of books. But most enterprises will find it difficult to invest in the WAN optimization gear that would replicate that experience on the corporate network for your day-to-day work.

Amazon Silk is another example of the power that the cloud provides for doing the heavy computes and caching that enable low-capability devices to roam.

Wednesday, September 14, 2011

Trillion Triple Semantic Database

The Semantic Web captures the semantics, or meaning, of data, so that machines can interact with that metadata. It is an idea of WWW pioneer Tim Berners-Lee, who observed that although search engines index much of the Web's content, keywords can only provide an indirect association to the meaning of an article's content. He foresees a number of ways in which developers and authors can create and use the Semantic Web to help context-understanding programs better serve knowledge discovery.
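To make the idea of machine-readable meaning concrete, here is a minimal sketch of a triple store in Python. Semantic databases hold facts as (subject, predicate, object) triples and answer queries by pattern matching over them. The facts, the predicate names, and the helpers `match` and `who_was_called_by` are all hypothetical illustrations; this is not the API or data model of any particular product.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts, invented purely for illustration.
TRIPLES: List[Triple] = [
    ("alice", "owns", "phone_1"),
    ("bob", "owns", "phone_2"),
    ("phone_1", "called", "phone_2"),
    ("phone_1", "called", "phone_3"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A position left as None acts as a wildcard.
    """
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


def who_was_called_by(owner: str) -> List[str]:
    """Chain three patterns: owner -> phone -> called phone -> its owner."""
    people = []
    for _, _, phone in match(TRIPLES, s=owner, p="owns"):
        for _, _, callee in match(TRIPLES, s=phone, p="called"):
            for person, _, _ in match(TRIPLES, p="owns", o=callee):
                people.append(person)
    return people


print(who_was_called_by("alice"))
```

The query never mentions a table or a schema: it only chains patterns over relationships, which is what lets a program follow meaning rather than keywords. A production store would index the triples instead of scanning a list.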

Tim Berners-Lee originally expressed the vision of the Semantic Web as follows:
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialize.

The world of semantic databases just got a little bit more interesting with the announcement by Franz, Inc. and Stillwater SC of having reached a trillion triple semantic data store for telecommunication data.

The database was constructed with an HPC on-demand cloud service and occupied 8 compute servers and 8 storage servers. The compute servers contained dual-socket Xeons with 64GB of memory, connected through a QDR IB network to a 300TB SAN. The trillion triple data set spanned roughly 100TB of storage. It took roughly two weeks to load the data, but after that the database provided interactive query rates for knowledge discovery and data mining.
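A quick back-of-the-envelope calculation puts those figures in perspective. The sketch below assumes decimal terabytes, reads "roughly two weeks" as exactly 14 days, and assumes the load was spread evenly over the 8 compute servers; none of those assumptions come from the announcement, so treat the results as order-of-magnitude estimates only.

```python
# Figures quoted in the post.
TRIPLE_COUNT = 10**12          # one trillion triples
STORAGE_BYTES = 100 * 10**12   # ~100TB, assuming decimal terabytes
COMPUTE_SERVERS = 8

# Assumption: "roughly two weeks" taken as exactly 14 days.
LOAD_SECONDS = 14 * 24 * 3600


def bytes_per_triple() -> float:
    """Average on-disk footprint of one triple, indexes included."""
    return STORAGE_BYTES / TRIPLE_COUNT


def load_rate_per_second() -> float:
    """Sustained aggregate load rate in triples per second."""
    return TRIPLE_COUNT / LOAD_SECONDS


def load_rate_per_server() -> float:
    """Load rate per compute server, assuming an even spread."""
    return load_rate_per_second() / COMPUTE_SERVERS


print(f"{bytes_per_triple():.0f} bytes per triple")
print(f"{load_rate_per_second():,.0f} triples/s overall")
print(f"{load_rate_per_server():,.0f} triples/s per compute server")
```

Under these assumptions the store averages about 100 bytes per triple, and the load sustained on the order of 800,000 triples per second for the full two weeks.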

The gear on which this result was produced is traditional HPC gear that emphasizes scalability and low latency interconnect. As a comparison, a billion triple version of the database was created on Amazon Web Services but the performance was roughly 3-5x slower. To create a trillion triple semantic database on AWS would have cost $75k and would have taken 6 weeks to complete.

Monday, July 4, 2011

What would you do with infinite computes?

Firing up a 1,000-processor deep analytics cluster in the cloud to answer a market segmentation question about your customer orders during Christmas 2010, or to run a sentiment analysis of your company's Facebook fan page, now costs less than having lunch in Palo Alto.

The cloud effectively provides infinite computes, and to some degree infinite storage, although the costs of non-ephemeral storage might muddy that analogy a bit. So what would you do differently now that you have access to a global supercomputer?

When I pose this question to my clients, it quickly reveals that their business processes are ill-prepared to take advantage of this opportunity. We are roughly half a decade into the cloud revolution, and at least a decade into the 'competing on analytics' mindset, but the typical enterprise IT shop is still unable to make a difference in the cloud.

However, change may be near. Given the state of functionality in software stacks like RightScale and Enstratus we might see a discontinuity in this inability to take advantage of the cloud. These stacks are getting to the point that an IT novice is able to provision complex applications into the cloud. Supported by solid open source provisioning stacks like Eucalyptus and, building reliable and adaptive software service stacks in the cloud is becoming child's play.

What I like about these environments is that they are cloud agnostic. For proper DR/BCP, a single cloud provider would be a single point of failure and thus a non-starter. But these tools make it possible to run a live application across multiple cloud vendors, thus meeting the productivity and agility requirements that come with the territory of an Internet application.
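The single-point-of-failure argument can be sketched in a few lines of Python. The idea is to keep a preference-ordered list of providers and route to the first one that passes a health check. The provider names, the `pick_provider` helper, and the health-check callables are hypothetical placeholders; this is a minimal sketch of the failover idea and not how RightScale, Enstratus, or any cloud vendor actually implements it.

```python
from typing import Callable, Dict, List, Optional

HealthCheck = Callable[[], bool]


def pick_provider(
    preferred_order: List[str],
    health_checks: Dict[str, HealthCheck],
) -> Optional[str]:
    """Return the first provider in preference order that reports healthy.

    A check that raises is treated the same as a failed check, so one
    misbehaving provider cannot take the selection logic down with it.
    """
    for name in preferred_order:
        check = health_checks.get(name)
        if check is None:
            continue  # no way to verify this provider, so skip it
        try:
            if check():
                return name
        except Exception:
            continue
    return None


def unreachable() -> bool:
    """Stand-in for a health probe that cannot reach its provider."""
    raise ConnectionError("provider unreachable")


# Hypothetical providers with stubbed-out health probes.
checks: Dict[str, HealthCheck] = {
    "cloud_a": unreachable,
    "cloud_b": lambda: True,
}

print(pick_provider(["cloud_a", "cloud_b"], checks))
```

With a single provider in the list, an outage leaves `pick_provider` nothing to return; with two or more, the application keeps a place to run.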