Wednesday, December 2, 2009

Government as a Platform

I saw a tweet fly by from Tim O'Reilly with the above label. When googling the matter, I came across a presentation Tim gave this summer exploring the new universe of APIs that enable access to governmental data. Combining this data with operational pieces in a typical CRM or ERP database would constitute a powerful means for ERP optimization.

Here is the presentation:

One of the tidbits of info that jumped out for me was that 45% of all map mashups are based on Google Maps, but only 4% on Microsoft Virtual Earth. That is the power of first-mover advantage. Given the push towards lightweight federated systems, the economic lock-in of the big four (IBM, Microsoft, Oracle, and SAP) is finally broken. Agility is king again. For example, look at this of

Saturday, November 28, 2009

And now something completely different: brain simulation

Our good folks at the national labs have been developing cloud computing for two decades, so what they are doing now is possibly an indication of what we'll be doing with the cloud a decade from now. Researchers at IBM Almaden have been working on the largest brain simulation to date; 1.6 billion neurons with 9 trillion connections. The scale of this endeavor still dwarfs the capacity and capability of any commercial cloud offering; the simulation uses roughly 150 thousand processors and 150TBytes of memory.

Just to provide a sense of the OPEX of such an installation: Dawn, an IBM Blue Gene/P supercomputer at LLNL, hums and breathes inside an acre-size room on the second floor of the lab's Terascale Simulation Facility. Its 147,456 processors and 147,000 gigabytes of memory fill 10 rows of computer racks, woven together by miles of cable. Dawn devours a million watts of electricity through power cords as thick as a bouncer's wrists, racking up an annual power bill of $1 million. The roar of refrigeration fans fills the air: 6675 tons of air-conditioning hardware labor to dissipate Dawn's body heat, blowing 2.7 million cubic feet of chilled air through the room every minute.

Given that a real brain consumes only about 25 watts, there is clearly a lot of room for technology innovation. Silicon innovation, however, has come to a standstill, with venture capital completely abandoning this segment. There are no VC firms in the US or EU that have any funds that target this vertical. It is rumored that Google is designing its own silicon now since no commercial chip manufacturers are providing the innovation that Google needs.
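To put the power figures quoted above in perspective, here is a small back-of-the-envelope sketch. It assumes Dawn draws a constant 1 MW all year (the source only gives the headline numbers), and the function names are mine, purely for illustration.

```python
# Back-of-the-envelope check of the power numbers quoted above.
# Assumption: a constant 1 MW draw, 24 hours a day, 365 days a year.

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_kwh(power_watts: float) -> float:
    """Energy consumed in a year at a constant power draw, in kWh."""
    return power_watts / 1000.0 * HOURS_PER_YEAR


def implied_rate_per_kwh(annual_bill_dollars: float, power_watts: float) -> float:
    """Electricity rate implied by an annual bill and a constant draw."""
    return annual_bill_dollars / annual_energy_kwh(power_watts)


DAWN_WATTS = 1_000_000          # "a million watts"
DAWN_ANNUAL_BILL = 1_000_000    # "$1 million" per year
BRAIN_WATTS = 25                # "about 25 watts"

dawn_kwh = annual_energy_kwh(DAWN_WATTS)
dawn_rate = implied_rate_per_kwh(DAWN_ANNUAL_BILL, DAWN_WATTS)
power_ratio = DAWN_WATTS / BRAIN_WATTS

print(f"Dawn uses about {dawn_kwh:,.0f} kWh per year")
print(f"Implied electricity rate: ${dawn_rate:.3f} per kWh")
print(f"Dawn draws {power_ratio:,.0f} times the power of a brain")
```

Under these assumptions the quoted bill works out to roughly 11 cents per kWh, and the machine draws about 40,000 times the power of the brain it is simulating a fraction of.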

Friday, November 20, 2009

Governmental IT: Analytics is not a dirty word

Over at Smart Data Collective, Bill Cooper wrote a wonderful article on deep analytics. In particular, I liked his assessment on the resistance expressed by customers that can't see the forest for the trees.

I’ve watched many government agencies balk at the idea of data mining and complex analytics. They are concerned about switching to a new data architecture and the potential risks involved in implementing a new solution or making a change in methodology.

Having been there, I do understand their concerns, but fear of change is what’s holding government agencies back from being able to fully leverage the data that already exists to effect change at the local, regional, state and national levels. Analytics are the key to lowering costs, increasing revenue and streamlining government programs.

In my own government experience and now, looking at it from the other side, I have come to believe that government clients need to think about data the way the world’s top corporations do. Like all federal agencies, these companies already had huge repositories of data that were never analyzed – never used to support decisions, plan strategies or take immediate actions. Once they began to treat that data as a corporate asset, they started to see real results. The best part is that leveraging these mountains of data does not require a "rip and replace" approach. Inserting a data warehousing/data mining or complex analytics capability into a SOA or cloud computing environment can be very low risk and even elegant in its implementation. The potential rewards are immense!

That’s what’s needed in the government sector. We need to view analytics not as a dirty word but as a secret weapon against fraud and other challenges impacting all areas of the government sector.

I am a big believer that the future of the cloud consists of federated systems for the simple reason that large data sets are captive to their storage devices. Federated systems make service-oriented architectures (SOA) a natural architecture pattern to collate information. The fact that Google Gears and Microsoft Azure exhibit SOA at different levels of abstraction is clear evidence of the power of SOA. Add coarse-grained SOAs to these fine-grained patterns and you can support federation and scale internal IT systems even if the core runs in Gears or Azure.

Interactive Map of cloud services

Appirio, a company that helps enterprise customers leverage PaaS cloud platforms such as and Google Apps, put a nice interactive navigator on their website.
The Appirio cloud computing ecosystem map aims to bring more clarity to the fast-evolving cloud services market. By offering a standard taxonomy, it tries to help enterprise decision makers accelerate their adoption of the cloud.

Ryan Nichols, head of cloud strategy at Appirio, states: "The cloud ecosystem is evolving so quickly that it's difficult for most enterprises to keep up. We created the ecosystem map to track this evolution ourselves, and have decided to publish it to help others assess the lay of the land. With broader community involvement, we can create a living, breathing map where anyone can access, drill down and interact with dynamic information. This will bring some much-needed clarity to the cloud market."

Unfortunately, since the map is geared towards the enterprise customer it ignores all of the innovation that is taking place in the mashup, programmable web, and mid-market products, such as Zementis ADAPA in the Cloud. Given the new ways in which the cloud enables new application architectures and services, the enterprise market is the worst indicator of the evolving cloud ecosystem.

Monday, November 2, 2009

PC sales decline

In his post PCs at a Crossroads Michael Friedenberg reports on IDC's measurement of the PC marketplace. From the article:

"Case in point is the PC market. Market researcher IDC reports that 2009 will be the first year since 2001 where PC shipments will decline. I believe this drop is driven by a more rapid intersection of the cyclical and the systemic as the PC value proposition is challenged and then transformed. As Intel CEO Paul Otellini recently said, "We're moving from personal computers to personal computing." That comment signals Intel's way of moving into new markets, but it also acknowledges that the enterprise PC market has arrived at a crossroads."

Cloud computing is one development that is dramatically changing the desktop and server ecosystem. The performance of a single server or desktop hasn't kept pace with the computational needs of modern science, engineering, or business. Cloud computing moves away from capital equipment to the ability to procure just the computational output AND, for most use cases, at effectively unlimited scale. Most desktops are idling most of the time, but are too slow to get real work done when you need it. This is pushing the work towards elastic resources that are consumed as you go. If the browser is all you need, then a move towards server consolidation and thin clients is not far behind.

Thursday, August 27, 2009

Amazon Virtual Private Cloud

Yesterday, Amazon introduced Amazon VPC. It enables logically isolated compute instances and a VPN tunnel to connect to internal data center resources. The architecture is straightforward and Amazon's blog post depicts it as follows.

But the implications of VPC are far-reaching: there are no real hurdles left to leveraging Amazon's cloud except for limited and costly Internet bandwidth. Amazon's offering is morphing into a very flexible IaaS with some content delivery network features that are great for geographically dispersed small businesses. I am thinking particularly of companies such as Stillwater that differentiate through deep domain expertise. Global talent cannot be bound to a small locale such as Silicon Valley anymore. We are connecting researchers in the US, EU, Middle East, and Asia, and with offerings like this we can create a follow-the-moon development process that rivals the mega-vendor infrastructures. We do not need to uproot anyone to make this happen. These are exciting times that can really unleash the creative spirit of the world.

Saturday, July 18, 2009

Cloud Computing Taxonomy

I found this wonderful graphic created by Peter Laird in his blog.

Peter's blog has all the descriptions of the buckets.

The Public Cloud bucket is heavily underreported. There are roughly 1,200 public data centers in the US alone that are quite happy to rent you a server or cabinet. There are a host of data center marketplaces that will connect you to a data center provider. Here are a few:

Find a Data Center

Data Center Knowledge

Data Center Marketplace

In particular, the telecom and colocation companies, like 365 Main, SuperNAP, Qwest, Verizon, and Level 3 Communications, are quite happy to sell you connectivity AND servers and are perfect for large cloud deployments that need geographic spread and high bandwidth.

Tuesday, July 14, 2009

On-demand pricing for Windows Azure

InformationWeek's Paul McDougall reports on Windows Azure pricing and it provides confirmation that Microsoft is transitioning its boxed software business into a service business.

Paul's assessment:
"Azure is the latest sign that Microsoft is eyeing the Web as the primary delivery mechanism for software and services. On Monday, the company said it planned to make a version of Microsoft Office 2010 available to consumers over the Internet at no charge. It plans a similar offering for businesses."

In my mind there is still one piece missing for productive cloud computing and that is the seamless integration of the client. None of the big vendors are particularly keen on solving this problem since it diminishes their economic lock-in. But users create, use, and transform data and information on their clients and data needs to seamlessly flow between the client and the cloud. This flow in my mind is best managed by the OS, or a tightly integrated run-time. You see many of these service components show up in the mobile platforms, but the PC ecosystem is lagging here.

Netbooks and the cloud

Dana Blankenhorn at ZDNet posted an interesting analysis of Google's Chrome OS announcement. The basic premise is that Google as a cloud information provider can subsidize a Netbook since it will get it back in cloud service revenue and a higher intangible value to its core business of collecting and characterizing customer behavior.

This is much like the telecom business or the game console business, and I have heard that same story from the reps at Samsung, Nokia, Asus, and Sony. It is just that Google has a big head start in the intangible value department.

But buried in this article is the core observation in my mind why the boxed world of software is transitioning to the cloud: security and cost.

"The problem is that Netbooks are cheap and, while they will gain in power they will stay cheap. I spent $270 on my HP Mini and that’s about right.

Microsoft has reportedly cut the price of Windows to $3 to capture Netbook OEMs, and it’s offering a cut-rate price on Office, too.

But when you consider the $50/year price to license an anti-viral, the $30/year to license a malware program and the additional $30/year you need for a registry cleaner, the software price of a Netbook gets completely out of line with its hardware cost."

This is the same observation that can be used for any boxed software. The cost of the underlying hardware platform has shrunk in the past 20 years, but the software cost hasn't kept pace. 20 years ago a workstation cost $75k so a $75k piece of software was reasonable. The cost of a workstation is now $2k, but the software is still $75k. The productivity improvement that I need to get from the software to justify the cost is too high and thus that type of cost can only be carried by a business model that has significant intangible value. And that value isn't present in the consumer and/or SMB market.
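The mismatch can be made concrete with the numbers quoted above. The following is only an illustrative calculation; the helper name is mine, and the netbook figure treats the three annual subscriptions from the quoted article as the software cost for one year.

```python
# Software cost relative to the hardware it runs on, using the
# figures quoted in this post. Illustrative arithmetic only.


def software_to_hardware_ratio(software_cost: float, hardware_cost: float) -> float:
    """How many times the hardware price you pay for the software."""
    return software_cost / hardware_cost


# Workstation example: $75k software on a $75k machine, then on a $2k machine.
ratio_then = software_to_hardware_ratio(75_000, 75_000)
ratio_now = software_to_hardware_ratio(75_000, 2_000)

# Netbook example: anti-virus + anti-malware + registry cleaner, per year,
# on a $270 netbook.
netbook_annual_software = 50 + 30 + 30
netbook_ratio = software_to_hardware_ratio(netbook_annual_software, 270)

print(f"Workstation, 20 years ago: {ratio_then:.1f}x the hardware cost")
print(f"Workstation, today:        {ratio_now:.1f}x the hardware cost")
print(f"Netbook utilities:         {netbook_ratio:.0%} of the hardware cost, every year")
```

The workstation software has gone from 1x to 37.5x the cost of the machine, and the netbook utilities alone eat about 41% of the hardware price each year.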

The smart phone started this trend and the netbook will accelerate it: the bulk of the market will be delivered services through subsidized hardware and software and it is the service providers that call the shots. Google, Amazon, Microsoft, Apple, Sony are already transitioning into these roles and since they have a connection with the bottom of the market pyramid, they will attract so much money that they will quickly roll over the Adobes, Oracles and SAPs of the world.

Many independent software vendors will clamber onto the infrastructures of Google, Amazon, and Apple, and intangible value will be created. The enterprise market, of all markets, can't be isolated from the bulk of the money and it will need to adapt to the system where the information resides: and that will be the cloud.

Monday, June 15, 2009

Eight ways that cloud computing will change your business

Eight Ways that Cloud Computing Will Change Business is a wonderful post by Dion Hinchcliffe. The synopsis of this article is that large businesses are laggards with respect to technology adoption for the simple reason that the cost of betting on the wrong horse is too high. However, sometimes new technologies are so compelling that this wait-and-see approach is trumped. According to the article:

"Cloud Computing is quickly beginning to shape up as one of these major changes and the hundreds of thousands of business customers of cloud offerings from Amazon, Salesforce, and Google, including a growing number of Fortune 500 companies, is showing both considerable interest and momentum in the space".

The article continues to spell out eight ways cloud computing will change business.

  • Creation of a new generation of products and services
  • New lightweight form of real-time partnerships and outsourcing with IT suppliers
  • New awareness and leverage of the greater Internet and Web 2.0 in particular
  • A reconciliation of traditional SOA with cloud and other emerging IT models
  • The rise of new industry leaders and IT vendors
  • More self-service IT from the business-side
  • More tolerance for innovation and experimentation from business
  • The slow-moving, dinosaur firms will have trouble keeping up with more nimble adopters and fast-followers

    I have always argued that cloud computing will be defined by the bottom of the economic pyramid. Smaller businesses do not have existing and legalized corporate standards of quality, accountability, and security, and they can simply piggyback on the standards provided by the data centers on which they deploy. This provides them with a first-mover advantage that doesn't waste energy trying to sell cloud computing solutions inside an already stressed IT organization of a large enterprise.

    Secondly, consumers in many ways are much more adaptable than enterprises. I am using Google or Amazon or AT&T but I don't get bent out of shape if my service experiences a hiccup. Take cell phone service: if you insisted on 99.999% availability, like many enterprise customers seem to demand, you couldn't use a cell phone. However, everybody agrees that a cell phone is a net productivity improvement. It is this consumer, conditioned by an imperfect world, that is demanding new services for their iPhones, BlackBerries, and Pres and is willing to take a less stringent SLA in exchange for lower cost and convenience. And there is a legion of startups that is willing to test out that appetite.
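To see how strict a 99.999% SLA really is, here is a quick conversion from an availability percentage to the downtime it permits per year. This is plain arithmetic; the function name is just for illustration.

```python
# Convert an availability percentage into allowed downtime per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be down and still meet the SLA."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(99.999)
three_nines = allowed_downtime_minutes(99.9)

print(f"99.999% allows about {five_nines:.1f} minutes of downtime per year")
print(f"99.9%   allows about {three_nines / 60:.1f} hours of downtime per year")
```

Five nines leaves a little over five minutes a year, which a single dropped call or dead spot can use up; three nines leaves closer to nine hours.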

    Brand loyalty in this connected world is non-existent for the simple reason that most services are multi-vendor anyway. You get a Nokia phone on a Verizon network connecting to a Real Rhapsody music service to satisfy your need for mobility. I switched from Yahoo search, to Google search, to Microsoft search in a matter of minutes simply because their UI and/or their results provided a better fit for my sensibilities. I find it wonderful that after a decade of technology consolidation and stagnation we are back to a world of innovation and rapid expansion of new services. And I believe that it is the consumer that will define these services, not the enterprise.

    Friday, May 15, 2009

    Amazon EC-2 for Compute Intensive Workloads

    The cloud has evolved from the managed hosting concept. With data centers like EC-2 making it easier to provision servers on-demand, elasticity can be built into the application to scale dynamically. Microsoft Azure provides a similar, and nicely integrated, platform for the Windows application world. But how well do these clouds hold up when demand is elastic for compute intensive workloads? The short of it? Not so well.

    I found two papers that report on experiments that take Amazon EC-2 as IT fabric and deploy compute intensive workloads on them. They compare these results to the performance obtained from on-premise clusters that include best-known practices for compute intensive workloads. The first paper uses the NAS benchmarks to get a broad sampling of high-performance computing workloads on the EC-2 cloud. They use the high-performance instances of Amazon and compare them to similar processor gear in a cluster at NCSA. The IT gear details are shown in the following table:

                         | EC-2 High-CPU Cluster | NCSA Cluster
    Compute Node         | 7GB memory, 4 cores per socket, 2 sockets per server, 2.33GHz Xeon, 1600GB storage | 8GB memory, 4 cores per socket, 2 sockets per server, 2.33GHz Xeon, 73GB storage
    Network Interconnect | Specific interconnect technology unknown | Infiniband network

    The NAS Parallel Benchmarks are a widely used set of programs designed to evaluate the performance of high performance computing systems. The suite mimics critical computation and data movement patterns important for compute intensive workloads.

    Clearly, when the workload is confined to a single server the difference between the two compute environments is limited to the virtualization technology used and the effective available memory. In this experiment the difference between Amazon EC-2 and a best-known practice cluster is between 10% and 20% in favor of the non-virtualized server.

    However, when the workload needs to reach across the network to other servers to complete its task the performance difference is striking, as is shown in the following figure.

    Figure 1: NPB-MPI runtimes on 32 cores (= 4 dual socketed servers)

    The performance difference ranges from 2x to 10x in favor of an optimized high-performance cluster. Effectively, for these networked workloads Amazon can be up to ten times more expensive to operate than your own optimized equipment.

    The second paper addresses the added cost of using cloud computing IT infrastructure for compute intensive workloads. In this experiment, they use a common workload to measure the performance of a supercomputer, HPL, which is an acronym for High Performance LINPACK. HPL is relatively kind to a cluster in the sense that it does not tax the interconnect bandwidth/latency much as compared to other compute intensive workloads such as optimization, information retrieval, or web indexing. The experiment measures the average floating point operations (FLOPS) obtained divided by the average compute time used. This experiment shows an exponential decrease in performance with respect to dollar cost of the clusters. This implies that if we double the cluster size the FLOPS/sec for money spent goes down.
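The performance-per-dollar metric described above can be sketched as follows. The measurements and the hourly price below are made-up placeholders, not numbers from the paper; they only illustrate what it looks like when sustained performance grows more slowly than cluster cost.

```python
# Sketch of a performance-per-dollar metric for clusters of growing size.
# All numbers are HYPOTHETICAL placeholders, not results from the paper.


def gflops_per_dollar(sustained_gflops: float, nodes: int,
                      dollars_per_node_hour: float) -> float:
    """Sustained GFLOP/s obtained per dollar of hourly cluster cost."""
    return sustained_gflops / (nodes * dollars_per_node_hour)


HYPOTHETICAL_PRICE = 0.80  # dollars per node-hour, assumed

# (node count, sustained GFLOP/s) pairs with sub-linear scaling, assumed.
hypothetical_runs = [(1, 50.0), (2, 90.0), (4, 150.0), (8, 220.0)]

value_for_money = [
    gflops_per_dollar(gflops, nodes, HYPOTHETICAL_PRICE)
    for nodes, gflops in hypothetical_runs
]

for (nodes, gflops), value in zip(hypothetical_runs, value_for_money):
    print(f"{nodes} node(s): {gflops:6.1f} GFLOP/s, {value:5.2f} GFLOP/s per $/hour")
```

With these placeholder inputs every doubling of the cluster buys fewer FLOPS per dollar, which is the shape of the result the paper reports.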

    The first paper has a wonderful graph that explains what is causing this weak scaling result.

    This figure shows the bisection bandwidth of the Amazon EC-2 cluster and that of a best-known practice HPC cluster. Bisection bandwidth is the bandwidth between two equal parts of a cluster. It is a measure of how well connected all the servers are to one another. The focus of typical clouds on providing a productive, high-margin service pushes them into IT architectures that do not favor interconnect bandwidth between servers. Many clouds are commodity servers connected to a SAN and the bandwidth is allocated to that path, not to bandwidth between servers. And that is the opposite of what high performance clusters for compute intensive workloads have evolved to.
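For readers who want to play with the idea, here is a minimal sketch that computes bisection bandwidth for a toy topology. It uses the common definition (the smallest total link bandwidth crossing any split of the nodes into two equal halves) and a brute-force search, so it is only suitable for a handful of nodes. The topologies and link speeds are invented for illustration and have nothing to do with the measured clusters.

```python
# Brute-force bisection bandwidth for small toy topologies.
# Assumes an even number of nodes and undirected links.
from itertools import combinations


def bisection_bandwidth(nodes, links):
    """Minimum total bandwidth crossing any split into two equal halves.

    nodes: iterable of node ids (even count)
    links: iterable of (node_a, node_b, bandwidth) tuples
    """
    nodes = list(nodes)
    links = list(links)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        left = set(group)
        # A link crosses the cut when exactly one endpoint is in `left`.
        crossing = sum(bw for a, b, bw in links if (a in left) != (b in left))
        if best is None or crossing < best:
            best = crossing
    return best


N = 8
node_ids = range(N)

# Toy ring: each server linked only to its neighbor, 1.0 Gb/s per link.
ring_links = [(i, (i + 1) % N, 1.0) for i in range(N)]

# Toy full mesh: every pair of servers directly linked, 1.0 Gb/s per link.
mesh_links = [(a, b, 1.0) for a, b in combinations(range(N), 2)]

ring_bw = bisection_bandwidth(node_ids, ring_links)
mesh_bw = bisection_bandwidth(node_ids, mesh_links)

print(f"ring bisection bandwidth: {ring_bw} Gb/s")
print(f"mesh bisection bandwidth: {mesh_bw} Gb/s")
```

The sparsely connected ring ends up with a bisection bandwidth of 2 Gb/s against 16 Gb/s for the full mesh, even though both have the same servers and the same per-link speed.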

    This means that for enterprise class problems, where efficiency of IT equipment is a differentiator to solve the problems at hand, cloud IT infrastructure solutions are not well matched yet. However, for SMBs that are seeking mostly elasticity and on-demand use, cloud solutions still work since there are still monetary benefits to be extracted from deploying compute intensive workloads on Amazon or other clouds.

    Friday, February 27, 2009

    Comparing Cloud Web Services

    In my continued quest to build an operational model that properly accounts for the costs of different cloud web services, I have reached back to the visual vocabulary of operational analysis. If it was good enough to build BMC Software I figured it would be good enough for this task.

    The following figure captures the typical resources in a modern data center. In the vocabulary of operational analysis we have servers and transactions, and the diagram depicts the read and write transactions going into different services such as filers or Internet, and read responses coming out. If you built your own data center these servers and services would reflect all your capital and operational expenditures.

    Different data centers select different resources to monetize. This makes the comparison between different providers so difficult: they are all selling something different.

    Let's start with Amazon as the baseline since AWS tries to monetize all the resources in its data center, except for the internal routers. The next diagram shows the resource costs that Amazon charges you when running an application on their data centers.

    Now compare that with a second provider, GoGrid. GoGrid does not monetize the incoming internet connection into their data center. So if you have a workload that reads a lot of data from the internet, GoGrid is fantastic. Also, GoGrid does not use a filer in their architecture, instead giving the server its own local disk instance that is managed and maintained. This works very well for web applications but does not work well for running a distributed file system instance. So running Hadoop on GoGrid is not attractive. The following diagram depicts GoGrid's monetization strategy.

    When you compare both diagrams it is clear that GoGrid is the better solution for running a web application server. On top of that, GoGrid offers free load balancers, which you would need to pay for separately on Amazon.

    This visual vocabulary presented here makes it very easy to identify what types of workloads would fit on different cloud providers. It also shows you the high-cost items in the overall IT infrastructure you need to outsource your application.
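The same comparison can be written down as a tiny cost model: each provider is a price sheet listing only the resources it monetizes, and a workload is a set of resource quantities. All prices below are hypothetical placeholders, not the actual Amazon or GoGrid rates, and the two sheets deliberately share the same server and outbound prices so that only the set of monetized resources differs.

```python
# Toy cost model: which resources does each provider charge for?
# All prices are HYPOTHETICAL; they are not real Amazon or GoGrid rates.
from dataclasses import dataclass


@dataclass
class Workload:
    server_hours: float
    inbound_gb: float
    outbound_gb: float
    filer_gb_months: float
    load_balancer_hours: float


# A resource absent from a price sheet is not monetized by that provider.
AMAZON_LIKE = {
    "server_hours": 0.10,
    "inbound_gb": 0.10,
    "outbound_gb": 0.15,
    "filer_gb_months": 0.12,
    "load_balancer_hours": 0.03,
}
GOGRID_LIKE = {
    "server_hours": 0.10,
    "outbound_gb": 0.15,
}


def monthly_cost(workload: Workload, price_sheet: dict) -> float:
    """Sum price * quantity over the resources the provider charges for."""
    return sum(price * getattr(workload, resource)
               for resource, price in price_sheet.items())


# A web application that pulls in a lot of data from the internet.
web_app = Workload(server_hours=720, inbound_gb=500, outbound_gb=100,
                   filer_gb_months=0, load_balancer_hours=720)

cost_amazon_like = monthly_cost(web_app, AMAZON_LIKE)
cost_gogrid_like = monthly_cost(web_app, GOGRID_LIKE)

print(f"Amazon-like price sheet: ${cost_amazon_like:.2f} per month")
print(f"GoGrid-like price sheet: ${cost_gogrid_like:.2f} per month")
```

Swapping in a storage-heavy workload, or real price sheets, shows quickly which provider's monetization strategy matches a given application.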

    To make the accounting complete, we also need a model of our workload that quantifies the storage, compute, and I/O requirements. For web application services the world of cloud solutions is well represented, but for utility computing this is not the case. The costs of filer and storage are significant and quickly become the overriding cost components for a workload. Furthermore, the fact that storage costs accumulate even when you are not computing makes the on-demand argument less genuine. Finally, the use of CPU instance hours is not good enough for utility computing. Using the electric grid as comparison, I am consuming electrons, and pay accordingly. In proper utility computing I am consuming instructions and I/Os. These metrics are independent of the speed of the processors or filer on which I run and thus I do not need to guess what type of CPU instance hours I would consume. By providing instruction and I/O consumables, providers can differentiate on the basis of capacity or latency in the same way that electricity providers do. Without that compensation model, utility computing is a ways off IMHO.
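A small sketch makes the difference between the two billing models visible. The job size, instance speeds, and all prices are hypothetical; no provider I know of publishes per-instruction rates, which is exactly the point of the paragraph above.

```python
# Instance-hour billing versus consumption (instruction + I/O) billing.
# Job size, instance speeds, and prices are all HYPOTHETICAL.
import math


def instance_hour_bill(instructions: int, instructions_per_second: float,
                       dollars_per_hour: float) -> float:
    """Bill for whole instance-hours; depends on how fast the instance is."""
    hours = math.ceil(instructions / instructions_per_second / 3600)
    return hours * dollars_per_hour


def utility_bill(instructions: int, io_operations: int,
                 dollars_per_tera_instruction: float,
                 dollars_per_million_ios: float) -> float:
    """Bill for what was consumed; no processor speed appears anywhere."""
    return (instructions / 1e12 * dollars_per_tera_instruction
            + io_operations / 1e6 * dollars_per_million_ios)


JOB_INSTRUCTIONS = 3_600_000_000_000_000   # 3.6e15 instructions
JOB_IOS = 500_000_000                      # 5e8 I/O operations

# The same job on a slow, cheap instance and on a fast, expensive one.
slow_bill = instance_hour_bill(JOB_INSTRUCTIONS, 1e9, dollars_per_hour=0.10)
fast_bill = instance_hour_bill(JOB_INSTRUCTIONS, 4e9, dollars_per_hour=0.80)

# Under consumption pricing the bill follows from the job alone.
consumption_bill = utility_bill(JOB_INSTRUCTIONS, JOB_IOS,
                                dollars_per_tera_instruction=0.02,
                                dollars_per_million_ios=0.01)

print(f"slow instance:  ${slow_bill:.2f}")
print(f"fast instance:  ${fast_bill:.2f}")
print(f"consumption:    ${consumption_bill:.2f}")
```

With instance hours I have to guess which instance type is the better deal before I run; with consumption pricing the bill follows from the workload model itself, and the provider is free to compete on how quickly it delivers those instructions.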

    Sunday, January 4, 2009

    Open Source and free data

    Two articles that are just wonderfully expansive...

    I came across these articles researching and thinking about SaaS and PaaS and what would be the best road forward for startups in that space. Salesforce may have blazed the trail but SugarCRM is doing most of what I am doing with Salesforce. Hosting SugarCRM on demand on Amazon would save me money over Salesforce. However, in the end it is not the SaaS CRM system that is the value, it is the data inside it and my internal business process surrounding that CRM data. I want the flexibility to take this data and process anywhere so that I can take advantage of available skill or innovation and extract more value out of the accumulated data.

    Cloud computing exposes this fundamental problem of data movement. It was not perceived as much of a problem for on-premise applications due to the false impression that local data is always usable. To make cloud services ubiquitous this problem of data movement needs to get solved, and robust, free Open Source components will be developed to solve it since users will demand it.