Sunday, December 7, 2008

Comparing the Cost Continued...

The next step was to select our benchmarks and calculate their costs. We extracted two workloads that are common to many product development companies: a regression workload that arises when a team collaborates on the same development task, and a technical workload that arises when an individual uses computer models to generate new insight or knowledge.

The regression workload can be generated by a software design team developing a new application, a financial engineering team backtesting new trading strategies, or a mechanical design team designing a new combustion engine that runs on alternative fuels.

The technical workload can be a new rendering algorithm to model fur on an animated character, a new economic model that drives critical risk parameters in a trading strategy, or an acoustic characterization of an automobile cabin.

The first workload is characterized by a collection of tests that are run to guarantee correctness of the product during development. Our test case for a typical regression run is 1000 tests that run for an average of 15 minutes each. Each developer typically runs two such regressions per day, so a 50-person design team generates 100 regression runs per day. The total workload equates to roughly 1050 cpu hours per hour and would keep a 1000-processor cluster 100% occupied.
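The sizing of the regression workload can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the constant names are illustrative, and spreading the work evenly over a 24-hour day is an assumption made here to arrive at an hourly rate.

```python
# Regression workload sizing from the figures quoted in the text.
TESTS_PER_RUN = 1000
MINUTES_PER_TEST = 15
RUNS_PER_DEVELOPER_PER_DAY = 2
DEVELOPERS = 50
HOURS_PER_DAY = 24  # assumption: regressions are spread around the clock


def regression_cpu_hours_per_day() -> float:
    """Total cpu hours generated per day by the whole design team."""
    cpu_hours_per_run = TESTS_PER_RUN * MINUTES_PER_TEST / 60
    runs_per_day = RUNS_PER_DEVELOPER_PER_DAY * DEVELOPERS
    return cpu_hours_per_run * runs_per_day


daily_cpu_hours = regression_cpu_hours_per_day()
hourly_rate = daily_cpu_hours / HOURS_PER_DAY

print(f"cpu hours per day:  {daily_cpu_hours:.0f}")
print(f"cpu hours per hour: {hourly_rate:.0f}")
```

Under these assumptions the team generates 25,000 cpu hours per day, or a little over 1,040 cpu hours per hour, which is in line with the rough figure quoted above.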

The second workload shifts the focus from capacity to capability. The computational task is a single simulation that requires 5 cpu hours to complete. The benchmark workload is the work created by a ten-person research team in which each member runs five simulations per day. Many of these algorithms can run in parallel, and such a task could complete in 30 minutes when executed on ten processors. Latency to solution is a major driver of R&D team productivity, so this workload would have priority over the regression workload, particularly during the work day. The total workload equates to roughly 31 cpu hours per hour because this workload runs only during the eight-hour work day.
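The same kind of check works for the technical workload. This is a minimal sketch with illustrative names; it assumes that each of the ten researchers runs five simulations per day (the reading that reproduces the 31 cpu hours per hour figure) and that a simulation scales perfectly across processors.

```python
# Technical workload sizing from the figures quoted in the text.
CPU_HOURS_PER_SIMULATION = 5
RESEARCHERS = 10
SIMULATIONS_PER_RESEARCHER_PER_DAY = 5  # assumption: five runs per researcher
WORKDAY_HOURS = 8
PARALLEL_PROCESSORS = 10

# Total compute demand per day and per working hour.
technical_cpu_hours_per_day = (
    CPU_HOURS_PER_SIMULATION * RESEARCHERS * SIMULATIONS_PER_RESEARCHER_PER_DAY
)
technical_cpu_hours_per_hour = technical_cpu_hours_per_day / WORKDAY_HOURS

# Wall-clock time of one simulation, assuming ideal (linear) parallel speedup.
wall_clock_minutes = CPU_HOURS_PER_SIMULATION / PARALLEL_PROCESSORS * 60

print(f"cpu hours per day:       {technical_cpu_hours_per_day}")
print(f"cpu hours per work hour: {technical_cpu_hours_per_hour:.2f}")
print(f"minutes per simulation:  {wall_clock_minutes:.0f}")
```

Real simulations rarely scale perfectly, so the 30-minute turnaround is best read as a lower bound.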

Running these two workloads on our cloud computing providers, we get the following costs per day:
Regression Workload      $25,075.17      $18,250.25
Knowledge Discovery      $265.09         $230.13

The total cost of roughly $18-25k per day makes the regression workload too expensive for outsourcing to today's cloud providers. A 1000-processor on-premise x86 cluster costs roughly $10k/day including overhead and amortization. The cost of bulk computes like the regression workload needs to go down by at least a factor of 5 before cloud computing can bring in small and medium-sized enterprises. However, the technical workload at roughly $250/day is very attractive to move to the cloud, since this workload is periodic with respect to the development cycle and moving it shifts CapEx to OpEx, which frees up capital for other purposes.
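As a rough cross-check, the daily figures can be normalized to a cost per cpu hour. The sketch below is illustrative only: the quotes are taken in the order they appear in the cost table without assigning provider names, the workload sizes are derived from the benchmark descriptions (the technical figure assumes five simulations per researcher), and the on-premise number assumes the cluster is fully occupied by the regression workload.

```python
# Daily cloud quotes in the order they appear in the cost table.
REGRESSION_QUOTES = [25075.17, 18250.25]
TECHNICAL_QUOTES = [265.09, 230.13]
ON_PREMISE_PER_DAY = 10_000.0  # 1000-processor cluster, overhead included

# Workload sizes derived from the benchmark descriptions.
REGRESSION_CPU_HOURS = 1000 * 15 / 60 * 100  # 100 runs of 1000 15-minute tests
TECHNICAL_CPU_HOURS = 5 * 10 * 5             # assumes 5 runs per researcher


def cost_per_cpu_hour(daily_cost: float, cpu_hours: float) -> float:
    """Normalize a daily cost to dollars per cpu hour of work done."""
    return daily_cost / cpu_hours


regression_rates = [
    cost_per_cpu_hour(q, REGRESSION_CPU_HOURS) for q in REGRESSION_QUOTES
]
technical_rates = [
    cost_per_cpu_hour(q, TECHNICAL_CPU_HOURS) for q in TECHNICAL_QUOTES
]
on_premise_rate = cost_per_cpu_hour(ON_PREMISE_PER_DAY, REGRESSION_CPU_HOURS)

for label, rates in [("regression", regression_rates),
                     ("technical", technical_rates)]:
    print(label, [f"${r:.2f}/cpu-hour" for r in rates])
print("on-premise", f"${on_premise_rate:.2f}/cpu-hour")
```

Normalizing this way makes it easier to compare the two workloads on the same footing, since their daily totals differ by two orders of magnitude.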

The big cost difference between Rackspace/Mosso and Amazon is the Disk I/O charge. It doesn't appear that Rackspace monetizes this cost. From the cost models, this looks like a liability for them, since the Disk I/O cost (moving the VM image and data sets to and from disk) represents roughly 20% of the total costs. Fast storage is notoriously expensive, so this appears to be a weakness of Rackspace.

In a future article we will dissect these costs further.
