Sunday, July 11, 2010

Complexity and fault-tolerance

As an engineer I frequently look to biology for inspiration and for ideas about how complex systems need to be put together to stand the test of time. In this quest, I came across a wonderful article from a Yale research team that compared the transcriptional regulatory network of a bacterium to the call graph of the Linux operating system.

“It is a commonplace metaphor that the genome is the operating system of a living organism. We wanted to see if the analogy actually holds up,” said Mark Gerstein, the Albert L. Williams Professor of Biomedical Informatics; professor of molecular biophysics and biochemistry, and computer science; and senior author of the paper.

Both the E. coli and Linux networks are arranged in hierarchies, but with some notable differences in how they achieve operational efficiency. The molecular networks in the bacterium are arranged in a pyramid, with a limited number of master regulatory genes at the top that control a broad base of specialized functions, which act independently.

In contrast, the Linux operating system is organized more like an inverted pyramid, with many different top-level routines controlling few generic functions at the bottom of the network. Gerstein said that this organization arises because software engineers tend to save money and time by building upon existing routines rather than starting systems from scratch.
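The contrast between the two hierarchy shapes can be sketched with toy graphs (these are illustrative examples, not the paper's actual data): nodes that nothing points to sit at the "top" of the hierarchy (master regulators, or top-level routines), while nodes that point to nothing form the "base" (specialized genes, or generic utility functions).

```python
# Toy sketch of the two hierarchy shapes described above.
# Edges run from controller to controlled (regulator -> gene,
# caller -> callee). Node names are made up for illustration.

def layer_counts(edges):
    """Return (top, bottom): nodes with no incoming edges vs. none outgoing."""
    nodes = {n for edge in edges for n in edge}
    has_in = {dst for _, dst in edges}
    has_out = {src for src, _ in edges}
    top = nodes - has_in       # nothing controls/calls these
    bottom = nodes - has_out   # these control/call nothing
    return len(top), len(bottom)

# E. coli-style pyramid: one master regulator fanning out to many targets.
pyramid = [("master", f"gene{i}") for i in range(6)]

# Linux-style inverted pyramid: many top-level routines all reusing
# one shared generic function.
inverted = [(f"routine{i}", "generic_util") for i in range(6)]

print(layer_counts(pyramid))   # (1, 6): narrow top, broad base
print(layer_counts(inverted))  # (6, 1): broad top, narrow base
```

The second shape makes the fragility concrete: changing the single shared node at the bottom of the inverted pyramid touches every routine above it.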

“But it also means the operating system is more vulnerable to breakdowns because even simple updates to a generic routine can be very disruptive,” Gerstein said. To compensate, these generic components have to be continually fine-tuned by designers.

“Operating systems are like urban streets – engineers tend to focus on areas that get a lot of traffic,” said Gerstein. “We can do this because we are designing these changes intelligently.”

As an engineer, I recognize this as a familiar failure mode of functional design by humans. We are so focused on providing functionality at the lowest possible cost that the system-wide question of how that functionality should be reliably provided gets lost. The human body can lose an eye, a digit, or even a limb, and its owner can still function as a human being. But take one leg off a table or chair, and it ceases to function as designed. Only in high-exposure or non-serviceable designs is this broader context designed into the functionality of the system: the control systems of nuclear plants and trading algorithms in finance are examples, as are the operating systems of deep-space vehicles.

Cloud computing today needs some innovation to address the broader system exposure of a business process executing in a remote location. Security and the authenticity of data or a software asset, for example, are two elements that become more nebulous in a cloud context. But solutions to these problems exist, and cloud computing will adopt them as best practices. Cloud computing will become a better implementation of what private companies do with their internal IT today. At the SMB level in particular, cloud computing is already much stronger than most internal IT processes, with well-defined disaster recovery processes and geographical redundancy: two elements that are beyond the capital reach of most SMBs. Cloud computing is shaping up to be a new organization of IT capability that will enable the next generation of business process innovation.
