The fundamental questions you need to answer about your data concern four properties: security and privacy, size, location, and format.
Security and Privacy
This should be the starting point since it affects your liability. The current innovators of cloud computing (financial institutions, Google, Amazon) are global organizations with geographically dispersed operations. Because the business operations of one time zone must be visible to other time zones, these organizations had to solve security and compliance with local privacy laws. Clearly, this has come at a significant cost. However, the nascent market for cloud computing resources in the form of Amazon Web Services makes it possible for start-ups to play in this new market. These start-ups play a different game: their services tend to have very low security or privacy needs, which allows them to harbor a very disruptive technology. They will develop low-cost services that provide powerful competition to EDS and other high-security, high-privacy outsourcers. They will not compete with them directly, and they will expand the market with a lower-cost alternative: two prime ingredients for disruptive technology.
Data size is the next most important attribute. If your data is large, say a historical snapshot of the World Wide Web itself, you need to store and maintain petabytes of data. This is clearly a different requirement than providing access to a million-row OLAP database. Size affects both economics and algorithms, and it can also complicate the next attribute, location.
The location of the data will affect what you can do with it. If the data set is very large, the time or cost of uploading it to, or downloading it from, a commercial cloud resource provider may be prohibitive. In the case of the historical web snapshots, it is much better to generate the data in the cloud itself: that is, the data is created by the compute function you execute in the cloud. For the web index, this would be the set of crawlers that collect the web snapshot. There are readily available AMIs for Hadoop/Lucene/Nutch that enable a modest web indexing service using AWS.
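To get a feel for why moving a very large data set can be prohibitive, here is a minimal back-of-the-envelope sketch. The link speed, the efficiency factor, and the two data set sizes are illustrative assumptions, not measurements of any particular provider.

```python
def transfer_days(size_bytes: float, bandwidth_bits_per_s: float,
                  efficiency: float = 0.8) -> float:
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved (protocol overhead, contention, retries).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (bandwidth_bits_per_s * efficiency)
    return seconds / 86_400  # seconds per day


GB = 10**9
PB = 10**15
LINK = 1e9  # assumed 1 Gbit/s uplink

if __name__ == "__main__":
    for label, size in [("1 GB database", GB), ("1 PB web snapshot", PB)]:
        print(f"{label}: {transfer_days(size, LINK):.4f} days")
```

Under these assumptions the small database moves in seconds, while the petabyte snapshot needs on the order of months, which is why generating such data inside the cloud is the more practical route.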
The data format affects the details of how you use the data. For example, if your data is in an OLAP database, you will need to have that OLAP database running in your process. Similarly, if you have complex data such as product geometry on which you want to compute stress or vibrational analysis, you will need access to the geometry kernel used to describe the data. Finally, the data format affects the efficiency with which you can access and compute on your data. This is a frequently underestimated aspect of cloud computing, but it can have significant economic impact if you pay as you go for storage and compute.
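The economic effect of format can be sketched with a toy pay-as-you-go model. Everything here is a hypothetical assumption for illustration: the prices, the compression ratio, and the idea that a format lets a job read only a fraction of the stored bytes. None of it reflects a real provider's price list.

```python
def monthly_cost(raw_gb: float, compression_ratio: float,
                 scan_fraction: float, scans_per_month: int,
                 storage_price_per_gb: float,
                 compute_price_per_gb_scanned: float) -> float:
    """Toy monthly bill: storage plus compute proportional to bytes read.

    compression_ratio: raw size divided by stored size for the format.
    scan_fraction: share of the stored bytes each job must actually read.
    """
    stored_gb = raw_gb / compression_ratio
    scanned_gb = stored_gb * scan_fraction * scans_per_month
    return (stored_gb * storage_price_per_gb
            + scanned_gb * compute_price_per_gb_scanned)


if __name__ == "__main__":
    # Same 1000 GB of raw data, ten jobs a month, placeholder prices.
    naive = monthly_cost(1000, 1.0, 1.0, 10, 0.10, 0.01)
    compact = monthly_cost(1000, 4.0, 0.25, 10, 0.10, 0.01)
    print(f"uncompressed, full scans: {naive:.2f}")
    print(f"compressed, partial scans: {compact:.2f}")
```

The absolute numbers are meaningless; the point is that the same data can produce very different bills depending on how compactly it is stored and how much of it each computation has to touch.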