
MSR-TR-2003-24

Microsoft Research

Distributed Computing Economics

Computing economics are changing. Today there is rough price parity between (1) one database access, (2) ten bytes of network traffic, (3) 100,000 instructions, (4) 10 bytes of disk storage, and (5) a megabyte of disk bandwidth. This has implications for how one structures Internet-scale distributed computing: one puts computing as close to the data as possible in order to avoid expensive network traffic.

The Cost of Computing

Megaservices like Yahoo!, Google, and Hotmail have relatively low operations staff costs. These megaservices have discovered ways to deliver content for less than the milli-dollar that advertising will fund. For example, in 2002 Google had an operations staff of 25 who managed its two-petabyte (about 2×10^15 bytes) database and 10,000 servers spread across several sites. Hotmail and Yahoo! cite similar numbers: small staffs manage ~300 TB of storage and more than ten thousand servers.

Most applications do not benefit from megaservice economies of scale. Other companies report that they need an administrator per terabyte, an administrator per 100 servers, and an administrator per gigabit of network bandwidth. At those ratios, Google's two petabytes of storage alone would call for an operations staff of more than two thousand people, nearly ten times the size of the company.
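The staffing arithmetic above can be made explicit. The sketch below assumes the three per-resource ratios simply add (the text does not say how they combine) and leaves Google's network bandwidth at zero because the text does not give it, so the result is a lower bound. The function and constant names are hypothetical.

```python
# Rule-of-thumb staffing ratios quoted in the text for ordinary operations.
TB_PER_ADMIN = 1.0          # one administrator per terabyte of storage
SERVERS_PER_ADMIN = 100.0   # one administrator per 100 servers
GBPS_PER_ADMIN = 1.0        # one administrator per gigabit of network bandwidth


def admins_needed(storage_tb: float, servers: int, network_gbps: float) -> float:
    """Estimate operations staff, assuming the three ratios simply add."""
    return (storage_tb / TB_PER_ADMIN
            + servers / SERVERS_PER_ADMIN
            + network_gbps / GBPS_PER_ADMIN)


# Google in 2002, per the text: 2 PB (2,000 TB) of storage and 10,000 servers.
# Network bandwidth is not given, so it is set to 0 here (hypothetical),
# which makes the result a lower bound.
google_staff = admins_needed(storage_tb=2_000, servers=10_000, network_gbps=0)
print(f"conventional staffing estimate: {google_staff:,.0f}")        # 2,100
print(f"ratio to Google's actual staff of 25: {google_staff / 25:.0f}x")  # 84x
```

Storage dominates the estimate: the 2,000 TB term alone already exceeds two thousand administrators.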

Web Services

Microsoft and IBM tout web services as a new computing model: Internet-scale distributed computing. They observe that the HTTP Internet is designed for people interacting with computers, whereas traffic on the future Internet will be dominated by computer-to-computer interactions. Building Internet-scale distributed computations requires many things, but at its core it requires a common object model augmented with a naming and security model. Other services can be layered atop these core services. Web services are the evolution of the RPC, DCE, DCOM, CORBA, RMI, … standards of the 1990s. The main innovation is an XML base that facilitates interoperability among implementations.

Neither grid computing nor web services has an outsourcing or advertising business model. Both are plumbing that enables companies to build applications. Both are designed for computer-to-computer interactions and so have no advertising model, because there are no eyeballs involved in the interactions. It is up to the companies to invent business models that can leverage the web services plumbing.

Application Economics

A computation task has four characteristic demands:

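  • Networking: delivering questions and answers,
  • Computation: transforming information to produce new information,
  • Database access: access to reference information needed by the computation,
  • Database storage: long-term storage of information (needed for later access).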

To make the economics tangible, take the following baseline hardware parameters (see note 1):

From this we conclude that one dollar equates to

≈ 1 GB sent over the WAN

≈ 10 Tops (tera cpu operations)

≈ 1 GB of disk space

≈ 10 M database accesses

≈ 10 TB of disk bandwidth
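As a rough check of the abstract's price-parity claim, the sketch below converts the per-dollar figures used in this paper (10 M database accesses, 1 GB of WAN traffic, 10 Tops of computing, 10 TB of disk bandwidth) into unit prices and then prices the abstract's bundle. Disk storage is omitted for brevity; the dictionary keys and function name are illustrative only.

```python
# Quantities that one dollar buys, as quoted in the text (order of magnitude).
PER_DOLLAR = {
    "database access": 10e6,          # ~10 M database accesses
    "byte of WAN traffic": 1e9,       # ~1 GB sent over the WAN
    "instruction": 10e12,             # ~10 Tops (tera cpu operations)
    "byte of disk bandwidth": 10e12,  # ~10 TB of disk bandwidth
}


def unit_price(resource: str) -> float:
    """Dollars per single unit of a resource."""
    return 1.0 / PER_DOLLAR[resource]


# The abstract's "rough price parity" bundle (disk storage left out).
bundle = {
    "database access": 1,
    "byte of WAN traffic": 10,
    "instruction": 100_000,
    "byte of disk bandwidth": 1_000_000,
}

costs = {r: n * unit_price(r) for r, n in bundle.items()}
for resource, dollars in costs.items():
    print(f"{bundle[resource]:>9,} x {resource:<22} ~ ${dollars:.0e}")

# Ratio of the most to the least expensive item: about 10, i.e. the items
# agree to within roughly a factor of ten.
spread = max(costs.values()) / min(costs.values())
print(f"max/min cost ratio: {spread:.0f}")
```

Each item lands between 10^-8 and 10^-7 dollars, which is what "rough price parity" means here: equal to within about an order of magnitude, not exactly equal.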

Data loading and data scanning are cpu-intensive, but they are also data-intensive and therefore not economically viable as mobile applications. Some database-related applications are quite cpu-intensive: for example, data loading takes about 1,000 instructions per byte. The “vision” component of the Sloan Digital Sky Survey, which detects stars and galaxies and builds the astronomy catalogs from the pixels, takes about 10,000 instructions per byte. That makes it a break-even candidate: according to the economic model above, 10 Tops of computing and 1 GB of networking both cost a dollar, so the break-even point is 10,000 instructions per byte of network traffic, or about a minute of computation per MB of network traffic. It seems the computation should be at least 30,000 instructions per byte (a 3:1 cost-benefit ratio) before the outsourcing model becomes really attractive.
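The break-even rule reduces to a single division: instructions per dollar over WAN bytes per dollar. The sketch below uses the text's figures to classify the workloads just mentioned. The 3:1 "attractive" threshold is the text's suggestion; the function names and verdict labels are hypothetical.

```python
TOPS_PER_DOLLAR = 10e12      # ~10 tera-operations of computing per dollar
WAN_BYTES_PER_DOLLAR = 1e9   # ~1 GB of WAN traffic per dollar

# Instructions per byte at which computing costs as much as shipping the data.
BREAK_EVEN = TOPS_PER_DOLLAR / WAN_BYTES_PER_DOLLAR  # 10,000 instructions/byte


def benefit_ratio(instructions_per_byte: float) -> float:
    """Dollars of computation per dollar of WAN traffic needed to ship its input."""
    return instructions_per_byte / BREAK_EVEN


def verdict(instructions_per_byte: float, attractive_ratio: float = 3.0) -> str:
    """Classify a workload using the break-even point and a 3:1 threshold."""
    r = benefit_ratio(instructions_per_byte)
    if r >= attractive_ratio:
        return "worth outsourcing"
    if r >= 1.0:
        return "break-even candidate"
    return "keep the computation near the data"


for name, ipb in [("data loading", 1_000),
                  ("SDSS vision pipeline", 10_000),
                  ("3:1 threshold", 30_000)]:
    print(f"{name:<22} {ipb:>7,} instr/byte -> {verdict(ipb)}")
```

Data loading falls a factor of ten short, the SDSS pipeline sits exactly at break-even, and only at 30,000 instructions per byte does shipping the data out start to pay.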

Few computations exceed that threshold; most are better matched to a Beowulf cluster. Computational Fluid Dynamics (CFD) is very cpu-intensive, but again, CFD generates a continuous and voluminous output stream. As an example of an adaptive mesh simulation, the Cornell Theory Center has a Beowulf-class MPI job that simulates crack propagation in a mechanical object [4]. It has about 100 MB of input, 10 GB of output, and runs for more than 7 cpu-years. The computation operates at over one million instructions per byte, and so is a good candidate for export to the WAN computational grid. But the computation’s bisection bandwidth requires that it be executed in a tightly connected cluster. These applications require the inexpensive bandwidth available within a Beowulf cluster [5]. In a Beowulf cluster, networking is about ten thousand times less expensive than WAN networking, which makes it seem nearly free by comparison.
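A back-of-envelope check of the "over one million instructions per byte" figure for the Cornell job needs an instruction rate, which the text does not give. The sketch below assumes a hypothetical, deliberately conservative sustained rate of 100 M instructions per second and counts only input and output bytes.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Figures for the crack-propagation job, as quoted in the text.
input_bytes = 100e6    # ~100 MB of input
output_bytes = 10e9    # ~10 GB of output
cpu_years = 7          # "more than 7 cpu-years"

# Hypothetical sustained instruction rate (not stated in the text);
# chosen to be conservative for hardware of that era.
ASSUMED_INSTR_PER_SEC = 1e8

instructions = cpu_years * SECONDS_PER_YEAR * ASSUMED_INSTR_PER_SEC
total_bytes = input_bytes + output_bytes
intensity = instructions / total_bytes

print(f"~{instructions:.1e} instructions over {total_bytes:.2e} bytes")
print(f"~{intensity:,.0f} instructions per byte")
```

Even at that conservative rate the intensity exceeds a million instructions per byte, far past the 10,000 instructions per byte break-even. The calculation ignores the job's internal traffic, which is exactly the bisection-bandwidth requirement that keeps it on a tightly connected cluster.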

Conclusions

Caveats

Beowulf clusters have completely different networking economics. Render farms, materials simulation, and CFD fit beautifully on Beowulf clusters because networking there is very inexpensive: a gigabit Ethernet fabric costs about $200 per port and delivers 50 MB/s, so Beowulf networking costs are comparable to disk bandwidth costs, 10,000 times less than the price of Internet transports. That is why rendering farms and BLAST search engines are routinely built using Beowulf clusters. Beowulf clusters should not be confused with Internet-scale Grid computations.
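The "10,000 times" figure can be reproduced from the port price and throughput, using the 3-year hardware depreciation from note 1 and the 1 GB per dollar WAN figure. The sketch assumes, hypothetically, that each port runs at its full delivered 50 MB/s for its whole depreciation life; real utilization would be lower and the per-byte cost correspondingly higher.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
DEPRECIATION_YEARS = 3        # hardware depreciation period (note 1)

PORT_PRICE_DOLLARS = 200.0    # ~$200 per gigabit Ethernet port
PORT_BYTES_PER_SEC = 50e6     # ~50 MB/s delivered per port
WAN_BYTES_PER_DOLLAR = 1e9    # ~1 GB of WAN traffic per dollar

# Assumption: the port is fully busy for its entire depreciation life.
lifetime_bytes = PORT_BYTES_PER_SEC * DEPRECIATION_YEARS * SECONDS_PER_YEAR
lan_bytes_per_dollar = lifetime_bytes / PORT_PRICE_DOLLARS
advantage = lan_bytes_per_dollar / WAN_BYTES_PER_DOLLAR

print(f"cluster LAN: ~{lan_bytes_per_dollar / 1e12:.0f} TB per dollar")
print(f"LAN vs WAN cost advantage: ~{advantage:,.0f}x")
```

The result, a couple of tens of terabytes per dollar, is in the same range as the 10 TB of disk bandwidth per dollar, and the advantage over the WAN comes out on the order of 10^4, matching the text's estimate.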

If telecom prices drop faster than Moore’s law, the analysis fails. If telecom prices drop slower than Moore’s law, the analysis becomes stronger. Most of the argument in this paper pivots on the relatively high price of telecommunications. Over the last 40 years telecom prices have fallen much more slowly than any other information technology. If this situation changed, it could completely alter the arguments here. But there is no obvious sign of that occurring.
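The sensitivity to telecom prices can be sketched with a simple model. Break-even intensity is the price of a WAN byte divided by the price of an instruction, so it drifts by the ratio of the two price declines. The halving periods below (1.5 years for cpu; 3, 1.5, or 1 year for WAN) are hypothetical inputs chosen only to illustrate the three cases in the text, and the starting point is the 10,000 instructions per byte used above.

```python
def break_even_after(years: float,
                     cpu_halving_years: float,
                     wan_halving_years: float,
                     today: float = 10_000.0) -> float:
    """Break-even instructions per byte after `years`, assuming cpu and WAN
    prices each halve at a steady (hypothetical) rate.

    break-even = (price of a WAN byte) / (price of an instruction), so it
    scales by 2**(years / cpu_halving) / 2**(years / wan_halving).
    """
    return today * 2 ** (years / cpu_halving_years - years / wan_halving_years)


for label, wan_halving in [("telecom slower than Moore's law", 3.0),
                           ("telecom matches Moore's law", 1.5),
                           ("telecom faster than Moore's law", 1.0)]:
    be = break_even_after(6, cpu_halving_years=1.5, wan_halving_years=wan_halving)
    print(f"{label:<34} break-even in 6 years: {be:>8,.0f} instr/byte")
# Prints 40,000, 10,000, and 2,500 instructions per byte respectively.
```

A rising break-even means ever fewer computations justify moving data across the WAN, which is why slower telecom price declines strengthen the argument and faster ones weaken it.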

Acknowledgements

References

[4] Gerd Heber, Cornell Theory Center, private communication, 12 Jan 2003.

[5] Thomas Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese, How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, MIT Press, Cambridge, 1998, ISBN 0-262-69218.


  1. The hardware prices are typical web prices; the WAN price is typical of rates paid by large Internet service providers (many Gbps/month). Hardware is depreciated over 3 years.
