I have just started learning Hadoop. A book I am reading gives an example that I don't fully understand.

Example:
Consider processing 200 GB of data with 50 nodes, in which each node processes 4 GB
of data located on a local disk. Each node takes 80 seconds to read the data (at the
rate of 50 MB per second). No matter how fast we compute, we cannot finish in under 80
seconds. Assume that the result of the process is a total dataset of size 200 MB, and
each node generates 4 MB of this result, which is transferred over a 1 Gbps (1 MB per
packet) network to a single node for display. It will take about 3 milliseconds to
transfer the data to the destination node (each 1 MB requires 250 microseconds to
transfer over the network, and the network latency per packet is assumed to be 500
microseconds, based on the previously referenced talk by Dr. Jeff Dean). Ignoring
computational costs, the total processing time cannot be under 40.003 seconds.
In the example above, I can't work out how the 3-millisecond transfer time and the 40.003-second total processing time are derived.
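As far as I can tell, the 3 ms figure comes from treating each node's 4 MB result as four 1 MB packets, each costing 250 microseconds of transfer time plus 500 microseconds of latency. But when I add that to the 80-second read time, I get 80.003 seconds, not 40.003. Here is the arithmetic I tried, using only the numbers quoted above (the variable names are my own):

```python
# Reproducing the book's figures from the numbers it quotes.

# Network phase: each node sends its 4 MB result as four 1 MB packets.
packets_per_node = 4
transfer_us_per_packet = 250   # "each 1 MB requires 250 microseconds to transfer"
latency_us_per_packet = 500    # "network latency per packet is ... 500 microseconds"

transfer_ms = packets_per_node * (transfer_us_per_packet + latency_us_per_packet) / 1000
print(transfer_ms)             # 3.0 -- matches the book's "about 3 milliseconds"

# Read phase: each node reads 4 GB (~4000 MB) from local disk at 50 MB/s.
read_s = 4000 / 50
print(read_s)                  # 80.0 -- matches the book's "80 seconds"

# Total, ignoring computational costs, as the book does.
total_s = read_s + transfer_ms / 1000
print(total_s)                 # 80.003 -- the book says 40.003
```

So the 3 ms part seems to check out, but I end up with 80.003 seconds rather than 40.003. Is the book's 40.003 a typo, or is there a step I am missing?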