The term “commodity hardware” is often cited; however, “commodity” actually refers to the following:
(This doesn’t sound much like the romantic notion of dirt cheap infrastructure to me!)
SATA Data Transfer Rate
Version        Gbits/sec  MBytes/sec  Year
1.0 (I)        1.5        150         2001
2.0 (II, 3G)   3.0        300         2004
3.0 (III, 6G)  6.0        600         2009
3.2 (Express)  16.0       1969        2013
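The MBytes/sec column follows from the line rate once encoding overhead is removed: SATA 1.0–3.0 use 8b/10b encoding (10 line bits per 8 payload bits), while SATA 3.2 (Express, PCIe-based) uses the much leaner 128b/130b. A quick sketch of the arithmetic:

```python
def sata_throughput_mb_s(gbit_s, encoding_bits=(8, 10)):
    """Usable MBytes/sec from a line rate in Gbit/s, given encoding overhead."""
    payload, total = encoding_bits
    return gbit_s * 1000 * payload / total / 8  # bits -> bytes

# SATA 1.0-3.0 use 8b/10b encoding:
print(sata_throughput_mb_s(1.5))   # 150.0
print(sata_throughput_mb_s(3.0))   # 300.0
print(sata_throughput_mb_s(6.0))   # 600.0
# SATA 3.2 (Express) uses 128b/130b, which explains the odd-looking 1969:
print(round(sata_throughput_mb_s(16.0, (128, 130))))  # 1969
```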
More on RAID here
Using RAID on the DataNode disks that store HDFS content is a bad idea because HDFS already has replication and error-checking built in. RAID is strongly recommended on the NameNode, however, for additional durability (the NameNode uses its disks to durably store metadata about the FS).
Topology: All of the master and slave nodes must be able to open connections to each other. Client nodes need to be able to talk to all of the master and slave nodes.
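A simple way to verify that connectivity requirement is a TCP reachability check from each node. The sketch below is illustrative: the hostnames are hypothetical, and the ports shown (8020 for NameNode RPC, 50010 for the DataNode transfer port) are common Hadoop 1.x-era defaults that may differ on your cluster.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, timeout...
        return False

# Hypothetical hostnames/ports -- adjust to your own cluster:
for host, port in [("namenode", 8020), ("datanode1", 50010)]:
    print(host, port, can_connect(host, port))
```

Running this from every client, master, and slave node quickly flags firewall or DNS problems before Hadoop's own daemons start timing out.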
Big data brings with it two fundamental challenges: how to store and work with voluminous data sizes, and more important, how to understand data and turn it into a competitive advantage.
Hadoop provides a distributed filesystem, and it offers a way to parallelize and execute programs on a cluster of machines.
Figure 1.3 – Topography
The HDFS namenode keeps in memory the metadata about the filesystem such as which datanodes manage the blocks for each file.
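Because all of that metadata lives in the NameNode's heap, namespace size directly drives memory sizing. A widely cited rule of thumb (not an exact figure) is roughly 150 bytes of heap per namespace object (file, directory, or block), which makes for a quick back-of-the-envelope estimate:

```python
def namenode_heap_estimate_gb(num_files, blocks_per_file=1, bytes_per_object=150):
    """Rough NameNode heap requirement.

    Counts one object per file plus one per block; ~150 bytes/object is a
    common rule of thumb, not a guarantee.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1e9

# 100 million single-block files -> roughly 30 GB of heap
print(round(namenode_heap_estimate_gb(100_000_000), 1))  # 30.0
```

This is also why HDFS copes badly with huge numbers of tiny files: each one costs heap regardless of how little data it holds.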
HDFS clients talk to the namenode for metadata-related activities and DataNodes for reading and writing files.
DataNodes communicate with each other for pipelining file reads and writes.
Files are made up of blocks, and each block is replicated multiple times (default = 3), meaning there are several identical copies of every block in the file.
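The block/replica arithmetic is easy to sketch. Assuming a 128 MB block size (the Hadoop 2 default; older releases defaulted to 64 MB) and 3x replication, a file's block count and raw cluster footprint work out as follows; note the final partial block only consumes the bytes actually written:

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, raw MB consumed across the cluster).

    The last block may be partial; HDFS stores only the actual data,
    so raw usage is file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    raw_mb = file_size_mb * replication
    return blocks, raw_mb

print(hdfs_footprint(350))  # (3, 1050): 3 blocks, 1050 MB raw with 3 replicas
```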
Reading Hadoop in Practice
“It was a revelation to observe our MapReduce jobs crunching through our data in minutes. Of course, what we weren’t expecting was the amount of time that we would spend debugging and performance-tuning our MR jobs.
Not to mention the new roles we took on as production administrators; the biggest surprise in this role was the number of disk failures we encountered during those first few months supporting production.
The greatest challenge we faced when working with Hadoop, and specifically MR, was learning how to think about solving problems with it.
After one is used to thinking in MR, the next challenge is typically related to the logistics of working with Hadoop, such as how to move data in & out of HDFS.”
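What “thinking in MR” means is easiest to see in the canonical word-count job. The toy, single-process sketch below is not real Hadoop code; it just mimics the three phases (map, shuffle/group-by-key, reduce) that the framework would otherwise run across the cluster:

```python
from itertools import groupby
from operator import itemgetter

# Map phase: emit a (word, 1) pair for every word on every input line.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

# Shuffle: group the intermediate pairs by key, as the framework would.
def shuffle(pairs):
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [v for _, v in group]

# Reduce phase: sum the counts for each word.
def reducer(word, counts):
    return word, sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
intermediate = [kv for line in lines for kv in mapper(line)]
result = dict(reducer(k, vs) for k, vs in shuffle(intermediate))
print(result["the"])  # 2
```

The shift is from “loop over the data” to “describe a per-record map and a per-key reduce,” and letting the framework handle distribution.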
Tony Rogerson Deck
Great deck on ‘old’ and ‘new’ database worlds