[colug-432] Hadoop follow-up questions

Tom Hanlon tom at functionalmedia.com
Fri Mar 25 19:13:51 EDT 2011


Scott, 


On Mar 24, 2011, at 9:40 AM, Scott Merrill wrote:

> Last night's COLUG was an interesting introduction to Hadoop. Tom
> Hanlon did a great job cutting through the hype to present the
> strengths and weaknesses of Hadoop. Thanks, Tom!
> 
> I have a couple of questions this morning. Anyone should feel free to
> answer, not just Tom, if you have any insight.
> 
> Since Hadoop is built atop HDFS, is it possible to utilize the
> underlying HDFS independently of Hadoop's job scheduling functions
> while still using Hadoop's job stuff for other things? For example,
> would it be possible / advisable to stick a whole bunch of binary data
> into the HDFS and access that data with HDFS's GET and PUT primitives,
> and simultaneously put other data into HDFS to be processed using
> MapReduce functions?
> 

I do not have any benchmarks for you on write performance. Concurrent writes to a multi-node cluster would be able to take advantage of the I/O capacity of multiple machines.
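For what it is worth, the file-store side of this needs nothing from the job scheduler: `hadoop fs -put` and `hadoop fs -get` talk to HDFS directly. A rough sketch of pushing several files concurrently, in Python, might look like the following. The helper names, the file names, the target directory, and the worker count are all made up for illustration, and it assumes the `hadoop` command is on the PATH of the client machine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def put_command(local_path, hdfs_dir):
    """Build the argv for copying one local file into an HDFS directory."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def run_all(commands, workers=4):
    """Run each command in a worker thread; return exit codes in input order."""
    def run(cmd):
        # subprocess.call blocks until the command exits and returns its exit code.
        return subprocess.call(cmd)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


# Example use (hypothetical file names and target directory):
#   files = ["blob-001.bin", "blob-002.bin", "blob-003.bin"]
#   codes = run_all([put_command(f, "/data/blobs") for f in files])
#   A non-zero entry in codes means that particular put failed.
```

Threads are enough here because each worker just waits on an external process. Whether this actually goes faster than sequential puts would depend on the cluster and the network, so treat it as something to measure rather than a given.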

I failed to mention that we recommend Gigabit Ethernet. If you are running MapReduce jobs with a fair amount of intermediate data, the shuffle and sort can become the bottleneck, so consider a 10G top-of-rack switch.




> If the above is possible, can anyone speak to the general performance
> of the HDFS GET and PUT operations? I understand that MapReduce is a
> batch process, and spinning up the JVM will slow things down. But for
> just accessing a file stored in HDFS with a GET command, what kind of
> performance can one expect for that?

I guess I could fire up a machine in standalone mode and compare HDFS access to going straight to disk. If I get a chance to do that this weekend, I will get back to you.

In front of me right now I only have virtual machines, and we do not want to add another layer to that test.
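A comparison along those lines could be sketched roughly as below, in Python. This is only a minimal timing harness under stated assumptions: the function name and both paths are hypothetical, `hadoop` is assumed to be on the PATH, and `hadoop fs -cat` is used for the read so that the run is repeatable without leaving a local copy behind.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times, discarding its stdout, and return the
    fastest wall-clock time in seconds."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        # check_call raises CalledProcessError if the command exits non-zero.
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Example use (hypothetical paths, same file stored in both places):
#   hdfs_secs = time_command(["hadoop", "fs", "-cat", "/data/blobs/blob-001.bin"])
#   disk_secs = time_command(["cat", "/local/blobs/blob-001.bin"])
#   print(hdfs_secs, disk_secs)
```

Keep in mind that every `hadoop fs` invocation starts a JVM, so the HDFS figure includes that startup cost, and repeated reads of the local file will probably be served from the OS page cache. Both of those skew a small-file comparison.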


--
Tom 

> 
> Thanks again, Tom, for the presentation!
> 
> Cheers,
> Scott
> 
> _______________________________________________
> colug-432 mailing list
> colug-432 at colug.net
> http://lists.colug.net/mailman/listinfo/colug-432

Tom Hanlon
tom at functionalmedia.com
Cloudera Certified Hadoop Developer
Certified MySQL DBA



