[colug-432] Hadoop follow-up questions
Tom Hanlon
tom at functionalmedia.com
Fri Mar 25 19:13:51 EDT 2011
Scott,
On Mar 24, 2011, at 9:40 AM, Scott Merrill wrote:
> Last night's COLUG was an interesting introduction to Hadoop. Tom
> Hanlon did a great job cutting through the hype to present the
> strengths and weaknesses of Hadoop. Thanks, Tom!
>
> I have a couple of questions this morning. Anyone should feel free to
> answer, not just Tom, if you have any insight.
>
> Since Hadoop is built atop HDFS, is it possible to utilize the
> underlying HDFS independently of Hadoop's job scheduling functions
> while still using Hadoop's job stuff for other things? For example,
> would it be possible / advisable to stick a whole bunch of binary data
> into the HDFS and access that data with HDFS's GET and PUT primitives,
> and simultaneously put other data into HDFS to be processed using
> MapReduce functions?
>
I do not have any benchmarks for you on write performance. Concurrent writes to a multi-node cluster would be able to take advantage of the I/O capacity of multiple machines.
I failed to mention that we recommend Gigabit Ethernet. If you are running MapReduce jobs with a fair amount of intermediate data, the shuffle and sort can become the bottleneck, so in that case we suggest a 10G top-of-rack switch.
> If the above is possible, can anyone speak to the general performance
> of the HDFS GET and PUT operations? I understand that MapReduce is a
> batch process, and spinning up the JVM will slow things down. But for
> just accessing a file stored in HDFS with a GET command, what kind of
> performance can one expect for that?
I guess I could fire up a machine in standalone mode and compare it against going straight to disk. If I get a chance to do that this weekend I will get back to you.
Right now I only have virtual machines in front of me, and we do not want to add another layer to that test.
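For anyone who wants to try the comparison themselves, here is a rough sketch of how it could be timed. It assumes the `hadoop` command is on the PATH and that `hadoop fs -put` and `hadoop fs -get` work against your install. The helper names, file names, and HDFS path are placeholders made up for illustration, and timing the command line this way includes JVM startup, so treat the numbers as a rough guide only.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def build_commands(local_file, hdfs_path, local_copy):
    """Build the three commands to compare (all paths are hypothetical).

    Note: 'hadoop fs -put' and 'hadoop fs -get' refuse to overwrite an
    existing destination, so clean up between runs.
    """
    return {
        "hdfs put": ["hadoop", "fs", "-put", local_file, hdfs_path],
        "hdfs get": ["hadoop", "fs", "-get", hdfs_path, local_copy + ".from_hdfs"],
        "local copy": ["cp", local_file, local_copy],
    }


def run_comparison(local_file, hdfs_path, local_copy):
    """Time each command once, in insertion order, and print the results."""
    for name, cmd in build_commands(local_file, hdfs_path, local_copy).items():
        print("%-10s %.2f s" % (name, time_command(cmd)))
```

Calling `run_comparison("test.bin", "/tmp/test.bin", "test_copy.bin")` with a file of a few hundred megabytes should give a first impression. A single run is not a benchmark, so repeat it a few times before drawing conclusions.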
--
Tom
>
> Thanks again, Tom, for the presentation!
>
> Cheers,
> Scott
>
> _______________________________________________
> colug-432 mailing list
> colug-432 at colug.net
> http://lists.colug.net/mailman/listinfo/colug-432
Tom Hanlon
tom at functionalmedia.com
Cloudera Certified Hadoop Developer
Certified MySQL DBA