[colug-432] Hadoop interest ?

Tyler Wymer twymer at gmail.com
Wed Dec 15 11:52:04 EST 2010


Tom,

There are both Python and Ruby groups in Columbus (as well as groups
for a handful of other languages). I don't know what the Ruby group did
with Hadoop; I didn't see anything about it in their past meetings.

Columbus Ruby Brigade: http://columbusrb.com/
Central Ohio Python: http://www.cohpy.org/

A good place to look for similar groups in your language of choice:
http://www.meetup.com/techlifecolumbus/

Tyler

On Wed, Dec 15, 2010 at 11:40 AM, Tom Hanlon <tom at functionalmedia.com> wrote:
> Hi Folks..
>
> I will try to be more concise and not have so many short responses..
>
> Anyhow.. The Ruby group in Columbus ? I did not know there was one ? What sort of hadoopy stuff has been done there ?
>
> The Py-Ohio, I knew about that.. is there a group ?
>
>
> On Dec 15, 2010, at 10:38 AM, Scott McCarty wrote:
>
>> I feel exactly the same, I "get it" but we (my company) are having trouble finding a use for it without completely redesigning the way we build and do things. The "cloud" solutions we have been investigating are "libcloud" (python/java) and Overmind.
>>
>
> I do not really feel that Hadoop fits all that well into the typical "cloud" scenario. Meaning, hardware/software/platforms purchased on an as-needed basis.
>
> It is "cloud" in the sense of many machines operating in parallel to solve a computing problem. But when you have so much data that you are optimizing all of your operations to get more spindles reading at sequential-read speed, that is not something you do remotely for long. If disk reads are the bottleneck, then the network transfer rate to move that data to a remote site is just as big a bottleneck and/or cost hurdle.
>
> So.. maybe some private cloud thing, like Eucalyptus and a rack of machines, or whatever Rackspace is doing with OpenStack, but I still feel like Hadoop sits outside that model.
>
> Maybe it is a good topic for a meeting.
>
>
>> Libcloud wraps about 20 providers (Amazon, Rackspace, Linode) and Overmind controls this all through a web interface. The problem is you have to start thinking ephemerally with load balancers and get all of your data back (e.g. a Cassandra cluster or MySQL replication). We are a hosting provider, so it is hard to figure out how to deploy in this manner unless you are Amazon/Google.
>>
>> My two cents
>> Scott M
>
> Agreed, add my 2 cents to your 2 cents..
>
>>
>> On Wed, Dec 15, 2010 at 10:20 AM, Angelo McComis <angelo at mccomis.com> wrote:
>> Tom / all:
>>
>> I was speaking to a friend of mine who works at Google, and he was intimating how wonderfully awesome the MapReduce / Hadoop stuff is. His example was that a computational job that could not complete in his lifetime on a single server can be distributed out to multiple nodes, crunched, and completed in minutes or hours, depending on how much capacity is thrown at the work.
>>
>> To consider the direction of the IT industry as a whole, this is certainly an interesting discussion to have -
>>
>> - Companies are trying to do more cloud-like things, and a Hadoop elastic cloud makes a lot of sense there, but getting that much data from onsite to the cloud is a challenge. If the data set is that big, would you not spend more $ on bandwidth putting it into and getting it back from the cloud than the GDP of some smaller countries?
>>
>
> If your dataset is large.. then "once in the cloud, always in the cloud"
>
> Meaning that Big Data is not portable. If you start with something like Elastic MapReduce from Amazon, then you have serious data transfer costs etc. The exception is when you are aggregating from remote servers: then you can send that data wherever, since you will be sending it regardless.
>
>> - Doing an internal Hadoop architecture - certainly the way to go, but what is the value of redesigning your data and processes to take advantage of Hadoop when the investment has already been made in the larger, vertically scaling hardware?
>>
>
> Suppose you outgrow that big vertically scaling hardware?
> Big investment, and then bigger investment. In my years of messing around with computers I always wondered.. "why can't I glue these 4 386s together and make a Pentium"
>
> So you outgrow your mega-machine database... how big of a machine do you need for next year ? What do you do with last year's machine ?
>
> Hadoop scales incrementally, meaning that going from 10 to 15 nodes gives about a 50 percent increase in performance. So it is more flexible. It often gets placed between the data source and the data warehouse, used to feed and crunch data before inserting, and the data is available for ad-hoc queries while in Hadoop.
>
>> - Doing an internal Hadoop architecture that's based on an internal elastic cloud (e.g. use capacity when needed, give it back when finished) makes sense, but when it comes to making the investment of taking existing data and processes and converting them to the style needed to distribute the rows out to Hadoop, it becomes problematic.
>>
>
> Hadoop is amazingly flexible in terms of how it ingests data.
>
>> I guess in short, I get it, but I don't see where it makes sense yet, unless you are Google, Amazon, or one of the other "Top 10" biggies out there.
>>
>
> I did not really get it either, since I tend to think in terms of web applications. But a lot of big companies and government agencies have a lot of data that they find they are unable to process. Hadoop is tested and works at a scale that the current tools cannot match.
>
>> Maybe this is where more public forum and discussion comes into play.
>>
>> Interested in others' comments on what cloud is, and where it makes sense.
>>
>>
>> Angelo
>> _______________________________________________
>> colug-432 mailing list
>> colug-432 at colug.net
>> http://lists.colug.net/mailman/listinfo/colug-432
>>
>>
>
> Tom Hanlon
> tom at functionalmedia.com
> Cloudera Certified Hadoop Developer
> Certified MySQL DBA
>
>
>
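
Tom's point in the thread above, that network transfer becomes as big a bottleneck as disk reads, can be put into rough numbers. This is only a back-of-the-envelope sketch: the dataset size, per-disk read rate, spindle count, and uplink speed are all assumed values chosen for illustration, not figures from anyone in the thread.

```python
# Back-of-the-envelope sketch; every figure below is an assumed,
# illustrative value, not a measurement.

def hours_to_move(terabytes: float, megabytes_per_second: float) -> float:
    """Hours needed to stream `terabytes` of data at a sustained rate."""
    megabytes = terabytes * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / megabytes_per_second / 3600


DATASET_TB = 10    # assumed dataset size
DISK_MB_S = 100    # assumed sequential read rate of a single spindle
SPINDLES = 40      # assumed number of disks reading in parallel
WAN_MB_S = 12.5    # assumed 100 Mbit/s uplink, i.e. 12.5 MB/s

# Scanning the data in place, with all spindles reading at once.
local_hours = hours_to_move(DATASET_TB, DISK_MB_S * SPINDLES)

# Shipping the same data to a remote site over the uplink.
remote_hours = hours_to_move(DATASET_TB, WAN_MB_S)

print(f"local scan on {SPINDLES} spindles: {local_hours:.1f} h")
print(f"upload over the WAN link: {remote_hours:.1f} h")
```

With these assumed numbers the local scan takes well under an hour while the upload takes more than nine days, which is the sense in which a large dataset tends to stay wherever it first lands.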

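The "10 to 15 nodes gives about 50 percent" remark in the thread above corresponds to an ideal linear-scaling model. The sketch below assumes throughput is exactly proportional to node count, which real clusters only approximate; the function names and the 6-hour baseline job are hypothetical.

```python
# Ideal linear-scaling model (an assumption; real clusters lose some
# throughput to coordination, skew, and network overhead).

def relative_throughput(old_nodes: int, new_nodes: int) -> float:
    """Throughput of the new cluster as a multiple of the old one."""
    if old_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return new_nodes / old_nodes


def job_hours(baseline_hours: float, old_nodes: int, new_nodes: int) -> float:
    """Estimated runtime of the same job after resizing the cluster."""
    return baseline_hours / relative_throughput(old_nodes, new_nodes)


gain = relative_throughput(10, 15) - 1   # fractional increase in throughput
resized = job_hours(6.0, 10, 15)         # hypothetical 6-hour baseline job

print(f"throughput gain: {gain:.0%}")
print(f"6 h job on 15 nodes: {resized:.1f} h")
```

Under this model, adding five nodes to a ten-node cluster buys the 50 percent gain mentioned in the thread, and capacity can be added a few machines at a time instead of replacing one large machine.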

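For readers who have not seen the MapReduce model that Angelo's friend was describing in the thread above, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) using word counting. It is plain Python for illustration only; it does not use Hadoop's API, and the function names are made up for this example. On a real cluster the map and reduce calls would run on many nodes in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["hadoop scales out", "hadoop reads sequentially"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the work can be split across as many machines as are available, which is what lets a job that is hopeless on one server finish in hours on a cluster.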