[colug-432] Hadoop follow-up questions

Angelo McComis angelo at mccomis.com
Thu Mar 24 10:02:26 EDT 2011


Scott,

Remember: if you corrupt data and you're relying on block-level replication, you
now have three copies of corrupt data. The same goes for a delete operation.


Having an abstracted copy of the data that you can recover from (a CRC-verified
file-based copy, a committed transaction-log journal, etc.; take your pick) is
always smart.
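
For the file-based flavor of that, here's a minimal sketch in Python (the paths
are hypothetical, and it assumes you've already pulled the data out of HDFS into
a staging directory with something like distcp or hdfs dfs -get). It just records
a CRC32 per file in a manifest and lets you re-verify the copy later:

import json
import os
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file in streaming fashion."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")

def build_manifest(backup_dir, manifest_path):
    """Walk the staging directory and record a CRC32 per file."""
    manifest = {}
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, backup_dir)
            manifest[rel] = file_crc32(path)
    with open(manifest_path, "w") as out:
        json.dump(manifest, out, indent=2, sort_keys=True)

def verify_manifest(backup_dir, manifest_path):
    """Re-check every file against the recorded CRCs; return any mismatches."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return {
        rel: crc
        for rel, crc in manifest.items()
        if file_crc32(os.path.join(backup_dir, rel)) != crc
    }

if __name__ == "__main__":
    # Hypothetical staging export location and manifest path.
    build_manifest("/backups/hdfs-export", "/backups/hdfs-export.manifest.json")
    bad = verify_manifest("/backups/hdfs-export", "/backups/hdfs-export.manifest.json")
    print("corrupt or changed files:", bad or "none")

The point is just that the checksums live outside HDFS, so a block that gets
corrupted or deleted and then faithfully replicated still shows up as a mismatch
the next time you verify.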

/wasn't able to make the preso, but I've been down this path before...

-Angelo



On Thu, Mar 24, 2011 at 9:58 AM, Scott Merrill <skippy at skippy.net> wrote:

> On Thu, Mar 24, 2011 at 9:40 AM, Scott Merrill <skippy at skippy.net> wrote:
> > I have a couple of questions this morning. Anyone should feel free to
> > answer, not just Tom, if you have any insight.
>
> One more question: since HDFS redundantly stores data blocks in
> triplicate, does it make sense to still use traditional backup methods
> on data stored in HDFS? If one puts data into HDFS, can one reasonably
> rely on the built-in fault-tolerance of the triplicate copies of that
> data, or should one still be putting data to tape?

