[colug-432] Code check

Tom Hanlon tom at functionalmedia.com
Sat Aug 3 11:21:14 EDT 2013


Thanks Eric, and Jim,

I will implement one of the above solutions, and I expect success will soon
follow.

Overview:
I am just trying to take an mbox folder from a mail archive and write it
into an Avro file, a binary, Hadoop-friendly container format with a schema.

Like any programming project, I made great initial progress:
Wrote an Avro file in Python; that went pretty fast.
Parsed an mbox; that went pretty fast.
Extracted the To, From, Subject and Date; that went pretty fast.
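
For anyone following along, the header step above can be sketched roughly
as below. This is a minimal sketch, assuming Python 3 and only the
standard-library mailbox module; extract_headers, write_sample and the
sample message are hypothetical names and data made up for illustration,
not the code I actually ran.

```python
import mailbox
import os
import tempfile

# A made-up single-message mbox, used only for illustration.
SAMPLE_MBOX = (
    "From alice@example.com Sat Aug  3 10:00:00 2013\n"
    "From: alice@example.com\n"
    "To: list@example.com\n"
    "Subject: Hello\n"
    "Date: Sat, 03 Aug 2013 10:00:00 -0400\n"
    "\n"
    "Body text.\n"
)


def write_sample():
    """Write the sample mbox to a temporary file and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    with open(path, "w") as handle:
        handle.write(SAMPLE_MBOX)
    return path


def extract_headers(mbox_path):
    """Return one dict of To/From/Subject/Date per message in the mbox."""
    box = mailbox.mbox(mbox_path)
    try:
        return [
            {
                "to": msg["To"],
                "from": msg["From"],
                "subject": msg["Subject"],
                "date": msg["Date"],
            }
            for msg in box
        ]
    finally:
        box.close()
```

Each dict could then be handed to whatever writes the Avro records; that
part is left out here because it depends on the Avro library in use.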

Pulling out the message itself is where I hit some bumps.

I thought
import mailbox
would provide a simple tostring method, but as far as I can tell it does
not. Because of MIME, I can see why: simple things like headers are
straightforward, but MIME messages make the payload extraction a challenge.
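
One possible way around the MIME problem is to walk the message parts and
keep only the text/plain ones. The following is a rough sketch, assuming
Python 3 and the standard-library email package; plain_text_body and the
sample message are hypothetical, and falling back to UTF-8 when no charset
is declared is an assumption, not something the mail archive guarantees.

```python
import email

# A made-up multipart message, used only for illustration.
SAMPLE_MIME = (
    "From: alice@example.com\n"
    "To: list@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Plain body.\n"
    "--XYZ\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>HTML body.</p>\n"
    "--XYZ--\n"
)


def plain_text_body(msg):
    """Return the text/plain parts of a message joined into one str."""
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, HTML parts, and other non-text types.
        if part.get_content_type() != "text/plain":
            continue
        # Skip text files sent as attachments.
        if part.get_filename():
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        charset = part.get_content_charset() or "utf-8"  # assumed default
        try:
            chunks.append(raw.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

The same function works on the messages yielded by mailbox.mbox, since
those are subclasses of email.message.Message. For a non-multipart message
walk() yields just the message itself, so the simple case is covered too.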


Thanks everyone.

--
Tom






On Sat, Aug 3, 2013 at 10:27 AM, Eric Floehr <eric at intellovations.com> wrote:

>> import sys
>> sys.stdout.writeline(payload[:200] + '\n')
>
> Typo: 'writeline' should just be 'write'