[colug-432] Code check
Tom Hanlon
tom at functionalmedia.com
Sat Aug 3 11:21:14 EDT 2013
Thanks Eric and Jim,
I will implement one of the solutions above, and I expect success will soon
follow.
Overview:
I am just trying to take an mbox folder from a mail archive and write it into an
Avro file, a binary, Hadoop-friendly container format with a schema.
As with any programming project, I made great initial progress.
Writing an Avro file in Python went pretty fast.
Parsing an mbox went pretty fast.
Extracting the To, From, Subject and Date headers went pretty fast.
Pulling out the message body itself is where I hit some bumps.
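
For reference, the header step that went quickly could look roughly like the
sketch below. It assumes Python 3 and only the standard library; the function
name extract_headers and the dictionary keys are made up for illustration and
are not taken from my actual script.

```python
import mailbox


def extract_headers(mbox_path):
    """Return one dict per message with the To, From, Subject and Date headers.

    extract_headers is a hypothetical helper name used for illustration.
    """
    records = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            # Message.get() returns the default when a header is missing.
            records.append({
                "to": str(msg.get("To", "")),
                "from": str(msg.get("From", "")),
                "subject": str(msg.get("Subject", "")),
                "date": str(msg.get("Date", "")),
            })
    finally:
        box.close()
    return records
```

Each dict could then be handed to whatever Avro writer is in use, one record
per message.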
I thought
import mailbox
would provide a simple tostring method, but as far as I can tell it does
not. Because of MIME I can see why: simple things like headers are
straightforward, but MIME messages make payload extraction a challenge.
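
One possible way around the MIME problem, sketched with Python 3 and the
standard library only, is to walk the parts of each message and take the first
text/plain part that is not an attachment. The helper name message_text is
hypothetical, and falling back to UTF-8 when no charset is declared is an
assumption that may not suit every archive. (If the whole raw message is
enough, the message objects also have an as_string() method.)

```python
import email


def message_text(msg):
    """Return the first text/plain body of a message as str, or "" if none.

    message_text is a hypothetical helper name used for illustration.
    """
    # walk() yields the message itself when it is not multipart, and every
    # nested part when it is.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename():
            continue  # a named part is most likely an attachment
        raw = part.get_payload(decode=True)  # bytes, base64/QP already undone
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            return raw.decode("utf-8", errors="replace")
    return ""
```

Since the messages that mailbox.mbox yields are email.message.Message
subclasses, they can be passed to this function directly, for example
message_text(msg)[:200] to preview the start of each body.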
Thanks everyone.
--
Tom
On Sat, Aug 3, 2013 at 10:27 AM, Eric Floehr <eric at intellovations.com> wrote:
>
>> import sys
>> sys.stdout.writeline(payload[:200] + '\n')
>>
>
> Typo: 'writeline' should just be 'write'