[colug-432] Code check

Tom Hanlon tom at functionalmedia.com
Thu Aug 1 13:37:55 EDT 2013


Colug,


I am writing some Python to do the following.

Take this list, or any similar list of variable length

['DVD', 'dishwasher', 'software', 'tv']

And return all two-item combinations:

DVD *** dishwasher
DVD *** software
DVD *** tv
dishwasher *** software
dishwasher *** tv
software *** tv

My first draft looks like this.

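#!/usr/bin/python
# Print every unordered pair of items from a list.
list1 = ['software', 'DVD', 'dishwasher', 'tv']
list1.sort()
print(list1)
for i in range(len(list1)):
    # Start the inner index at i + 1 so each pair appears only once.
    for i2 in range(i + 1, len(list1)):
        print("%s *** %s" % (list1[i], list1[i2]))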


It works.

Can this be done in a cleaner way?
Am I picking up any bad habits?
Is there a built-in tool for this?
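
On the built-in question: the standard library's itertools.combinations
yields every r-length combination of an iterable, so it looks like a fit
here. A minimal sketch, assuming the sorted order from the draft should be
kept; the helper name all_pairs is made up for illustration.

```python
from itertools import combinations

def all_pairs(items):
    """Return every unordered two-item combination, in sorted order."""
    return list(combinations(sorted(items), 2))

# Same list as the first draft; prints one "a *** b" line per pair.
for first, second in all_pairs(['software', 'DVD', 'dishwasher', 'tv']):
    print("%s *** %s" % (first, second))
```

This drops the index bookkeeping and the trailing semicolons, which Python
does not need.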

Reasons I am doing this:
Hive, the Hadoop SQL-to-MapReduce translator, has ngrams, which allows
splitting strings into configurable-length substrings. That is sorta cool,
but I want all possible pairs, not just consecutive pairs.
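
To make the difference concrete, here is a small sketch in plain Python
(not Hive) that contrasts consecutive pairs with all pairs on the same word
list. The function names consecutive_pairs and every_pair are made up for
illustration.

```python
from itertools import combinations

def consecutive_pairs(words):
    """Adjacent pairs only, the kind a 2-gram over the list would give."""
    return list(zip(words, words[1:]))

def every_pair(words):
    """Every two-item combination, adjacent or not."""
    return list(combinations(words, 2))

words = ['DVD', 'dishwasher', 'software', 'tv']
print(consecutive_pairs(words))
print(every_pair(words))
```

For a list of n items the first gives n - 1 pairs and the second gives
n * (n - 1) / 2, so pairs such as DVD and tv only show up in the second.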

The list of items is an array in Hive. I could pass an array of arrays if
there were a complete "shuffle" I could do on the array (word list). Not
finding one, I figure that shipping the item list to Python would be the
most efficient.
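
A rough sketch of what the Python side of that hand-off might look like.
It assumes each row reaches the script as one line of comma-separated
items, which is a guess for illustration and not necessarily how Hive
serializes an array; the names pairs_from_line and emit_pairs are also
made up.

```python
import sys
from itertools import combinations

def pairs_from_line(line, sep=','):
    """Split one input line into items and return its sorted pairs."""
    items = sorted(item.strip() for item in line.split(sep) if item.strip())
    return list(combinations(items, 2))

def emit_pairs(lines, out=sys.stdout):
    """Write one tab-separated pair per output line for each input line."""
    for line in lines:
        for first, second in pairs_from_line(line):
            out.write("%s\t%s\n" % (first, second))

# Demo on an in-memory row; a streaming script would pass sys.stdin instead.
emit_pairs(["software,DVD,dishwasher,tv\n"])
```

Keeping the pair logic in its own function makes it easy to check on a
plain string before wiring it up to anything else.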

So that is the use case, but help with the Python in terms of a quick
sanity check is what I need. Unless you happen to be doing word
co-occurrence in Hive and have some insight there.

Thanks, Colug,

Tom