Tuesday, July 18, 2017

Fun with Python's collections library

I would like to provide a few quick examples of how to use the collections library in Python to simplify some common tasks.

Let's say that you have a collection of some kind and you need basic counts to see which values it contains. One way to do it is with a plain dictionary: if the value is already a key, increment its counter; if not, insert it and initialize the counter to one.

counts = {}
for item in my_collection:
    if item in counts:
        counts[item] += 1
    else:
        counts[item] = 1


This can be simplified with the collections library, like so:

from collections import defaultdict

counts = defaultdict(int)  # values will initialize to zero
for item in my_collection:
    counts[item] += 1
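
To see what defaultdict(int) is doing, here is a minimal, self-contained sketch. The sample list is made up for illustration; any iterable of hashable items should behave the same way.

```python
from collections import defaultdict

# Hypothetical sample data, standing in for my_collection.
my_collection = ['apple', 'pear', 'apple', 'fig', 'pear', 'apple']

counts = defaultdict(int)  # int() returns 0, so unseen keys start at zero
for item in my_collection:
    counts[item] += 1

print(dict(counts))
```

One thing to keep in mind: merely reading a missing key, as in counts['kiwi'], also inserts it with a value of 0. Use 'kiwi' in counts or counts.get('kiwi') if you only want to look.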

Now that you have the counts, how about finding the top 5?

One way is a simple list comprehension followed by a sort:

count_value_pair = [(count, value) for value, count in counts.items()]
count_value_pair.sort()  # ascending, so the five largest counts come last
print(count_value_pair[-5:])

or by using the Counter class from collections:

from collections import Counter

counts = Counter(my_collection)
counts.most_common(5)
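
As a rough illustration of what most_common returns, here is a small runnable sketch. The sample list is invented for the example, and it asks for the top 2 rather than the top 5 only because the list is short.

```python
from collections import Counter

# Hypothetical sample data, standing in for my_collection.
my_collection = ['apple', 'pear', 'apple', 'fig', 'pear', 'apple']

counts = Counter(my_collection)  # tallies every item in one pass
top_two = counts.most_common(2)  # list of (value, count), highest count first

print(top_two)  # [('apple', 3), ('pear', 2)]
```

Note that most_common returns (value, count) pairs ordered from most to least frequent, whereas the sorted-list version holds (count, value) pairs in ascending order.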

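Counter can also replace the counting loop in the first example and gives the same numbers. Since it is a dict subclass, the usual dictionary methods let you inspect the result (on Python 2.7, viewkeys(), viewvalues() and viewitems() return the equivalent views):
counts.keys() # all keys
counts.values() # all counts
counts.items() # all keys and their counts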
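
A quick way to convince yourself that Counter matches the hand-written loop is to build both and compare them. This is only a sketch on made-up sample data; loop_counts is a name introduced here for the comparison.

```python
from collections import Counter

# Hypothetical sample data, standing in for my_collection.
my_collection = ['apple', 'pear', 'apple', 'fig', 'pear', 'apple']

# The plain-dictionary loop from the first example.
loop_counts = {}
for item in my_collection:
    if item in loop_counts:
        loop_counts[item] += 1
    else:
        loop_counts[item] = 1

# The same tally built by Counter.
counts = Counter(my_collection)

print(dict(counts) == loop_counts)  # True: same keys and counts
print(counts['kiwi'])               # 0: a missing key reads as zero
print('kiwi' in counts)             # False: the lookup did not insert the key
```

Unlike a defaultdict, a Counter does not add a key when you look up something it has never seen.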




Of course, you can unlock even more power by using pandas' DataFrames:

from pandas import DataFrame

df = DataFrame({'value': my_collection})  # one column named 'value'

counts = df['value'].value_counts()  # occurrences of each value in that column

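To have missing values and blanks appear in the counts under readable labels, fill them in before calling value_counts, which skips missing values by default:

df = df.fillna('Missing')  # replace NaN/None with a label
df[df == ''] = 'Unknown'  # replace empty strings with a label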
