Let me explain. Say you are working on a logistic regression and your output is a set of categories coded as 1, 2, 3, 4... If you run a regression directly on those codes to predict the value, it can come up with 1.4, which is not a valid category.
In this case, you need to break the output column down into several columns with yes/no values, one per category. That way, logistic regression works by predicting either yes or no for each one.
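To picture what that breakdown looks like, here is a minimal sketch in plain Python. The category codes and the helper name `to_yes_no_columns` are made up for illustration and are not part of any library.

```python
def to_yes_no_columns(values):
    """Return one 0/1 column per distinct category (hypothetical helper)."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Made-up output column holding category codes.
output = [1, 3, 2, 1, 4]

columns = to_yes_no_columns(output)
for category, flags in columns.items():
    # Each row has exactly one 1 across all the new columns.
    print(category, flags)
```

Each new column answers a single yes/no question ("is this row category 1?"), which is the kind of target a yes/no model can predict.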
So how do you do it?
Pandas!
First import pandas
>>> import pandas as pd
Then let's say you have the following data.
>>> my_data = {'name': ['John Doe', 'Jane Doe', 'Mike Roth', 'Mark Wagner', 'David Scott'],
... 'title': ['developer', 'manager', 'developer', 'manager', 'developer']}
View it again...
>>> my_data
{'name': ['John Doe', 'Jane Doe', 'Mike Roth', 'Mark Wagner', 'David Scott'], 'title': ['developer', 'manager', 'developer', 'manager', 'developer']}
Convert it to a DataFrame object
>>> df = pd.DataFrame(my_data, columns=['name','title'])
>>> df.head()
          name      title
0     John Doe  developer
1     Jane Doe    manager
2    Mike Roth  developer
3  Mark Wagner    manager
4  David Scott  developer
Use the pandas get_dummies function to break the title column out into one yes/no column per category
>>> df_title = pd.get_dummies(df['title'])
Join the new columns onto the original DataFrame
>>> df = pd.concat([df,df_title],axis=1)
Tada... that's our result
>>> df.head()
          name      title  developer  manager
0     John Doe  developer        1.0      0.0
1     Jane Doe    manager        0.0      1.0
2    Mike Roth  developer        1.0      0.0
3  Mark Wagner    manager        0.0      1.0
4  David Scott  developer        1.0      0.0
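If you want to run the whole thing as a script rather than typing it into the interpreter, here is a sketch of the same steps in one piece. One assumption on my part: I pass `dtype=int` to `get_dummies`, because depending on your pandas version the new columns may come back as 1.0/0.0, 1/0, or True/False, and asking for integers keeps the result the same everywhere.

```python
import pandas as pd

my_data = {
    'name': ['John Doe', 'Jane Doe', 'Mike Roth', 'Mark Wagner', 'David Scott'],
    'title': ['developer', 'manager', 'developer', 'manager', 'developer'],
}
df = pd.DataFrame(my_data, columns=['name', 'title'])

# One column per distinct title; dtype=int (my addition) asks for 1/0 values.
df_title = pd.get_dummies(df['title'], dtype=int)

# axis=1 places the new columns next to the existing ones, matched on the index.
df = pd.concat([df, df_title], axis=1)

print(df)
```

A single column such as `df['developer']` is now a yes/no value that a logistic regression can predict.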