Sunday, July 5, 2015

Machine Learning... Taking first steps

This is my first post (of many, I hope!!!) where I discuss my recent projects dealing with machine learning (ML) and what I learned from them. I hope it helps people who are trying to understand ML topics, and that it serves as notes for my own future reference.

First of all, why do we need ML in the first place? To answer that, it helps to know that ML can be broken into two broad categories: "Supervised Learning" (SL) and "Unsupervised Learning" (UL).

In my opinion, SL is more widely used and has more practical applications than UL. SL can be further broken down into "Classifiers" and "Regression". As the name may suggest, classifiers try to predict the type or class of an outcome based on historical data. For example, given a set of emails, some labeled as spam and some as not spam, we can build a classifier that automatically sorts incoming mail into spam and not spam for us. You can see this in your Outlook or Gmail! (Ummm, I feel that I need to cover spam filtering in a bit more detail in the future. Stay tuned.)
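To make the idea more concrete, here is a minimal Python sketch of a spam classifier. I use multinomial naive Bayes only because it is a classic, easy-to-follow choice for spam filtering; it is not necessarily what Outlook or Gmail use. The training emails, the `train`/`classify` helper names and the whitespace tokenizer are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (deliberately naive)."""
    return text.lower().split()


def train(emails):
    """Count words per class. `emails` is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Return the label with the highest log-probability (naive Bayes)."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Prior: how common this class is in the training set.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words don't zero out the class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny, made-up training set for illustration only ("ham" = not spam).
training = [
    ("win money now", "spam"),
    ("cheap pills win big", "spam"),
    ("claim your free prize money", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("project status meeting", "ham"),
]

model = train(training)
print(classify("free money prize", *model))       # spam
print(classify("team meeting tomorrow", *model))  # ham
```

Every word seen in spam emails nudges the score toward "spam", and the +1 smoothing keeps a single never-seen word from ruling a class out entirely.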

Classifiers predict a more or less yes/no answer. Is it spam or not? Is it a car or not? Is it a fraudulent transaction or not? Regression, on the other hand, tries to predict a numeric value based on known parameters. One example is house price prediction: given parameters of a house such as square footage, number of bedrooms and bathrooms, and its location, plus the time of year, one can try to predict its price at a certain time in the future.
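Here is a minimal sketch of regression in Python, assuming just one parameter (square footage) and a handful of made-up sales. It fits a straight line with ordinary least squares, one of the simplest regression methods; a real house price model would use many more parameters (bedrooms, location, time of year) and far more data. `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical sales: square footage and sale price.
sq_ft = [1000, 1500, 2000, 2500]
prices = [200_000, 270_000, 330_000, 410_000]

slope, intercept = fit_line(sq_ft, prices)
print(slope, intercept)          # 138.0 61000.0
print(slope * 1800 + intercept)  # 309400.0
```

In this toy data the slope reads as "each extra square foot adds about $138", and plugging in a new house size gives its predicted price.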

With UL, the outputs are unknown, so you cannot really train your algorithm to predict them: there is no right answer to learn from. One of the classic uses of UL is "Clustering": take a few specific properties of each subject, plot them on a "map", and see whether clusters form, which helps you determine relationships within your input data.

For example, if you have hundreds of articles and want to know which ones are related to each other, you can read all of them and make the decision yourself, OR you can use ML! In a nutshell, you would feed in your articles, which get broken down into individual words and indexed (see Lucene for more information). Then cosine similarity (more on that in one of my future posts!!!) is used to determine how "close" two documents are to each other. Based on that, clusters are formed where related articles are grouped together and unrelated ones are kept apart... This way you could easily tell whether two articles are related or totally different.
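The article example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a plain word-count dictionary stands in for a real Lucene index, a tiny hand-picked stop-word list filters out filler words like "the", and the grouping step is a simple greedy threshold rule (the 0.3 threshold is arbitrary) rather than a full clustering algorithm such as k-means. The article texts and helper names are made up.

```python
import math
from collections import Counter

# Hypothetical, tiny stop-word list; real indexers use much larger ones.
STOP_WORDS = {"the", "a", "in", "is", "has"}


def vectorize(text):
    """Turn a document into a bag-of-words vector (word -> count)."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.3):
    """Greedy grouping: put each document into the first cluster whose first
    member is at least `threshold` similar, otherwise start a new cluster."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


articles = [
    "the team won the football match",
    "football fans celebrate the match win",
    "new phone has a faster chip",
    "the chip in the new phone is fast",
]
print(group_documents(articles))  # [[0, 1], [2, 3]]
```

Without the stop-word list, the shared word "the" alone would make the first football article and the second phone article look related. There is also no stemming, so "won" and "win" count as different words.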

Other examples of ML include:
- Targeted retention - It would cost companies a lot of money to offer all of their customers an incentive to stay. Instead, companies try to determine which customers are most likely to leave and offer them a great deal in order to retain them.
- Recommendation Engines - Ever bought anything on Amazon? Then I am sure you have seen a list of suggested products :)
- Sentiment Analysis - Is it good or bad? A quick example: using Natural Language Processing (NLP), we can extract and process text to determine whether it contains a lot of "positive" words such as 'good', 'awesome', etc.
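The "count the positive words" idea behind sentiment analysis can be sketched like this. The word lists and the `sentiment` helper are tiny, hypothetical examples; real NLP tools use large lexicons or trained models.

```python
import string

# Tiny, hand-picked word lists for illustration only.
POSITIVE = {"good", "great", "awesome", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "poor"}


def sentiment(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon words."""
    # Lowercase, split on whitespace and strip punctuation such as "good!".
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The food was awesome and the service was good!"))  # positive
print(sentiment("Terrible app, I hate it."))                        # negative
print(sentiment("It arrived on Tuesday."))                          # neutral
```

This naive approach has obvious holes ("not good" still counts as positive), which is why real sentiment analysis goes well beyond word counting.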

And that's just scratching the surface of the endless possibilities of ML!
