This is going to be my first post (of many, I hope!) where I discuss my recent projects dealing with machine learning (ML) and what I learned from them. I hope this will benefit people who are trying to understand ML topics, and it will also serve as notes for my own future reference.
First of all, what does ML look like in practice? Well, ML can be broken into two categories: "Supervised Learning" (SL) and "Unsupervised Learning" (UL).
SL, in my opinion, is more widely used and has more practical applications than UL. Let's consider a few examples. SL can be further broken down into "Classification" and "Regression". As the name may suggest, classifiers try to predict the type or class of an outcome based on historical data. For example, given a number of emails, some of which were classified as spam and some of which were flagged as not spam, we can build a classifier that will automatically filter incoming mail as spam or not spam for us. You can see this in your Outlook or Gmail! (Hmm, I feel that I need to cover spam filtering in a bit more detail in the future. Stay tuned.)
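To make this concrete, here is a toy Naive Bayes spam classifier in Python. It's just a sketch with made-up training messages (definitely not how Gmail or Outlook actually do it!), but it shows the core idea: count how often words appear in spam vs. non-spam mail, then score new messages.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # log prior + sum of log likelihoods, with Laplace (add-one) smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data, just for illustration
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting tomorrow at noon", "ham"),
    ("lunch with the team", "ham"),
]
wc, lc = train(training)
print(classify("claim your free money", wc, lc))  # prints "spam"
```

With real mail you would train on thousands of labeled messages, but even this tiny version picks "spam" for a message full of spammy words.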
Classifiers try to predict a more or less yes/no answer. Is it spam or not? Is it a car or not? Is it a fraudulent transaction or not? Regression, on the other hand, tries to predict a specific value based on known parameters. One example is house price prediction: given parameters of the house such as square footage, number of bedrooms and bathrooms, time of year, and its location, one can try to predict its price at a certain time in the future.
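Here is a tiny sketch of that idea using ordinary least squares with a single feature. The square footages and prices below are numbers I made up, and a real model would use many more features (bedrooms, location, season, etc.), but the fitting step is the same in spirit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: square footage vs. sale price
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 300_000, 400_000, 500_000]

a, b = fit_line(sqft, price)
print(a * 1800 + b)  # predicted price for an 1800 sq ft house: 360000.0
```

Training gives you the line's parameters; prediction is then just plugging in the parameters of a new house.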
UL is different: since the outputs are unknown, you cannot really train your algorithms to predict them. One of the classical uses of UL is "Clustering". For example, you can take a few specific properties of a subject and plot them on a "map" to see if clusters form, which would help you determine relationships within your input data. Say you have hundreds of articles and want to know which ones are related to each other. You could read all of them and make your decision, OR you could use ML! In a nutshell, you would feed in your articles, where they would be broken down into individual words and indexed (see Lucene for more information). Then cosine similarity (more on that in one of my future posts!) would be used to determine how closely one document relates to another. Based on that, clusters would form where related articles are grouped together and unrelated ones are separated. This way you could easily tell whether two articles are related or totally different.
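To give you a taste, here is a minimal sketch of cosine similarity over bag-of-words vectors. The three example "articles" are made up, and a real pipeline would use a proper index (like Lucene) plus TF-IDF weighting, but the similarity math is the same.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical "articles"
docs = [
    "machine learning and data science",
    "deep learning for data analysis",
    "cooking pasta with tomato sauce",
]
vectors = [Counter(doc.split()) for doc in docs]

print(cosine_similarity(vectors[0], vectors[1]))  # higher: the ML articles share words
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no words in common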
Other examples of ML include:
- Targeted Retention - It would cost companies a lot of money to offer all of their customers an incentive to stay. Instead, companies try to determine which of their customers are most likely to leave and offer them a great deal in order to retain them.
- Recommendation Engines - Ever bought anything on Amazon? Then I am sure you have seen a list of suggested recommendations :)
- Sentiment Analysis - Is it good or bad? As a quick example, using Natural Language Processing (NLP) we can extract and process text to determine whether it contains a lot of "positive" words such as 'good', 'awesome', etc.

And that's just scratching the surface of the endless possibilities of ML!
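As a parting example, here is a toy sketch of that word-counting approach to sentiment. The POSITIVE and NEGATIVE word lists are just examples I made up; real NLP systems handle negation, context, and much larger lexicons, but this shows the basic counting idea.

```python
# Tiny hypothetical sentiment lexicons, just for illustration
POSITIVE = {"good", "awesome", "great", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def sentiment(text):
    """Label text by counting positive vs. negative words (a toy lexicon approach)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("this product is awesome and i love it"))  # prints "positive"
print(sentiment("i hate this terrible product"))           # prints "negative"
```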