This is going to be my first post (of many, I hope!!!) where I discuss
my recent projects dealing with machine learning (ML) and what I
learned from them. I hope to benefit people who are trying to
understand ML topics and to make notes for my own future reference.
First of all, why do we need ML in the first place? The easiest way to
answer that is to look at what it can do. ML can be broken into two
categories: "Supervised Learning" (SL) and "Unsupervised Learning" (UL).
SL, in my opinion, is more widely used and has more practical
applications than UL. SL can be further broken down into "Classifiers"
and "Regression". As the name suggests, classifiers try to predict the
type or class of an outcome based on historical data. For example, given
a number of emails, some of which were labeled as spam and some as not
spam, we can build a classifier that will try to automatically sort
incoming mail into spam or not spam for us. You can see this in your
Outlook or Gmail! (Hmm, I feel that I need to cover spam filtering in a
bit more detail in the future. Stay tuned.)
Classifiers try to predict a more or less yes/no answer. Is it spam or
not? Is it a car or not? Is it a fraudulent transaction or not?
Regression, on the other hand, tries to predict a numeric value based on
known parameters. One example is house price prediction: given
parameters of the house such as square footage, number of bedrooms and
bathrooms, its location, and the time of year, one can try to predict
its price at a certain time in the future.
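To make the regression idea a bit more concrete, here is a minimal sketch of a least-squares line fit in Java. It is a big simplification: it uses only one parameter (square footage) instead of the several mentioned above, the numbers are made up, and the class name HousePriceRegression is just something I picked for illustration.

```java
/** One-feature least-squares regression; a toy sketch, not a real pricing model. */
final class HousePriceRegression {
    final double slope;
    final double intercept;

    /** Fits price = intercept + slope * x to the given (x, y) pairs. */
    HousePriceRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // slope = covariance(x, y) / variance(x)
        double cov = 0, var = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            var += (x[i] - meanX) * (x[i] - meanX);
        }
        if (var == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        slope = cov / var;
        intercept = meanY - slope * meanX;
    }

    /** Predicts a price for a house of the given size. */
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Hypothetical training data: size in sq ft and sale price.
        double[] sqFt = {1000, 1500, 2000, 2500};
        double[] price = {200_000, 300_000, 400_000, 500_000};
        HousePriceRegression model = new HousePriceRegression(sqFt, price);
        System.out.println("Predicted price for 1800 sq ft: " + model.predict(1800));
    }
}
```

A classifier would look similar from the outside (train on known examples, then predict), except that predict would return a label such as spam / not spam instead of a number.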
UL: since the outputs are unknown, you cannot really train your
algorithm to predict them, because you don't know what the right answer
is. One of the classic uses of UL is "Clustering". For example, you take
a few specific properties of each subject and plot them on a "map" to
see whether clusters form, which helps you determine relationships
within your input data. For example, if you have hundreds of articles
and want to know which ones are related to each other, you can read all
of them and make your decision, OR you can use ML! In a nutshell, you
would feed in your articles, where they would be broken down into
individual words and indexed (see Lucene for more information). Then
cosine similarity (more on that in one of my future posts!!!) would be
used to determine how "close" the documents are to each other. Based on
that, clusters would be formed where related articles are grouped
together and unrelated ones are kept apart. This way you could easily
tell if two articles are related or totally different.
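Here is a rough sketch of the "how close are two articles" step in plain Java. It does not use Lucene; it just counts words in each text and computes cosine similarity between the two count vectors. The class and method names (CosineSimilarityDemo, termCounts, cosine) and the sample sentences are made up for illustration, and a real pipeline would likely add things like stop-word removal and term weighting.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Cosine similarity over raw word counts; a toy stand-in for a real index. */
final class CosineSimilarityDemo {

    /** Breaks text into lowercase words and counts how often each occurs. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns a value in [0, 1]; 0 means no shared words, 1 means identical proportions. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    /** Euclidean length of a count vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = termCounts("Machine learning with Java");
        Map<String, Integer> b = termCounts("Java machine learning tips");
        Map<String, Integer> c = termCounts("Cooking pasta at home");
        System.out.println("a vs b: " + cosine(a, b));
        System.out.println("a vs c: " + cosine(a, c));
    }
}
```

A clustering step would then group documents whose pairwise similarity is high; that part is left out of this sketch.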
Other examples of ML include:
- Targeted retention - It would cost companies a lot of money to offer all of their customers an incentive to stay. Instead, companies try to determine which of their customers are most likely to leave and offer them a great deal in order to retain them.
- Recommendation Engines - Ever bought anything on Amazon? Then I am sure you saw a list of suggested recommendations :)
- Sentiment Analysis - Is it good or bad? A quick example: using Natural Language Processing (NLP), we can extract and process text to determine whether it contains a lot of "positive" words such as 'good', 'awesome', etc.
And that's us just scratching the surface of the endless possibilities of ML!
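The sentiment analysis example from the list can be sketched in a few lines of Java. This is the most naive version possible: count "positive" words, subtract "negative" ones. Only 'good' and 'awesome' come from the text above; the rest of the word lists and the SentimentSketch name are my own assumptions, and real NLP tooling does far more than this (negation, context, and so on).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Word-counting sentiment score; a toy sketch with hand-picked word lists. */
final class SentimentSketch {
    // Assumed word lists, for illustration only.
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "awesome", "great"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "awful", "terrible"));

    /** Returns (#positive words - #negative words); above zero leans positive. */
    static int score(String text) {
        int score = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("This phone is good, really awesome!"));
        System.out.println(score("Bad battery and a terrible screen."));
    }
}
```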
Sunday, July 5, 2015
Saturday, July 4, 2015
ElasticSearch Java API to find aliases for a given index
Recently I posted on Stack Overflow about how to find aliases for a given index using the Java API, so I thought I would link to it from here as well.
http://stackoverflow.com/questions/31170105/elasticseach-java-api-to-find-aliases-given-index
How to find aliases for a given index in ElasticSearch using Java?
Using the REST API it is pretty easy:
https://www.elastic.co/guide/en/elasticsearch/reference/1.x/indices-aliases.html#alias-retrieving
While working with ElasticSearch, I ran into an issue where I needed to get the list of aliases for a given index.
While getting a list of aliases is pretty straightforward:
client.admin().cluster()
    .prepareState().execute()
    .actionGet().getState()
    .getMetaData().aliases();
I struggled to find an easy way to get the aliases for a given index without having to iterate through everything first. My first implementation looked something like this:
// Fetch the full cluster state; the outer map is keyed by alias name.
ImmutableOpenMap<String, ImmutableOpenMap<String, AliasMetaData>> aliases = client.admin().cluster()
    .prepareState().execute()
    .actionGet().getState()
    .getMetaData().aliases();
for (ObjectCursor<String> key : aliases.keys()) {
    // Note: this requests the cluster state again for every key.
    ImmutableOpenMap<String, AliasMetaData> indexToAliasesMap = client.admin().cluster()
        .state(Requests.clusterStateRequest())
        .actionGet().getState()
        .getMetaData().aliases().get(key.value);
    if (indexToAliasesMap != null && !indexToAliasesMap.isEmpty()) {
        // Only the first entry of the inner map is read here.
        String index = indexToAliasesMap.keys().iterator().next().value;
        String alias = indexToAliasesMap.values().iterator().next().value.alias();
    }
}
I did not like it... and after poking around, I got an idea of how to
do it more efficiently by looking at RestGetIndicesAliasesAction
(package org.elasticsearch.rest.action.admin.indices.alias.get).
This is what I ended up with:
// Skip the routing table and node info, and restrict the state to one index.
ClusterStateRequest clusterStateRequest = Requests.clusterStateRequest()
    .routingTable(false)
    .nodes(false)
    .indices("your_index_name_goes_here");
// The alias keys of the returned metadata are collected here.
ObjectLookupContainer<String> setAliases = client
    .admin().cluster().state(clusterStateRequest)
    .actionGet().getState().getMetaData()
    .aliases().keys();
You will be able to find the aliases for the index that you specified in setAliases.
Hope it helps someone!