Saturday, August 16, 2014

How to Create a SolrCloud Instance

In this post, I will show how to quickly get a SolrCloud instance up and running on your local box.

First of all, make sure that you have Java 6 or newer installed on your system (Solr 4.8 and later require Java 7).
$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
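
If you would rather script this check than read the output by eye, here is a minimal sketch. The helper name java_major and the min_major threshold are my own, not part of Solr; it assumes version strings shaped like "1.7.0_25" (old style) or "11.0.2" (new style).

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from a version string
# such as "1.7.0_25" (old scheme, major is 7) or "11.0.2" (new scheme, major is 11).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # strip the leading "1.", keep up to the next dot
    *)   echo "${v%%[.+-]*}" ;;            # keep everything before the first ".", "+" or "-"
  esac
}

min_major=6   # the minimum stated in this post; raise to 7 for Solr 4.8 or later

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge "$min_major" ] 2>/dev/null; then
    echo "Java $ver looks recent enough"
  else
    echo "Java $ver may be too old"
  fi
else
  echo "java not found on PATH"
fi
```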

Download the most recent version of Solr from http://lucene.apache.org/solr/. If you are on Unix, Linux or Mac OS, grab the tgz file. Put it somewhere you have permission to untar it and run it. I downloaded and decompressed it under my home directory, in /opt/solr, by running:

$ tar -xzf solr-<version>.tgz

In this example, I am going to use the default Jetty server, both because it has been proven to scale even on large production systems and because I don't want to complicate the example further. If you google "solr jetty vs tomcat", you will find plenty of opinions on which one to use. In my opinion, use whichever makes more sense in your case.

To run a very basic Solr instance, all you have to do is navigate to the example folder and run start.jar like so:

$ cd opt/solr/example/
$ java -jar start.jar

However, we are interested in a SolrCloud instance in this case, so let's create two shards with a replication factor of 2. For a detailed discussion of shards, replication and SolrCloud in general, please visit the SolrCloud wiki.

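Copy the example directory and name it node1. Then copy the collection1 core directory under a more useful name, like wikipedia, and remove any index data that came along with it. Finally, delete the existing core.properties files and write a new one for wikipedia, so that Solr's core autodiscovery finds only that core.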

$ cp -r example/ node1/
$ cd node1
$ cp -r solr/collection1/ solr/wikipedia
$ rm -rf solr/wikipedia/data/
$ find . -name "core.properties" -type f -exec rm {} \;
$ echo "name=wikipedia" > solr/wikipedia/core.properties
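
Before starting the node, you may want to double-check what autodiscovery will find. Here is a small sketch; the list_cores function is my own helper, not a Solr tool. It assumes each core.properties holds a name=... line, as written above.

```shell
#!/bin/sh
# Hypothetical helper: print "<path>: <core name>" for every core.properties
# file found under the given Solr home directory.
list_cores() {
  find "$1" -name core.properties -type f | sort | while read -r f; do
    name=$(sed -n 's/^name=//p' "$f")
    echo "$f: ${name:-<unnamed>}"
  done
}

# When run from inside node1, this inspects its solr/ directory.
if [ -d solr ]; then
  list_cores solr
fi
```

After the steps above, running it from node1 should list a single core named wikipedia.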

Now you are ready to start the first node in your SolrCloud:
$ java -Dcollection.configName=wikipedia -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/wikipedia/conf/ -jar start.jar 
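
A note on the -DzkRun flag: it starts an embedded ZooKeeper alongside Solr, and by default that ZooKeeper listens on Solr's port plus 1000. That is where the localhost:9983 address used in the rest of this post comes from. A tiny sketch of the arithmetic, with a function name of my own, assuming Jetty's default port of 8983:

```shell
#!/bin/sh
# Hypothetical helper: port of the embedded ZooKeeper started by -DzkRun,
# which defaults to the Solr (Jetty) port plus 1000.
embedded_zk_port() {
  echo $(( $1 + 1000 ))
}

solr_port=8983   # Jetty's default port in the example directory
echo "zkHost for the other nodes: localhost:$(embedded_zk_port "$solr_port")"
```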



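Now let's create 3 more nodes and start them to complete our SolrCloud instance. Each java command runs in the foreground, so start every node in its own terminal, beginning from the directory that contains node1: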

$ cp -r node1/ node2/
$ cd node2/
$ rm -rf solr/wikipedia/conf/
$ java -DzkHost=localhost:9983 -Djetty.port=8984 -jar start.jar 

$ cp -r node1/ node3/
$ cd node3/
$ rm -rf solr/wikipedia/conf/
$ java -DzkHost=localhost:9983 -Djetty.port=8985 -jar start.jar 

$ cp -r node1/ node4/
$ cd node4/
$ rm -rf solr/wikipedia/conf/
$ java -DzkHost=localhost:9983 -Djetty.port=8986 -jar start.jar 
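
The three blocks above differ only in the node number and the Jetty port. If you want more nodes, the pattern can be sketched as a dry run that only prints the commands instead of executing them. The function names are my own, and it assumes node1 sits on port 8983 with each later node taking the next port up.

```shell
#!/bin/sh
zk_host="localhost:9983"   # embedded ZooKeeper started by node1
first_port=8983            # Jetty port used by node1

# Hypothetical helper: Jetty port for node N (node1 is on first_port).
node_port() {
  echo $(( $first_port + $1 - 1 ))
}

# Hypothetical helper: print, but do not run, the commands for node N.
node_commands() {
  n=$1
  echo "cp -r node1/ node$n/"
  echo "cd node$n/"
  echo "rm -rf solr/wikipedia/conf/"
  echo "java -DzkHost=$zk_host -Djetty.port=$(node_port "$n") -jar start.jar"
}

for n in 2 3 4; do
  node_commands "$n"
  echo
done
```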


Since all of our configuration files are now managed by ZooKeeper, changing them means downloading them, modifying them and uploading them back. Yes, I know it is a pain, but this way you don't have to edit them by hand on each server: do it once, upload to ZooKeeper, and it takes care of the rest for you!

Navigate to the /opt/solr/node1/scripts/cloud-scripts directory and run the following command:

$ ./zkcli.sh -zkhost localhost:9983 -cmd downconfig -confdir /<directory_of_your_choice>/solr_conf -confname wikipedia

Navigate to the directory and you should see all of the configuration files.
$ ls
_schema_analysis_stopwords_english.json   elevate.xml                   solrconfig.xml
_schema_analysis_synonyms_english.json    lang                          spellings.txt
admin-extra.html                          mapping-FoldToASCII.txt       stopwords.txt
admin-extra.menu-bottom.html              mapping-ISOLatin1Accent.txt   synonyms.txt
admin-extra.menu-top.html                 protwords.txt                 update-script.js
clustering                                schema.xml                    velocity
currency.xml                              scripts.conf                  xslt

Make your changes (usually to schema.xml and solrconfig.xml) and upload them back by running a similar command:

$ ./zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir /<directory_of_your_choice>/solr_conf -confname wikipedia
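
One thing to keep in mind: running cores generally do not pick up an uploaded configuration until the collection is reloaded. A minimal sketch using the Collections API RELOAD action follows. It assumes the collection ended up named wikipedia and that node1 is listening on localhost:8983; the reload_url function is my own helper. The script only prints the URL, so nothing is sent until you run curl yourself.

```shell
#!/bin/sh
# Hypothetical helper: build the Collections API URL that reloads a collection.
# $1 is host:port of any live node, $2 is the collection name.
reload_url() {
  echo "http://$1/solr/admin/collections?action=RELOAD&name=$2"
}

url=$(reload_url "localhost:8983" "wikipedia")
echo "$url"
# With the cluster running, send the request with:
#   curl "$url"
```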

That's it!
