Cloud. Big Data. Analytics... and so on: March 2014

Wednesday, March 19, 2014

Installing Accumulo

We built Accumulo from the source code in our previous post; let’s now use that build to install Accumulo locally.

First, copy the tarball over, extract it, and then change its ownership
[root@localhost local]# cp /opt/accumulo/source/accumulo/assemble/target/accumulo-1.6.0-SNAPSHOT-bin.tar.gz /usr/local/
[root@localhost local]# tar xzf accumulo-1.6.0-SNAPSHOT-bin.tar.gz
[root@localhost local]# mv accumulo-1.6.0-SNAPSHOT/ accumulo
[root@localhost local]# chown -R hduser:hadoop accumulo

Since we are going to use Accumulo in our development environment mainly to fix bugs or add features, we will use the 512MB example configuration found in /usr/local/accumulo/conf/examples. The other examples include 1GB, 2GB, and 3GB, which are overkill in our case.

Now, as hduser, copy it into your main conf directory
[root@localhost ~]# su - hduser
[hduser@localhost examples]$ cp /usr/local/accumulo/conf/examples/512MB/standalone/* /usr/local/accumulo/conf
[hduser@localhost conf]$ ll
total 64
-rwxr-xr-x. 1 hduser hduser 3101 Mar 3 23:48 accumulo-env.sh
-rw-r--r--. 1 hduser hduser 2180 Mar 3 23:48 accumulo-metrics.xml
-rw-r--r--. 1 hduser hadoop 7995 Mar 3 21:54 accumulo.policy.example
-rw-r--r--. 1 hduser hduser 4240 Mar 3 23:48 accumulo-site.xml
-rw-r--r--. 1 hduser hduser 1673 Mar 3 23:48 auditLog.xml
drwxr-xr-x. 8 hduser hadoop 4096 Mar 3 21:54 examples
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 gc
-rw-r--r--. 1 hduser hduser 3613 Mar 3 23:48 generic_logger.xml
-rw-r--r--. 1 hduser hduser 1713 Mar 3 23:48 log4j.properties
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 masters
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 monitor
-rw-r--r--. 1 hduser hduser 2924 Mar 3 23:48 monitor_logger.xml
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 slaves
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 tracers

Now that all configuration files are in the right place, go ahead and specify the locations of JAVA_HOME, ZOOKEEPER_HOME and HADOOP_HOME in the accumulo-env.sh file located in /usr/local/accumulo/conf.
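
As a rough sketch of what those settings amount to, the lines below export the three variables and report whether each directory exists. The JAVA_HOME and ZOOKEEPER_HOME values are the ones used elsewhere in this series; the HADOOP_HOME value is an assumption inferred from the library path in the log output further down. The accumulo-env.sh shipped with your build may phrase these lines differently, so adjust all three to your machine.

```shell
# Assumed install locations; change them to match your machine.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64
export ZOOKEEPER_HOME=/usr/local/zookeeper
export HADOOP_HOME=/opt/hadoop-2.2.0/hadoop   # assumption, inferred from the log output

# Report which of the three directories actually exist.
for dir in "$JAVA_HOME" "$ZOOKEEPER_HOME" "$HADOOP_HOME"; do
    if [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
    fi
done
```

A "missing" line means the path above is wrong for your setup and should be corrected before running the init step.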

Before starting Accumulo for the very first time, you have to run the initialization script to create the HDFS directory structure and ZooKeeper settings.
[hduser@localhost ~]$ /usr/local/accumulo/bin/accumulo init
OpenJDK 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-03-04 00:40:01,555 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-03-04 00:40:03,065 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:40:03,065 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:40:03,066 [init.Initialize] INFO : Hadoop Filesystem is hdfs://localhost:9000
2014-03-04 00:40:03,067 [init.Initialize] INFO : Accumulo data dirs are [hdfs://localhost:9000/accumulo]
2014-03-04 00:40:03,067 [init.Initialize] INFO : Zookeeper server is localhost:2181
2014-03-04 00:40:03,067 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running

Warning!!! Your instance secret is still set to the default, this is not secure. We highly recommend you change it.
You can change the instance secret in accumulo by using:
bin/accumulo org.apache.accumulo.server.util.ChangeSecret oldPassword newPassword.
You will also need to edit your secret in your configuration file by adding the property instance.secret to your conf/accumulo-site.xml. Without this accumulo will not operate correctly
Instance name : accumulo-demo
Enter initial password for root (this may not be applicable for your security setup): ***********
Confirm initial password for root: ***********
2014-03-04 00:41:11,785 [Configuration.deprecation] INFO : dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2014-03-04 00:41:12,239 [Configuration.deprecation] INFO : dfs.block.size is deprecated. Instead, use dfs.blocksize
2014-03-04 00:41:15,884 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthorizor
2014-03-04 00:41:15,887 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthenticator
2014-03-04 00:41:15,890 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKPermHandler
Now you are ready to start accumulo!
[hduser@localhost ~]$ /usr/local/accumulo/bin/start-all.sh
WARN : Using Zookeeper /usr/local/zookeeper/zookeeper-3.4.6. Use version 3.3.0 or greater to avoid zookeeper deadlock bug.
Starting monitor on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting tablet servers .... done
Starting tablet server on localhost
WARN : Max files open on localhost is 1024, recommend 65536
OpenJDK 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-03-04 00:42:29,070 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-03-04 00:42:31,922 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:42:31,922 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:42:31,934 [server.Accumulo] INFO : Attempting to talk to zookeeper
2014-03-04 00:42:33,444 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2014-03-04 00:42:33,654 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting garbage collector on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting tracer on localhost
WARN : Max files open on localhost is 1024, recommend 65536

To confirm that it is up and running, view the Accumulo Overview page at: http://localhost:50095

I had a ThriftSecurityException(user:root, code:BAD_CREDENTIALS) error under Recent Logs, and I fixed it by providing the root password in the trace.token.property.password property of accumulo-site.xml. By default it is set to “secret”. However, it is best to create a dedicated user and password for tracing and configure that instead.

To finish off, let’s create ACCUMULO_HOME like we did for ZooKeeper, Java, etc.

$ sudo vim /etc/profile.d/accumulo.sh

export ACCUMULO_HOME=/usr/local/accumulo/

Tuesday, March 18, 2014

Installing Zookeeper

Think of ZooKeeper as a coordination service that manages distributed processes in a cluster environment.

ZooKeeper ensures:
-    Consistency
-    Atomicity
-    Reliability
-    Timeliness

In order to install ZooKeeper, you need to download a stable release from its official site, extract it, and make sure that our Hadoop user can access it.
[root@localhost local]# cd /usr/local/
[root@localhost local]# wget http://apache.osuosl.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
--2014-03-03 20:25:54-- http://apache.osuosl.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Resolving apache.osuosl.org... 140.211.166.134
Connecting to apache.osuosl.org|140.211.166.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17699306 (17M) [application/x-gzip]
Saving to: “zookeeper-3.4.6.tar.gz”

100%[======================>] 17,699,306   280K/s   in 45s

2014-03-03 20:26:41 (387 KB/s) - “zookeeper-3.4.6.tar.gz” saved [17699306/17699306]

[root@localhost local]# tar xzf zookeeper-3.4.6.tar.gz
[root@localhost local]# mv zookeeper-3.4.6 zookeeper
[root@localhost local]# chown -R hduser:hadoop zookeeper
[root@localhost local]# ll
….
drwxr-xr-x. 10 hduser hadoop     4096 Feb 20 05:58 zookeeper
…..

Configuring ZooKeeper

Simply copy the sample config file to zoo.cfg (cp rather than cat >>, so that an existing zoo.cfg is replaced instead of appended to)
[root@localhost local]# cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg

It should look like
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Notice that the sample configuration stores ZooKeeper's data under /tmp, which its own comment warns against
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper

Let’s change dataDir in zoo.cfg to point to /app/zookeeper and make sure that this directory exists with the right ownership.
[root@localhost app]# mkdir -p /app/zookeeper
[root@localhost app]# chown hduser:hadoop /app/zookeeper/
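
The dataDir edit itself can be done in any editor. As a sketch, assuming GNU sed (as on CentOS), it could also be scripted as below. set_data_dir and ZOO_CFG are names made up for this example.

```shell
# Path to the config file created above; override ZOO_CFG to point elsewhere.
ZOO_CFG="${ZOO_CFG:-/usr/local/zookeeper/conf/zoo.cfg}"

# set_data_dir CFG DIR: rewrite the dataDir line of a zoo.cfg-style file in place.
set_data_dir() {
    sed -i "s|^dataDir=.*|dataDir=$2|" "$1"
}

# Only touch the file if it is really there, then show the resulting line.
if [ -f "$ZOO_CFG" ]; then
    set_data_dir "$ZOO_CFG" /app/zookeeper
    grep '^dataDir=' "$ZOO_CFG"
fi
```

Run it as a user who can write to zoo.cfg. Every other line of the file is left untouched.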

Now let’s start ZooKeeper
[hduser@localhost tmp]$ /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

And verify that it is indeed running
[hduser@localhost tmp]$ /usr/local/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
2014-03-03 20:53:21,869 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-03-03 20:53:21,881 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost.localdomain
2014-03-03 20:53:21,882 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_51
2014-03-03 20:53:21,885 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-03-03 20:53:21,885 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/jre
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/zookeeper/bin/../build/classes:/usr/local/zookeeper/bin/../build/lib/*.jar:/usr/local/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/bin/../zookeeper-3.4.6.jar:/usr/local/zookeeper/bin/../src/java/lib/*.jar:/usr/local/zookeeper/bin/../conf:
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-431.5.1.el6.x86_64
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:user.name=hduser
2014-03-03 20:53:21,886 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/hduser
2014-03-03 20:53:21,887 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/tmp
2014-03-03 20:53:21,891 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1d672476
Welcome to ZooKeeper!
2014-03-03 20:53:22,255 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2014-03-03 20:53:22,293 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session
[zk: localhost:2181(CONNECTING) 0] 2014-03-03 20:53:22,448 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x1448aca9c6e0000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
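
If you only want a quick, non-interactive check, zkServer.sh also accepts a status argument. A minimal sketch follows, where zk_status is a helper name made up for this example and the default path is the install location used above:

```shell
# Where ZooKeeper is installed; falls back to the location used in this post.
ZK_HOME="${ZOOKEEPER_HOME:-/usr/local/zookeeper}"

# zk_status DIR: run "zkServer.sh status" from the ZooKeeper install under DIR.
zk_status() {
    if [ -x "$1/bin/zkServer.sh" ]; then
        "$1/bin/zkServer.sh" status
    else
        echo "zkServer.sh not found under $1"
        return 1
    fi
}

# Example: zk_status "$ZK_HOME"
```

On a single-node setup like this one, the status output should report that the server is running in standalone mode.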

Now let’s set up ZOOKEEPER_HOME
$ sudo vim /etc/profile.d/zookeeper.sh
export ZOOKEEPER_HOME=/usr/local/zookeeper/

Done!

Sunday, March 16, 2014

Building your first Accumulo tar file

The goal of this tutorial is for you to be able to run Accumulo in your environment and to contribute to it. In order to do that, you need to know how to build the Accumulo tar file from the source code. So let's get to it:

Create a directory where you want to store the Accumulo source code and use the git clone command to pull the latest and greatest. For example,
$ sudo mkdir -p /opt/accumulo/source
$ cd /opt/accumulo/source
$ git clone https://git-wip-us.apache.org/repos/asf/accumulo.git
(please refer to this page, under Source Code, for the up-to-date link to the Accumulo git repository)

Navigate to the Accumulo source code directory and review its contents in your favorite IDE.

When you are ready to build your Accumulo tar file, execute the following commands

$ cd path/to/accumulo
$ mvn package -P assemble

Depending on your machine, it can take some time, but eventually you should see something like this if everything went well.

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Accumulo ................................... SUCCESS [01:00 min]
[INFO] Trace ............................................. SUCCESS [ 16.735 s]
[INFO] Fate .............................................. SUCCESS [ 5.040 s]
[INFO] Start ............................................. SUCCESS [01:47 min]
[INFO] Core .............................................. SUCCESS [02:03 min]
[INFO] Simple Examples ................................... SUCCESS [ 17.305 s]
[INFO] Server Base ....................................... SUCCESS [ 16.004 s]
[INFO] GC Server ......................................... SUCCESS [ 3.220 s]
[INFO] Master Server ..................................... SUCCESS [ 7.238 s]
[INFO] Tablet Server ..................................... SUCCESS [ 17.611 s]
[INFO] MiniCluster ....................................... SUCCESS [01:38 min]
[INFO] Monitor Server .................................... SUCCESS [ 4.857 s]
[INFO] Native Libraries .................................. SUCCESS [ 13.424 s]
[INFO] Tracer Server ..................................... SUCCESS [ 1.992 s]
[INFO] Accumulo Maven Plugin ............................. SUCCESS [ 25.701 s]
[INFO] Testing ........................................... SUCCESS [01:03 min]
[INFO] Proxy ............................................. SUCCESS [ 28.249 s]
[INFO] Assemblies ........................................ SUCCESS [ 11.505 s]
[INFO] Documentation ..................................... SUCCESS [ 0.126 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:26 min
[INFO] Finished at: 2014-03-01T02:40:37-05:00
[INFO] Final Memory: 95M/276M
[INFO] ------------------------------------------------------------------------

Congrats! You just built your first Accumulo tar file, and it is ready to be deployed! Navigate to the assemble/target/ directory to find your file. It should look something like this: accumulo-1.6.0-bin.tar.gz

Now let's see how we can pull code from different branches and see if there were any updates. Switch to your accumulo directory and check if you have anything new to commit

[root@localhost accumulo]# git status
# On branch master
nothing to commit (working directory clean)

Now let's see if there were any new updates to the code by someone else since your last check out

[root@localhost accumulo]# git pull
Already up-to-date.

Seems like everything is up to date... If you experience issues with the git pull command, make sure that the remote is configured properly so git knows where to pull the data from

[root@localhost accumulo]# git remote add origin https://git-wip-us.apache.org/repos/asf/accumulo.git

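Now, you probably don't want to make your changes against the master branch... To switch to another branch, check it out (git pull origin 1.6.0-SNAPSHOT would merge that branch into your current one instead of switching to it):

[root@localhost accumulo]# git checkout 1.6.0-SNAPSHOT

This should give you the 1.6.0 development branch of Accumulo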

To see the entire list of available branches

[root@localhost assemble]# git branch -a
1.6.0-SNAPSHOT
* master
remotes/origin/1.4.5-SNAPSHOT
remotes/origin/1.5.2-SNAPSHOT
remotes/origin/1.6.0-SNAPSHOT
remotes/origin/ACCUMULO-1000
remotes/origin/ACCUMULO-1409
remotes/origin/ACCUMULO-1566
remotes/origin/ACCUMULO-2061
remotes/origin/ACCUMULO-2442
remotes/origin/ACCUMULO-578
remotes/origin/ACCUMULO-652
remotes/origin/ACCUMULO-672
remotes/origin/ACCUMULO-722
remotes/origin/ACCUMULO-802
remotes/origin/ACCUMULO-CURATOR
remotes/origin/HEAD -> origin/master
remotes/origin/master
-a shows all local and remote branches.

To create your own branch and switch to it at the same time, execute

[root@localhost assemble]# git checkout -b 1.6.0-SNAPSHOT-myown

This will come in handy when we are ready to contribute, which involves switching between different branches, creating our own feature branches and then committing our changes back.
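
The branch steps above can be wrapped into two small helpers. This is only a sketch: current_branch and start_feature_branch are names invented for this example, while the git commands inside them are standard ones.

```shell
# current_branch: print the name of the branch checked out in the current repository.
current_branch() {
    git rev-parse --abbrev-ref HEAD
}

# start_feature_branch BASE NAME: switch to BASE, then create NAME from it and switch to NAME.
start_feature_branch() {
    git checkout -q "$1" && git checkout -q -b "$2"
}

# Example, run inside the accumulo clone (branch names taken from the listing above):
#   start_feature_branch 1.6.0-SNAPSHOT 1.6.0-SNAPSHOT-myown
#   current_branch
```

Basing the feature branch on an explicit branch, rather than on whatever happens to be checked out, avoids accidentally starting work from master.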

Friday, March 14, 2014

Installation of required basic tools on CentOS

Install Java (Make sure to install Java Development Environment):

$ sudo yum install java-1.7.0-openjdk-devel -y

Set up JAVA_HOME environment variable:
$ sudo vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/

Verify
$ java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (rhel-2.4.4.1.el6_5-x86_64 u51-b02)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)

Add java to PATH
export PATH=$PATH:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64/bin
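
Keep in mind that an export typed at the prompt only lasts for the current shell session. To make it permanent, the same lines can go into /etc/profile.d/java.sh next to JAVA_HOME. The sketch below also guards against adding the same entry twice. path_append is a helper name made up for this example.

```shell
# path_append DIR: add DIR to the end of PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64
path_append "$JAVA_HOME/bin"
export PATH
```

Because of the guard, sourcing the file more than once does not keep growing PATH.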

Verify
echo $PATH
/usr/local/maven/bin:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51.x86_64//bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/root/bin

Install Maven (we will need that to build our Accumulo)

Create maven directory to keep things neat:
$ sudo mkdir /opt/maven
$ cd /opt/maven

Download the latest Maven binary from the official site: http://maven.apache.org/download.cgi
$ sudo wget http://www.carfab.com/apachesoftware/maven/maven-3/3.2.1/binaries/apache-maven-3.2.1-bin.tar.gz
$ sudo tar xzf apache-maven-3.2.1-bin.tar.gz -C /usr/local
$ cd /usr/local
$ sudo ln -s apache-maven-3.2.1 maven

Set up M2_HOME and add maven to your path
$ sudo vi /etc/profile.d/maven.sh
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}

Finally, log out and log in again to activate the above environment variables.

To verify successful installation of maven, check the version of maven:
$ mvn -version

Note: If you are using Maven behind a proxy, you need to configure the proxy in ~/.m2/settings.xml
$ vim ~/.m2/settings.xml
<settings>
<proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.host.com</host>
      <port>port_number</port>
      <username>proxy_user</username>
      <password>proxy_user_password</password>
      <nonProxyHosts>www.google.com</nonProxyHosts>
    </proxy>
</proxies>
</settings>

Install git to be able to pull/push from Accumulo repository

$ sudo yum install git

Configure git

$ git config --global user.name "Your Name Here"
$ git config --global user.email "your_email@example.com"

To verify the changes, open ~/.gitconfig
$ vim ~/.gitconfig
You should see something like
[user]
name = Your Name Here
email = your_email@example.com

Thursday, March 13, 2014

Setting up CentOS6 environment in VMPlayer

First, download CentOS 6 from one of the official CentOS sites and save it anywhere locally on your machine.

Start VMPlayer and click on "Create a New Virtual Machine"

Select "I will install the operating system later"

Select Linux as Guest operating system and CentOS 64-bit as Version

Name your virtual machine and specify where you want it stored on your local drive.

Specify preferred disk size and whether you want to split your virtual disk. I picked default values

Select Next and Click on Customize Hardware

We are going to need more than 1 GB of memory. Depending on what you have available, bump it up to at least 2 GB. I used half of my memory in this case.

Under New CD/DVD, pick Use ISO image file for your connection and point to the ISO file that you previously downloaded.

Under Network Adapter, select "NAT: Used to share the host's IP address" if you don't mind your VM sharing your host's IP address. If you would rather have your virtual machine behave more like a real computer on your network, change it to "Bridged: Connect directly to the physical network". For this tutorial, I went with NAT.

Keep rest of the settings default and click the Close and then Finish button.

You should now see your newly created VM. Select it from the list of available VMs and click on Play virtual machine. The VM will now boot from the CentOS image.

At this point, we are going to proceed to install the OS to the hard drive. Why? Well, it will leave the CD-ROM available to our CentOS system if we need it. For example, in our next steps we are going to install VMware Tools, which requires the CD-ROM to be available. VMware Tools will give us many more features and will improve mouse movement, video and performance. One of the biggest improvements for me is the ability to expand the display into full screen mode when you hit "Enter full screen mode". Otherwise, the VMware Player window itself will expand and you still end up with a small VM display. Go ahead, try it and see what I mean.

Double-click the "Install to Hard Drive" icon on your CentOS desktop. This should take you through the setup wizard, where you set your keyboard, storage, hostname (pick something that will make sense to you when you later see it on the network), time zone, etc. Follow it and pick what you fancy... I pretty much picked the default values. A few tips:
- Go ahead and pick "Yes, discard any data" when you see the Storage Device Warning message. No worries, it won't do anything to your actual hard drive, only to the VM.
- Make sure you create a root password AND remember it!!! We will need root access to install/configure things in the future.

- When asked which type of installation you would like, select Use All Space. Once again, this is happening inside the VM, so use all of the 20GB that you allocated to it. When the warning message "Writing storage configuration to disk" pops up, go ahead and select Write changes to disk. This will format the VM's hard drive and install all necessary files.

This step will take a while... and finally you will see:

Go ahead and restart...

When the screen comes up, go through License Information, create a non-admin user, pick the date and time, and enable or disable Kdump. The system will ask you if you wish to reboot; go ahead and do that.

Now let's change our startup configuration. Now that CentOS is installed on the VM's hard drive, we no longer need to boot from the ISO image. Shut down your VM, select Edit virtual machine settings, navigate to CD/DVD, and pick Use physical drive.

Now go ahead and start your new VM and log in as root.

Initiate VM tools install by clicking on Install Tools on the bottom right corner of the VMware Player or by selecting Player > Manage > Install VM Tools...

Fire up a terminal and execute the following commands to install VMware Tools

mkdir /mnt/cdrom
mount /dev/cdrom /mnt/cdrom
cp /mnt/cdrom/VMwareTools-*.tar.gz /tmp
umount /mnt/cdrom
tar -zxf /tmp/VMwareTools-*.tar.gz -C /tmp
cd /
./tmp/vmware-tools-distrib/vmware-install.pl --default

Restart your VM again for changes to take effect.

Congrats! You have successfully installed CentOS in your VM! At this point, back it up so that if you were to decide to start fresh you don't have to redo the entire thing again.

Puppet files and git.

If you have puppet files just sitting around on the server and random people changing manifests or uploading files without any type of coordination, that can lead to all kinds of issues. So let's go ahead and bring some order to it.

First, install git on the puppet server
sudo yum install git
Navigate to the directory where we have our puppet files.
cd /etc/puppet/

Initialize git there
git init

Add and commit all files
git add .
git commit -m "Initial commit"

Create a bare repository to make it easier for people to clone the new puppet repo
git clone --bare . ../puppet-repo.git

Add a pointer to your new bare repo using the origin name
git remote add origin ../puppet-repo.git/

Now open a command prompt on your local machine and execute the following command to clone (that is, get) the puppet files onto your machine. (Before that, I created a directory to hold the data.)
git clone user_name@host_name_or_ip:/location/of/puppet-repo.git

You will be asked if you want to connect, type 'yes' and provide your password. After which you can monitor the progress of your clone process... Might take a while depending on your network connection and data amount.

Now, to avoid making all changes in Notepad, you can install Geppetto, which is a Puppet IDE and is available as an Eclipse plugin. The installation of Geppetto is quite straightforward. If you have an existing Eclipse, it's easy to load up Geppetto within it:
• Help -> Install new Software
• Add -> Location = http://download.cloudsmith.com/geppetto/updates
• Select “Geppetto” from the list of potential downloads
• Finish
• Accept any Licenses
• Reload Eclipse

At the time of this writing, there is no option to import an already existing Puppet project into Geppetto, so you have two options: create and tweak a .project file in your puppet project folder and then import it as an existing project, or import the project as a 'file system', in which case Geppetto will create the XML file for you. I chose to import it as a file system...

Go ahead and play around with your modules and manifests...

Now that you made your changes go ahead and add them to the main repo.
Run git status to verify your changes, run git add . to stage them, and finally run git commit -a to commit. Now to add that to the main repo run git push

If you get an error stating "insufficient permission for adding an object to repository database", you need to make further changes on the main server. Nothing big, you just need to make sure that group read/write access is set up right.
ssh to server
cd puppet-repo.git
sudo chmod -R g+ws *
sudo chgrp -R mygroup *
(to find your group, run: groups <username>)
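
A slightly more thorough variant of the same fix is sketched below. It also sets git's core.sharedRepository option, so that objects created by later pushes stay group-writable. share_repo is a helper name made up for this example, and "mygroup" remains a placeholder for your real group.

```shell
# share_repo REPO_DIR GROUP: make an existing bare repository writable by GROUP.
share_repo() {
    chgrp -R "$2" "$1" &&
    chmod -R g+rwX "$1" &&                              # group read/write; execute on directories
    find "$1" -type d -exec chmod g+s {} + &&           # new files inherit the directory's group
    git --git-dir="$1" config core.sharedRepository group
}

# Example (prefix with sudo if you do not own the files):
#   share_repo /etc/puppet-repo.git mygroup
```

The /etc/puppet-repo.git path in the example follows from cloning to ../puppet-repo.git out of /etc/puppet earlier. Adjust it if your bare repository lives elsewhere.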

Try git push again. It should work.

Now that you successfully pushed your updates, go ahead and pull them on the server.

ssh to server
cd to the location of your puppet files and execute
git pull to get the updates.

You should see the changes made...

This is a two way street. If you must make changes on the server, you can. You just need to commit and push it for others to see your changes.

Now this won't stop someone from modifying the files directly on the server and not committing them, but hey at least it is a start and you can remove the access of average user to this and force people to use git push to commit their changes.

No more Wild West!

Monday, March 3, 2014

How to install Accumulo on Centos 6 (64-bit) from scratch...

I wanted to install Apache Accumulo on my Windows laptop so that I can experiment with it and contribute to it.

If you plan to compile and run Accumulo, Windows is not a good option. If you don't have a dedicated Linux box, you can download either VM Player or VirtualBox and an ISO of your preferred flavor of Linux. I picked VMPlayer and CentOS 6.

I am going to provide you with step by step guidelines on how to set up your very own Apache Accumulo instance.

I am going to break the tutorial into several pieces:

Setting up CentOS6 environment in VMPlayer

Building a ready-to-deploy Apache Accumulo tar file from the source
Installing Hadoop
Installing Zookeeper
Running the compiled Apache Accumulo tar file on your new system