Cloud. Big Data. Analytics... and so on: accumulo

Wednesday, August 13, 2014

Accumulo: How to create a new table and set permissions

Creating a new table in Accumulo is pretty easy. From the Accumulo shell, it is as simple as

createtable my_new_cool_table

Now let's say that you created this table as root. How can you check whether another user can read from or write to it? Say you have a user bob: how can you check what this user sees or can do?

Run the following command:
userpermissions -u bob

You should see a list of tables and the permissions the user currently holds on each one:

userpermissions -u bob
System permissions: System.CREATE_TABLE, System.DROP_TABLE, System.SYSTEM
Table permissions (!METADATA): Table.READ
Table permissions (META): Table.READ, Table.WRITE

The new table is not in the list because user bob has no permissions on the table that root created. Let's change that! To let bob read from the new table, execute this command:

grant Table.READ -t my_new_cool_table -u bob

If you re-run the userpermissions command, you will now see the new table listed:

userpermissions -u bob
System permissions: System.CREATE_TABLE, System.DROP_TABLE, System.SYSTEM
Table permissions (!METADATA): Table.READ
Table permissions (META): Table.READ, Table.WRITE
Table permissions (my_new_cool_table): Table.READ

Full list of table permissions:
Table.ALTER_TABLE
Table.BULK_IMPORT
Table.DROP_TABLE
Table.GRANT
Table.READ
Table.WRITE
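
To make the grant-then-list behaviour above concrete, here is a minimal stand-alone sketch. It is a toy in-memory model, not Accumulo's implementation and not its API; the class and method names (TablePermissionSketch, grant, tablePermissions, canRead) are made up for illustration. It only mirrors what the shell session shows: a table does not appear for a user until some permission on it has been granted.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model of per-user table permissions.
// Assumption: this is NOT how Accumulo stores permissions; it only mimics the shell behaviour.
class TablePermissionSketch {
    // user -> (table -> permissions); sorted maps keep the listing order stable
    private final Map<String, Map<String, Set<String>>> grants = new TreeMap<>();

    // Mirrors the shell command: grant <permission> -t <table> -u <user>
    void grant(String user, String table, String permission) {
        grants.computeIfAbsent(user, u -> new TreeMap<>())
              .computeIfAbsent(table, t -> new TreeSet<>())
              .add(permission);
    }

    // Mirrors the table part of: userpermissions -u <user>
    Map<String, Set<String>> tablePermissions(String user) {
        return grants.getOrDefault(user, new TreeMap<>());
    }

    // A user can read a table only if Table.READ was granted on it
    boolean canRead(String user, String table) {
        return tablePermissions(user)
                .getOrDefault(table, new TreeSet<>())
                .contains("Table.READ");
    }

    public static void main(String[] args) {
        TablePermissionSketch model = new TablePermissionSketch();
        // Before the grant, bob has no entry for the table at all
        System.out.println(model.canRead("bob", "my_new_cool_table")); // false
        model.grant("bob", "my_new_cool_table", "Table.READ");
        System.out.println(model.canRead("bob", "my_new_cool_table")); // true
        System.out.println(model.tablePermissions("bob")); // {my_new_cool_table=[Table.READ]}
    }
}
```

Granting Table.READ alone does not let bob write; in this model, as in the shell, each permission has to be granted separately.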

Sunday, May 18, 2014

Apache Accumulo rFile Generation and Import

I wrote a very quick project to demonstrate how to generate an Apache Accumulo rFile in an HDFS location of your choice and then import it into an Accumulo instance of your choice. Some configuration is required. The project is still a work in progress, but it will give you what you need if you are looking for how to generate Accumulo records programmatically, store them in HDFS, and later import them into your Accumulo instance.

Source code: https://github.com/krinkere/rFileGenerator

Sunday, April 13, 2014

Connecting to the Accumulo using Java

Now that our Accumulo is set up, let’s connect to it from a Java application so we can create our first table and write some data into it.

(I am going to assume that you have Maven installed on your machine. If not, install it first.)

Create a simple Maven project.
mvn archetype:generate -DgroupId=org.mytest.accexample -DartifactId=simple -Dpackage=org.mytest.accexample -Dversion=1.0-SNAPSHOT

Accept all defaults.

Since we will be connecting to Accumulo, it comes as no surprise that we need the jars for Accumulo, ZooKeeper and Hadoop. Add these dependencies to your pom.xml file:

  <dependency>
   <groupId>org.apache.accumulo</groupId>
   <artifactId>accumulo-core</artifactId>
   <version>1.4.4</version>
  </dependency>
  <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-core</artifactId>
   <version>1.2.1</version>
  </dependency>
  <dependency>
   <groupId>org.apache.zookeeper</groupId>
   <artifactId>zookeeper</artifactId>
   <version>3.4.5</version>
  </dependency>

Now it is time to write a program to connect to our Accumulo instance and put some records in!

package org.mytest.accexample;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;

public class App {

    public static void main(String[] args) throws AccumuloException, AccumuloSecurityException, TableNotFoundException, TableExistsException {
        // Constants
        String instanceName = "default";
        String zooServers = ""; // Provide list of zookeeper server here. In our case, we had just one so localhost:2181 should do
        String userName = ""; // Provide username
        String password = ""; // Provide password
        // Connect
        Instance inst = new ZooKeeperInstance(instanceName,zooServers);
        Connector conn = inst.getConnector(userName, password);
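        // Let’s create our new table (any existing table with this name is dropped first)
        String tableName = "myTable";
        TableOperations ops = conn.tableOperations();
        if (ops.exists(tableName)) {
            ops.delete(tableName);
        }
        ops.create(tableName);
        // Use batch writer to write demo data. The numeric arguments are:
        // max memory to buffer (bytes), max latency before a flush (ms), number of write threads
        BatchWriter bw = conn.createBatchWriter(tableName, 1000000, 60000, 2);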
        // set values
        Text rowID = new Text("row1");
        Text colFam = new Text("colFam");
        Text colQual = new Text("colQual");
        // set value
        Value value = new Value("some-value".getBytes());
        // create new mutation and add rowID, colFam, colQual, and value
        Mutation mutation = new Mutation(rowID);
        mutation.put(colFam, colQual, value);
        // add the mutation to the batch writer
        bw.addMutation(mutation);
        // close the batch writer
        bw.close();
    }
}

Run it… Congratulations, you just wrote your first record into Accumulo! Now let’s view it in the table just to make sure it is indeed there.

Let’s take a different approach and access Accumulo through its shell.
$ACCUMULO_HOME/bin/accumulo shell -u [username]

Then provide your password.

Type ‘tables’ to see a list of available tables. You should see myTable in the list.
Type ‘table myTable’ to access it. Now execute a simple scan to view your records.
root@default> table myTable
root@default myTable> scan
row1 colFam:colQual []    some-value

Success!

Now, what’s that security level stuff in Accumulo everyone is talking about? Well, it is a way to ensure that only the entries whose visibility label matches your authorizations will be shown to you. That’s it in a nutshell; for a more detailed explanation, please refer to the Accumulo Visibility reference at the end of this post.

Let’s add this to our Java App Accumulo writer. Add a ColumnVisibility variable and pass it to the mutation’s put call. Here is the complete example; the only changes are the new colVis variable and the extra argument to mutation.put:

package org.mytest.accexample;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;

public class App {

    public static void main(String[] args) throws AccumuloException, AccumuloSecurityException, TableNotFoundException, TableExistsException {
        // Constants
        String instanceName = "default";
        String zooServers = "";
        String userName = "";
        String password = "";
        // Connect
        Instance inst = new ZooKeeperInstance(instanceName,zooServers);
        Connector conn = inst.getConnector(userName, password);
        String tableName = "myTable";
        TableOperations ops = conn.tableOperations();
        if (ops.exists(tableName)) {
            ops.delete(tableName);
        }
        ops.create(tableName);
        // Use batch writer to write demo data
        BatchWriter bw = conn.createBatchWriter(tableName,1000000, 60000, 2);
        // set values
        Text rowID = new Text("row1");
        Text colFam = new Text("colFam");
        Text colQual = new Text("colQual");
        // set visibility
        ColumnVisibility colVis = new ColumnVisibility("public");
        // set value
        Value value = new Value("some-value".getBytes());
        // create new mutation and add rowID, colFam, colQual, and value
        Mutation mutation = new Mutation(rowID);
        mutation.put(colFam, colQual, colVis, value);
        // add the mutation to the batch writer
        bw.addMutation(mutation);
        // close the batch writer
        bw.close();
    }
}

Run the example again… Now log into the Accumulo shell and scan myTable. You won’t see anything! What?

Well, you just wrote a record that is shown only to users who are authorized to view entries labeled ‘public’. What can your user view right now? Run the ‘getauths’ command in the shell to find out.

The ‘public’ auth was probably not in the list, hence the inability to see the freshly inserted record.

Let’s give the current user the ‘public’ authorization. Run the following command from the Accumulo shell (note that setauths replaces the user’s existing authorizations rather than adding to them):
root@default myTable> setauths -s public

Now run the scan again and you should be able to see the row. Notice ‘public’ in the square brackets:
root@default myTable> scan
row1 colFam:colQual [public]    some-value
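
To illustrate why the entry was hidden before setauths and visible afterwards, here is a minimal stand-alone sketch of visibility filtering. It is a toy model, not Accumulo's evaluator: the class name VisibilitySketch and the method isVisible are hypothetical, and it is assumed to handle only an empty label, a single token, or tokens joined purely by '&' or purely by '|' (no parentheses or mixed expressions).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of cell-level visibility filtering; NOT Accumulo's evaluator.
// Assumption: labels are empty, a single token, or tokens joined only by '&'
// (all tokens required) or only by '|' (any one token suffices).
class VisibilitySketch {

    static boolean isVisible(String label, Set<String> auths) {
        if (label.isEmpty()) {
            return true; // unlabeled entries (shown as [] in the shell) are visible to everyone
        }
        if (label.contains("&")) {
            for (String token : label.split("&")) {
                if (!auths.contains(token)) {
                    return false; // one missing authorization hides the entry
                }
            }
            return true;
        }
        for (String token : label.split("\\|")) {
            if (auths.contains(token)) {
                return true; // any single matching authorization is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> beforeSetauths = new HashSet<>();
        Set<String> afterSetauths = new HashSet<>(Arrays.asList("public"));

        System.out.println(isVisible("public", beforeSetauths)); // false: scan shows nothing
        System.out.println(isVisible("public", afterSetauths));  // true: scan shows the row
        System.out.println(isVisible("", beforeSetauths));       // true: first example, no label
    }
}
```

The filtering happens per entry at scan time, which is why the first record (written without a visibility) showed up immediately while the second one needed the ‘public’ authorization.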

In summary, we wrote a Java client that communicates with an Accumulo instance and inserts a record. We used the Accumulo shell to view the record. We also briefly touched on cell-level security.

Reference:
1. Accumulo Shell Command Guide.
2. Accumulo Visibility.

Wednesday, March 19, 2014

Installing Accumulo

We built Accumulo from source in our previous post. Let’s now use that build to install Accumulo locally.

First copy the tar file over, extract it, and then change ownership:
[root@localhost local]# cp /opt/accumulo/source/accumulo/assemble/target/accumulo-1.6.0-SNAPSHOT-bin.tar.gz /usr/local/
[root@localhost local]# tar xzf accumulo-1.6.0-SNAPSHOT-bin.tar.gz
[root@localhost local]# mv accumulo-1.6.0-SNAPSHOT/ accumulo
[root@localhost local]# chown -R hduser:hadoop accumulo

Since we are going to be using Accumulo in our development environment mainly to fix bugs or to add features, we are going to use the 512MB example configuration found inside accumulo/conf/examples. Other example configurations include 1GB, 2GB, and 3GB, which are overkill in our case.

Now, copy it over as hduser into your main conf directory
[root@localhost ~]# su - hduser
[hduser@localhost examples]$ cp /usr/local/accumulo/conf/examples/512MB/standalone/* /usr/local/accumulo/conf
[hduser@localhost conf]$ ll
total 64
-rwxr-xr-x. 1 hduser hduser 3101 Mar 3 23:48 accumulo-env.sh
-rw-r--r--. 1 hduser hduser 2180 Mar 3 23:48 accumulo-metrics.xml
-rw-r--r--. 1 hduser hadoop 7995 Mar 3 21:54 accumulo.policy.example
-rw-r--r--. 1 hduser hduser 4240 Mar 3 23:48 accumulo-site.xml
-rw-r--r--. 1 hduser hduser 1673 Mar 3 23:48 auditLog.xml
drwxr-xr-x. 8 hduser hadoop 4096 Mar 3 21:54 examples
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 gc
-rw-r--r--. 1 hduser hduser 3613 Mar 3 23:48 generic_logger.xml
-rw-r--r--. 1 hduser hduser 1713 Mar 3 23:48 log4j.properties
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 masters
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 monitor
-rw-r--r--. 1 hduser hduser 2924 Mar 3 23:48 monitor_logger.xml
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 slaves
-rw-r--r--. 1 hduser hduser 792 Mar 3 23:48 tracers

Now that we have all the configuration files in the right place, go ahead and set the locations of JAVA_HOME, ZOOKEEPER_HOME and HADOOP_HOME in the accumulo-env.sh file located in /usr/local/accumulo/conf.

Before starting Accumulo for the very first time, you have to run the initialization script to create the HDFS directory structure and the ZooKeeper settings.
[hduser@localhost ~]$ /usr/local/accumulo/bin/accumulo init
OpenJDK 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-03-04 00:40:01,555 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-03-04 00:40:03,065 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:40:03,065 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:40:03,066 [init.Initialize] INFO : Hadoop Filesystem is hdfs://localhost:9000
2014-03-04 00:40:03,067 [init.Initialize] INFO : Accumulo data dirs are [hdfs://localhost:9000/accumulo]
2014-03-04 00:40:03,067 [init.Initialize] INFO : Zookeeper server is localhost:2181
2014-03-04 00:40:03,067 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running

Warning!!! Your instance secret is still set to the default, this is not secure. We highly recommend you change it.
You can change the instance secret in accumulo by using:
bin/accumulo org.apache.accumulo.server.util.ChangeSecret oldPassword newPassword.
You will also need to edit your secret in your configuration file by adding the property instance.secret to your conf/accumulo-site.xml. Without this accumulo will not operate correctly
Instance name : accumulo-demo
Enter initial password for root (this may not be applicable for your security setup): ***********
Confirm initial password for root: ***********
2014-03-04 00:41:11,785 [Configuration.deprecation] INFO : dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2014-03-04 00:41:12,239 [Configuration.deprecation] INFO : dfs.block.size is deprecated. Instead, use dfs.blocksize
2014-03-04 00:41:15,884 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthorizor
2014-03-04 00:41:15,887 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthenticator
2014-03-04 00:41:15,890 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKPermHandler

Now you are ready to start Accumulo!
[hduser@localhost ~]$ /usr/local/accumulo/bin/start-all.sh
WARN : Using Zookeeper /usr/local/zookeeper/zookeeper-3.4.6. Use version 3.3.0 or greater to avoid zookeeper deadlock bug.
Starting monitor on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting tablet servers .... done
Starting tablet server on localhost
WARN : Max files open on localhost is 1024, recommend 65536
OpenJDK 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-03-04 00:42:29,070 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-03-04 00:42:31,922 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:42:31,922 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on system reset or power loss
2014-03-04 00:42:31,934 [server.Accumulo] INFO : Attempting to talk to zookeeper
2014-03-04 00:42:33,444 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2014-03-04 00:42:33,654 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting garbage collector on localhost
WARN : Max files open on localhost is 1024, recommend 65536
Starting tracer on localhost
WARN : Max files open on localhost is 1024, recommend 65536

To confirm that it is up and running, view the Accumulo Overview page at: http://localhost:50095

I had a ThriftSecurityException(user:root, code:BAD_CREDENTIALS) error under Recent Logs, and I fixed it by setting the root password in the trace.token.property.password property in accumulo-site.xml. By default it is set to “secret”. However, it is better to create a dedicated user and password for tracing and configure those instead.

To finish off, let’s create ACCUMULO_HOME like we did for ZooKeeper, Java, etc.

$ sudo vim /etc/profile.d/accumulo.sh

export ACCUMULO_HOME=/usr/local/accumulo/

Sunday, March 16, 2014

Building your first Accumulo tar file

The goal of this tutorial is for you to be able to run Accumulo in your environment and to be able to contribute to it. In order to do that, you need to know how to build the Accumulo tar file from the source code. So let's get to it:

Create a directory where you want to store the Accumulo source code and use the git clone command to pull the latest and greatest. For example,
$ sudo mkdir -p /opt/accumulo/source
$ cd /opt/accumulo/source
$ git clone https://git-wip-us.apache.org/repos/asf/accumulo.git
(please refer to the Accumulo project page, under Source Code, for an up-to-date link to the git repository)

Navigate to the Accumulo source code directory and review its contents in your favorite IDE.

When you are ready to build your Accumulo tar file, execute the following commands:

$ cd path/to/accumulo
$ mvn package -P assemble

Depending on your machine it can take some time, but eventually you should see something like this if everything went well.

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Accumulo ................................... SUCCESS [01:00 min]
[INFO] Trace ............................................. SUCCESS [ 16.735 s]
[INFO] Fate .............................................. SUCCESS [ 5.040 s]
[INFO] Start ............................................. SUCCESS [01:47 min]
[INFO] Core .............................................. SUCCESS [02:03 min]
[INFO] Simple Examples ................................... SUCCESS [ 17.305 s]
[INFO] Server Base ....................................... SUCCESS [ 16.004 s]
[INFO] GC Server ......................................... SUCCESS [ 3.220 s]
[INFO] Master Server ..................................... SUCCESS [ 7.238 s]
[INFO] Tablet Server ..................................... SUCCESS [ 17.611 s]
[INFO] MiniCluster ....................................... SUCCESS [01:38 min]
[INFO] Monitor Server .................................... SUCCESS [ 4.857 s]
[INFO] Native Libraries .................................. SUCCESS [ 13.424 s]
[INFO] Tracer Server ..................................... SUCCESS [ 1.992 s]
[INFO] Accumulo Maven Plugin ............................. SUCCESS [ 25.701 s]
[INFO] Testing ........................................... SUCCESS [01:03 min]
[INFO] Proxy ............................................. SUCCESS [ 28.249 s]
[INFO] Assemblies ........................................ SUCCESS [ 11.505 s]
[INFO] Documentation ..................................... SUCCESS [ 0.126 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:26 min
[INFO] Finished at: 2014-03-01T02:40:37-05:00
[INFO] Final Memory: 95M/276M
[INFO] ------------------------------------------------------------------------

Congrats! You just built your first Accumulo tar file that is ready to be deployed! Navigate to assemble/target/ directory to find your file. It should look something like this: accumulo-1.6.0-bin.tar.gz
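
Because the exact file name depends on the version you built (for example, it may or may not contain SNAPSHOT), a small helper can locate the tar file for you. The following is a minimal sketch using only the Java standard library; the class name FindTarball, the method findTarballs, and the glob pattern accumulo-*-bin.tar.gz are assumptions made for illustration, based on the file name shown above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Lists binary tar files in a build output directory.
// Assumption: the build names them accumulo-<version>-bin.tar.gz.
class FindTarball {

    static List<Path> findTarballs(Path targetDir) {
        List<Path> found = new ArrayList<>();
        // The glob is matched against file names directly inside targetDir
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(targetDir, "accumulo-*-bin.tar.gz")) {
            for (Path p : stream) {
                found.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found); // stable, alphabetical listing
        return found;
    }

    public static void main(String[] args) {
        // Default to assemble/target relative to the current directory
        Path targetDir = Paths.get(args.length > 0 ? args[0] : "assemble/target");
        for (Path p : findTarballs(targetDir)) {
            System.out.println(p);
        }
    }
}
```

Run it from the top of the Accumulo source tree, or pass the path to assemble/target as the first argument.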

Now let's see how we can pull code from different branches and check for updates. Switch to your accumulo directory and check whether you have anything new to commit

[root@localhost accumulo]# git status
# On branch master
nothing to commit (working directory clean)

Now let's see if there were any new updates to the code by someone else since your last check out

[root@localhost accumulo]# git pull
Already up-to-date.

Seems like everything is up to date... If you experience issues with the git pull command, make sure that the remote is configured properly so git knows where to pull the data from

[root@localhost accumulo]# git remote add origin https://git-wip-us.apache.org/repos/asf/accumulo.git

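Now, you probably don't want to make your changes against the master branch... To switch branches, execute

[root@localhost accumulo]# git checkout 1.6.0-SNAPSHOT

This should give you the 1.6.0 development branch of Accumulo. (Note that git pull origin 1.6.0-SNAPSHOT would instead merge that branch into whichever branch you currently have checked out.)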

To see the entire list of available branches

[root@localhost assemble]# git branch -a
1.6.0-SNAPSHOT
* master
remotes/origin/1.4.5-SNAPSHOT
remotes/origin/1.5.2-SNAPSHOT
remotes/origin/1.6.0-SNAPSHOT
remotes/origin/ACCUMULO-1000
remotes/origin/ACCUMULO-1409
remotes/origin/ACCUMULO-1566
remotes/origin/ACCUMULO-2061
remotes/origin/ACCUMULO-2442
remotes/origin/ACCUMULO-578
remotes/origin/ACCUMULO-652
remotes/origin/ACCUMULO-672
remotes/origin/ACCUMULO-722
remotes/origin/ACCUMULO-802
remotes/origin/ACCUMULO-CURATOR
remotes/origin/HEAD -> origin/master
remotes/origin/master

The -a flag shows all local and remote branches.

To create your own branch and switch to it at the same time, execute

[root@localhost assemble]# git checkout -b 1.6.0-SNAPSHOT-myown

This will come in handy when we are ready to contribute, which involves switching between different branches, creating our own feature branches, and then committing the changes back.