Heroku and Cassandra – Cassandra.io RESTful APIs


Introduction

Last time I wrote about Hadoop on Heroku, which is an add-on from Treasure Data – this time I am going to cover NoSQL on Heroku.
There are various datastore services – add-ons in Heroku terms – available, from MongoDB (MongoHQ) to CouchDB (Cloudant) to Cassandra (Cassandra.io). This post is devoted to Cassandra.io.

Cassandra.io

Cassandra.io is a hosted and managed Cassandra ring based on Apache Cassandra, made accessible via a RESTful API. As of this writing, the Cassandra.io client helper libraries are available in Java, Ruby and PHP, and there is also an Objective-C version in private beta. The libraries can be downloaded from GitHub. I use the Java library in my tests.

Heroku – and the Cassandra.io add-on, too – is built on Amazon Elastic Compute Cloud (EC2) and is supported in all of Amazon's locations. Note: the Cassandra.io add-on is in public beta now, which means that only one option, called Test, is available – it is free.

Installing Cassandra.io add-on

To install the Cassandra.io add-on you just need to follow the standard way of adding an add-on to an application:

$ heroku login
Enter your Heroku credentials.
Email: address@example.com
Password (typing will be hidden): 
Authentication successful.

$ heroku create
Creating glacial-badlands-1234... done, stack is cedar
http://glacial-badlands-1234.herokuapp.com/ | git@heroku.com:glacial-badlands-1234.git

$ heroku addons:add cassandraio:test --app glacial-badlands-1234
Adding cassandraio:test on glacial-badlands-1234... done, v2 (free)
Use `heroku addons:docs cassandraio:test` to view documentation.

You can check the configuration via the Heroku admin console.

Then you need to clone the client helper libraries from GitHub:

$ git clone https://github.com/m2mIO/cassandraio-client-libraries.git

If you use the Java client library, you also need the Google Gson library (gson-2.0.jar).

Writing Cassandra.io application

The Java RESTful API library has one simple configuration file called sdk.properties. It stores very few parameters: the API URL and the version. The original sdk.properties file cloned from GitHub has the wrong version (v0.1); it needs to be changed to 1. You can verify the required configuration parameters using the heroku config command.

$ heroku config --app glacial-badlands-1234
=== glacial-badlands-1234 Config Vars
CASSANDRAIO_URL: https://Token:AccountId@api.cassandra.io/1/welcome

You can check the same config parameters from the Heroku admin console, though the URL shown there is misleading.

The sdk.properties file should look like this:

apiUrl = https://api.cassandra.io
version = 1
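These two keys are ordinary Java properties, so you can sanity-check the file the same way the SDK presumably reads it. The snippet below is a minimal sketch using only the standard library; the class and method names are my own, only the key names come from the sdk.properties shown above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SdkPropsCheck {
    // Parse properties text; the SDK reads apiUrl and version from sdk.properties
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) { // cannot happen for an in-memory reader
            throw new RuntimeException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = parse("apiUrl = https://api.cassandra.io\nversion = 1\n");
        System.out.println(p.getProperty("apiUrl") + " v" + p.getProperty("version"));
    }
}
```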

The Java code – CassandraIOTest.java – is like this:

package io.cassandra.tests;

import java.util.ArrayList;
import java.util.List;

import io.cassandra.sdk.StatusMessageModel;
import io.cassandra.sdk.column.ColumnAPI;
import io.cassandra.sdk.columnfamily.ColumnFamilyAPI;
import io.cassandra.sdk.constants.APIConstants;
import io.cassandra.sdk.data.DataAPI;
import io.cassandra.sdk.data.DataBulkModel;
import io.cassandra.sdk.data.DataColumn;
import io.cassandra.sdk.data.DataMapModel;
import io.cassandra.sdk.data.DataRowkey;
import io.cassandra.sdk.keyspace.KeyspaceAPI;

public class CassandraIOTest {
    // credentials
	private static String TOKEN = "<Token>";
	private static String ACCOUNTID = "<AccountId>";

	// data 
	private static String KS = "AAPL";
	private static String CF = "MarketData";
	private static String COL1 = "Open";
	private static String COL2 = "Close";
	private static String COL3 = "High";
	private static String COL4 = "Low";
	private static String COL5 = "Volume";
	private static String COL6 = "AdjClose";
	private static String RK = "18-05-2012";

	public static void main(String[] args) {	
		try {
			StatusMessageModel sm;

			// Create Keyspace
			KeyspaceAPI keyspaceAPI = new KeyspaceAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);
			sm = keyspaceAPI.createKeyspace(KS);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
				+ sm.getError());

			// Create ColumnFamily
			ColumnFamilyAPI columnFamilyAPI = new ColumnFamilyAPI(APIConstants.API_URL, TOKEN,
					ACCOUNTID);
			sm = columnFamilyAPI.createColumnFamily(KS, CF,
					APIConstants.COMPARATOR_UTF8);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			// Add Columns (High, Low, Open, Close, Volume, AdjClose)
			ColumnAPI columnAPI = new ColumnAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);
			sm = columnAPI.upsertColumn(KS, CF, COL1,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL2,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL3,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL4,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL5,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL6,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			//Add Bulk Data
			DataAPI dataAPI = new DataAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);

			List<DataColumn> columns = new ArrayList<DataColumn>();
			DataColumn dc = new DataColumn(COL1, "533.96");
			columns.add(dc);
			dc = new DataColumn(COL2, "530.38", 12000);
			columns.add(dc);
			dc = new DataColumn(COL3, "543.41", 12000);
			columns.add(dc);
			dc = new DataColumn(COL4, "522.18", 12000);
			columns.add(dc);
			dc = new DataColumn(COL5, "26125200", 12000);
			columns.add(dc);
			dc = new DataColumn(COL6, "530.12", 12000);
			columns.add(dc);

			List<DataRowkey> rows = new ArrayList<DataRowkey>();
			DataRowkey row = new DataRowkey(RK, columns);
			rows.add(row);

			DataBulkModel dataBulk = new DataBulkModel(rows);

			sm = dataAPI.postBulkData(KS, CF, dataBulk);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			// Get Data
			DataMapModel dm = dataAPI.getData(KS, CF, RK, 0, null);
			System.out.println(dm.toString());			

			// Delete Keyspace
			sm = keyspaceAPI.deleteKeyspace(KS);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
		}
		catch(Exception e) {
			System.out.println(e.getMessage());
		}

	}
}

The runtime result is:

09-Sep-2012 22:59:18 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/keyspace/AAPL/
Success | Keyspace added successfully. | null
09-Sep-2012 22:59:21 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/columnfamily/AAPL/MarketData/UTF8Type/
Success | MarketData ColumnFamily created successfully | null
09-Sep-2012 22:59:24 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Open/UTF8Type/?isIndex=true
Failed | Unable to create Column: Open | Cassandra encountered an internal error processing this request: TApplicationError type: 6 message:Internal error processing system_update_column_family
09-Sep-2012 22:59:24 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Close/UTF8Type/?isIndex=true
Success | Close Column upserted successfully | null
09-Sep-2012 22:59:26 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/High/UTF8Type/?isIndex=true
Success | High Column upserted successfully | null
09-Sep-2012 22:59:27 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Low/UTF8Type/?isIndex=true
Success | Low Column upserted successfully | null
09-Sep-2012 22:59:29 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Volume/UTF8Type/?isIndex=true
Success | Volume Column upserted successfully | null
09-Sep-2012 22:59:30 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/AdjClose/UTF8Type/?isIndex=true
Success | AdjClose Column upserted successfully | null
Posting JSON: {"rowkeys":[{"rowkey":"18-05-2012","columns":[{"columnname":"Open","columnvalue":"533.96","ttl":0},{"columnname":"Close","columnvalue":"530.38","ttl":12000},{"columnname":"High","columnvalue":"543.41","ttl":12000},{"columnname":"Low","columnvalue":"522.18","ttl":12000},{"columnname":"Volume","columnvalue":"26125200","ttl":12000},{"columnname":"AdjClose","columnvalue":"530.12","ttl":12000}]}]}
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/data/AAPL/MarketData/
Success | Bulk upload successfull. | null
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/data/AAPL/MarketData/18-05-2012/
{Volume=26125200, Open=533.96, Low=522.18, High=543.41, Close=530.38, AdjClose=530.12}
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/keyspace/AAPL/
Success | Keyspace dropped successfully. | null

Analysis

Step 1./ The code creates a keyspace named AAPL using HTTP POST, url: https://api.cassandra.io/1/keyspace/AAPL/
It uses the KeyspaceAPI class with Token and AccountId as constructor parameters; Token is used as the username, while AccountId is the password. (Remember: these attributes can be retrieved using the heroku config command or via the Heroku admin console.)
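To make the authentication scheme concrete: assuming the URL pattern shown in the SDK log output and that Token/AccountId map onto HTTP Basic auth username/password, the raw request behind the KeyspaceAPI wrapper could be sketched with nothing but the standard library. The class and method names below are illustrative, not part of the SDK:

```java
import java.util.Base64;

public class RawKeyspaceCall {
    // URL pattern from the SDK log: https://api.cassandra.io/1/keyspace/<name>/
    static String keyspaceUrl(String keyspace) {
        return "https://api.cassandra.io/1/keyspace/" + keyspace + "/";
    }

    // Basic auth: Token as username, AccountId as password
    static String authHeader(String token, String accountId) {
        String pair = token + ":" + accountId;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes());
    }

    public static void main(String[] args) throws Exception {
        java.net.HttpURLConnection conn = (java.net.HttpURLConnection)
                new java.net.URL(keyspaceUrl("AAPL")).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", authHeader("<Token>", "<AccountId>"));
        // conn.getResponseCode() would execute the POST with real credentials
    }
}
```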

Step 2./ Then the code creates a column family called MarketData. It uses ColumnFamilyAPI – with the credentials mentioned above – and the REST url is https://api.cassandra.io/1/columnfamily/AAPL/MarketData/UTF8Type/.

Step 3./ Then the code upserts the columns called Open, Close, High, Low, Volume and AdjClose. It uses ColumnAPI – same credentials as we already know – and the REST url is https://api.cassandra.io/1/column/AAPL/MarketData/Open/UTF8Type/?isIndex=true, where AAPL is the keyspace, MarketData is the column family and Open is the column.

Step 4./ Then the code prepares the data as name/value pairs (Open = “533.96”, Close = “530.38”, etc.), defines a rowkey (“18-05-2012”) and then uses the DataAPI postBulkData method to upload the data into Cassandra.io. The DataAPI credentials are the same as above.
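The bulk payload posted in the log above is plain JSON, so the same wire format can be reproduced without the SDK models. Below is a minimal, hypothetical stand-in (the SDK's DataColumn/DataRowkey/DataBulkModel classes play the same role) that emits that format:

```java
import java.util.Arrays;
import java.util.List;

public class BulkPayloadSketch {
    // Hypothetical stand-in for the SDK's DataColumn model
    static class Column {
        final String name, value; final int ttl;
        Column(String name, String value, int ttl) {
            this.name = name; this.value = value; this.ttl = ttl;
        }
    }

    // Build the {"rowkeys":[{"rowkey":...,"columns":[...]}]} wire format
    static String toJson(String rowkey, List<Column> cols) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"rowkeys\":[{\"rowkey\":\"").append(rowkey).append("\",\"columns\":[");
        for (int i = 0; i < cols.size(); i++) {
            Column c = cols.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"columnname\":\"").append(c.name)
              .append("\",\"columnvalue\":\"").append(c.value)
              .append("\",\"ttl\":").append(c.ttl).append('}');
        }
        return sb.append("]}]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("18-05-2012",
                Arrays.asList(new Column("Open", "533.96", 0))));
    }
}
```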

Step 5./ The code then fetches the data using HTTP GET with url: https://api.cassandra.io/1/data/AAPL/MarketData/18-05-2012/. The parsed response is printed as a Java map: {Volume=26125200, Open=533.96, Low=522.18, High=543.41, Close=530.38, AdjClose=530.12}

Step 6./ Finally the code destroys the keyspace using HTTP DELETE, url: https://api.cassandra.io/1/keyspace/AAPL/.

Summary

If you want to try out a robust, highly available Cassandra datastore without any upfront infrastructure investment and with an easy-to-use API, you should certainly have a closer look at Cassandra.io on Heroku. It takes only a few minutes to start up, and the APIs offer simple REST-based data management for Java, Ruby and PHP developers.


Big Data on Heroku – Hadoop from Treasure Data

This time I write about the Heroku and Treasure Data Hadoop solution – I found it to be a real gem in the Big Data world.

Heroku is a cloud platform as a service (PaaS) owned by Salesforce.com. It originally started with Ruby as its main programming language, but support has since been extended to Java, Scala, Node.js, Python and Clojure, too. It also supports a long list of add-ons including – among others – RDBMS and NoSQL capabilities and a Hadoop-based data warehouse developed by Treasure Data.

Treasure Data Hadoop Architecture

The architecture of the Treasure Data Hadoop solution is as follows:

Heroku Toolbelt

The Heroku toolbelt is command-line tooling that consists of the heroku, foreman and git packages. As described on the Heroku toolbelt website, it is “everything you need to get started using heroku”. (The heroku CLI is based on Ruby, so you need Ruby under the hood, too.) Once you have signed up for Heroku (you need a verified account, meaning that you have provided your bank details for potential service charges) and you have installed the Heroku toolbelt, you can start right away.

Depending on your environment – I am using Ubuntu 12.04 LTS – you can use an alternative installation method like:

$ sudo apt-get install git
$ gem install heroku
$ gem install foreman

Heroku and Treasure Data add-on

If you want to use Treasure Data on Heroku, you need to add the Treasure Data Hadoop add-on – you need to log in, create an application (Heroku will generate a fancy name like boiling-tundra for you) and then add the add-on to the application you just created:

$ heroku login
Enter your Heroku credentials.
Email: xxx@mail.com
Password (typing will be hidden): 
Found existing public key: /home/istvan/.ssh/id_dsa.pub
Uploading SSH public key /home/istvan/.ssh/id_dsa.pub... done
Authentication successful.

$ heroku create
Creating boiling-tundra-1234... done, stack is cedar
http://boiling-tundra-1234.herokuapp.com/ | git@heroku.com:boiling-tundra-1234.git

$ heroku addons:add treasure-data:nano --app boiling-tundra-1234
Adding treasure-data:nano on boiling-tundra-1234... done, v2 (free)
Use `heroku addons:docs treasure-data:nano` to view documentation.

I just love the coloring scheme and the graphics used in the Heroku console – it is simply brilliant.

Treasure Data toolbelt

To manage Treasure Data Hadoop on Heroku you need to install the Treasure Data toolbelt – it fits nicely with the heroku CLI and is also based on Ruby:

$ gem install td

Then you need to install heroku plugin to support heroku commands:

$ heroku plugins:install https://github.com/treasure-data/heroku-td.git
Installing heroku-td... done

To verify that everything is fine, just run:

$ heroku plugins
=== Installed Plugins
heroku-td

and

$ heroku td
usage: heroku td [options] COMMAND [args]

options:
  -c, --config PATH                path to config file (~/.td/td.conf)
  -k, --apikey KEY                 use this API key instead of reading the config file
  -v, --verbose                    verbose mode
  -h, --help                       show help
...

Treasure Data Hadoop – td commands

Now we are ready to execute td commands from heroku. td commands are used to create databases and tables, import data, run queries, drop tables, etc. Under the hood, td commands are basically HiveQL queries. (According to their website, Treasure Data plans to support Pig as well in the future.)

By default the Treasure Data td-agent prefers JSON-formatted data, though it can process various other formats (Apache log, syslog, etc.), and you can write your own parser to process the uploaded data.

Thus I converted my AAPL stock data (again, thanks to http://finance.yahoo.com) into JSON format:

{"time":"2012-08-20", "open":"650.01", "high":"665.15", "low":"649.90", "close":"665.15", "volume":"21876300", "adjclose":"665.15"}
{"time":"2012-08-17", "open":"640.00", "high":"648.19", "low":"638.81", "close":"648.11", "volume":"15812900", "adjclose":"648.11"}
{"time":"2012-08-16", "open":"631.21", "high":"636.76", "low":"630.50", "close":"636.34", "volume":"9090500", "adjclose":"634.64"}
{"time":"2012-08-15", "open":"631.30", "high":"634.00", "low":"625.75", "close":"630.83", "volume":"9190800", "adjclose":"630.83"}
{"time":"2012-08-14", "open":"631.87", "high":"638.61", "low":"630.21", "close":"631.69", "volume":"12148900", "adjclose":"631.69"}
{"time":"2012-08-13", "open":"623.39", "high":"630.00", "low":"623.25", "close":"630.00", "volume":"9958300", "adjclose":"630.00"}
{"time":"2012-08-10", "open":"618.71", "high":"621.76", "low":"618.70", "close":"621.70", "volume":"6962100", "adjclose":"621.70"}
{"time":"2012-08-09", "open":"617.85", "high":"621.73", "low":"617.80", "close":"620.73", "volume":"7915800", "adjclose":"620.73"}
{"time":"2012-08-08", "open":"619.39", "high":"623.88", "low":"617.10", "close":"619.86", "volume":"8739500", "adjclose":"617.21"}
{"time":"2012-08-07", "open":"622.77", "high":"625.00", "low":"618.04", "close":"620.91", "volume":"10373100", "adjclose":"618.26"}
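The conversion itself is mechanical. A possible sketch in Java, assuming the usual Yahoo Finance CSV column order (Date,Open,High,Low,Close,Volume,Adj Close) – the class name and column order are my assumptions, not from the original data file:

```java
public class CsvToJson {
    // Assumed Yahoo CSV order: Date,Open,High,Low,Close,Volume,Adj Close
    static String toJsonLine(String csvLine) {
        String[] f = csvLine.split(",");
        return String.format(
            "{\"time\":\"%s\", \"open\":\"%s\", \"high\":\"%s\", \"low\":\"%s\", "
            + "\"close\":\"%s\", \"volume\":\"%s\", \"adjclose\":\"%s\"}",
            f[0], f[1], f[2], f[3], f[4], f[5], f[6]);
    }

    public static void main(String[] args) {
        // Emits one line in the same shape as the aapl.json records above
        System.out.println(toJsonLine("2012-08-20,650.01,665.15,649.90,665.15,21876300,665.15"));
    }
}
```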

The first step is to create the database called aapl:

$ heroku td db:create aapl --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /usr/local/heroku/lib/heroku/client.rb:129.
Database 'aapl' is created.

Then create the table called marketdata:

$ heroku td table:create aapl marketdata --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /usr/local/heroku/lib/heroku/client.rb:129.
Table 'aapl.marketdata' is created.

Check that the table has been created successfully:

$ heroku td tables --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /usr/local/heroku/lib/heroku/client.rb:129.
+----------+------------+------+-------+--------+
| Database | Table      | Type | Count | Schema |
+----------+------------+------+-------+--------+
| aapl     | marketdata | log  | 0     |        |
+----------+------------+------+-------+--------+
1 row in set

Import data:

$ heroku td table:import aapl marketdata --format json --time-key time aapl.json --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /home/istvan/.rvm/gems/ruby-1.9.2-p320/gems/heroku-2.30.3/lib/heroku/client.rb:129.
importing aapl.json...
  uploading 364 bytes...
  imported 10 entries from aapl.json.
done.

Check that the data import was successful – you should see the Count column indicating the number of rows loaded into the table:

$ heroku td tables --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /home/istvan/.rvm/gems/ruby-1.9.2-p320/gems/heroku-2.30.3/lib/heroku/client.rb:129.
+----------+------------+------+-------+--------+
| Database | Table      | Type | Count | Schema |
+----------+------------+------+-------+--------+
| aapl     | marketdata | log  | 10    |        |
+----------+------------+------+-------+--------+
1 row in set

Now we are ready to run HiveQL (a td query) against the dataset – this particular query lists the highest prices of the AAPL stock at the top, i.e. it shows the prices in descending order. (The time value is based on the UNIX epoch.)

$ heroku td query -d aapl -w "SELECT v['time'] as time, v['high'] as high, v['low'] as low FROM marketdata ORDER BY high DESC" --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /home/istvan/.rvm/gems/ruby-1.9.2-p320/gems/heroku-2.30.3/lib/heroku/client.rb:129.
Job 757853 is queued.
Use 'heroku td job:show 757853' to show the status.
queued...
  started at 2012-08-21T21:06:54Z
  Hive history file=/mnt/hive/tmp/617/hive_job_log_617_201208212106_269570447.txt
  Total MapReduce jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks determined at compile time: 1
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
  In order to set a constant number of reducers:
    set mapred.reduce.tasks=
  Starting Job = job_201207250829_556135, Tracking URL = http://domU-12-31-39-0A-56-11.compute-1.internal:50030/jobdetails.jsp?jobid=job_201207250829_556135
  Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=10.211.85.219:8021 -kill job_201207250829_556135
  2012-08-21 21:07:21,455 Stage-1 map = 0%,  reduce = 0%
  2012-08-21 21:07:28,480 Stage-1 map = 100%,  reduce = 0%
  2012-08-21 21:07:37,965 Stage-1 map = 100%,  reduce = 100%
  Ended Job = job_201207250829_556135
  OK
  MapReduce time taken: 42.536 seconds
  finished at 2012-08-21T21:07:53Z
  Time taken: 53.781 seconds
Status     : success
Result     :
+------------+--------+--------+
| time       | high   | low    |
+------------+--------+--------+
| 1345417200 | 665.15 | 649.90 |
| 1345158000 | 648.19 | 638.81 |
| 1344898800 | 638.61 | 630.21 |
| 1345071600 | 636.76 | 630.50 |
| 1344985200 | 634.00 | 625.75 |
| 1344812400 | 630.00 | 623.25 |
| 1344294000 | 625.00 | 618.04 |
| 1344380400 | 623.88 | 617.10 |
| 1344553200 | 621.76 | 618.70 |
| 1344466800 | 621.73 | 617.80 |
+------------+--------+--------+
10 rows in set
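The epoch timestamps in the time column can be decoded with the standard library. For example, the first row decodes as follows – 2012-08-19T23:00:00Z is midnight on 2012-08-20 in UTC+1, which matches the "2012-08-20" row of the source data (the class name is mine, for illustration):

```java
import java.time.Instant;

public class EpochCheck {
    public static void main(String[] args) {
        // 1345417200 is the time value of the first row of the result set
        Instant t = Instant.ofEpochSecond(1345417200L);
        System.out.println(t); // 2012-08-19T23:00:00Z
    }
}
```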

Finally you can delete the marketdata table:

$ heroku td table:delete aapl marketdata --app boiling-tundra-1234
 !    DEPRECATED: Heroku::Client#deprecate is deprecated, please use the heroku-api gem.
 !    DEPRECATED: More information available at https://github.com/heroku/heroku.rb
 !    DEPRECATED: Deprecated method called from /home/istvan/.rvm/gems/ruby-1.9.2-p320/gems/heroku-2.30.3/lib/heroku/client.rb:129.
Do you really delete 'marketdata' in 'aapl'? [y/N]: y
Table 'aapl.marketdata' is deleted.

More details on how to use Treasure Data Hadoop can be found at http://docs.treasure-data.com/articles/quickstart