Qlikview and Google BigQuery – Data Visualization for Big Data


Introduction

Google launched its BigQuery cloud service in May to support interactive analysis of massive datasets of up to billions of rows. Shortly after the launch Qliktech – one of the market leaders in BI solutions, known for its unique associative architecture based on a column-store, in-memory database – demonstrated a Qlikview Google BigQuery application that provided data visualization using BigQuery as the backend. This post is about how Qlikview and Google BigQuery can be integrated to provide an easy-to-use data analytics application for business users who work on large datasets.

Qlikview and Google BigQuery

Qlikview offers two integration options, depending on the needs and the volume of the data:

Qlikview BigQuery Connector: this add-on is written in .NET – thus it requires the Microsoft .NET 4 framework to be installed on your computer. It loads the data into the in-memory data model, and various view types (table, bar chart, etc.) can then be created on the fly to visualize the data or a subset of it.

Qlikview BigQuery Extension Object: when the volume of data is huge, not all of it can be loaded into memory. The Qlikview BigQuery Extension Object provides a web-based solution built upon the Google JavaScript API. Users navigate using the extension object and fetch only the relevant portion of the data from BigQuery.

Preparing the dataset in Google BigQuery

Before we start working with the Qlikview BigQuery solutions, we need to create a dataset in Google BigQuery. We are going to use Apple market data downloaded from the finance.yahoo.com site in CSV format.

First we need to create a dataset called apple using the Google BigQuery browser tool:

Then we need to create a table called marketdata:

The next step is to upload the CSV file into the table – you need to use the Chrome browser, as Internet Explorer does not work for file upload as of the writing of this post. The schema that was used is {date:string, open:float, high:float, low:float, close:float, volume:integer, adjclose:float} – just to demonstrate that Google BigQuery is capable of handling various data types:

Finally we can run a simple SQL query to validate that the data has been successfully uploaded:
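For example, a simple row count – an illustrative sketch against the apple.marketdata table created above – is enough to confirm the upload:

SELECT count(*) AS row_count
FROM apple.marketdata;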

So far so good – we have the data loaded into Google BigQuery.

Qlikview BigQuery Connector

We need to have Qlikview installed; in my test I used Qlikview Personal Edition, which can be downloaded for free from the Qliktech website. Then we need to download the Qlikview BigQuery Connector from the Qlikview market.

Once the Qlikview BigQuery Connector is installed, it appears the same way as any other connector (just like ODBC or OLE DB). Go to Edit Script and then choose BigQuery as the database:

Once we click on Connect, an authorization window pops up on the screen – Google BigQuery relies on OAuth 2.0, thus we need an OAuth 2.0 client ID and client secret. These can be created using the Google API console; select the ‘Installed application’ and ‘Other’ options.

In Qlikview we need to authenticate ourselves using the client ID and client secret:

After authentication the next step is to define the Select statement that will be used to load the data from Google BigQuery into memory:
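The statement is ordinary BigQuery SQL; an illustrative sketch, using the column names from the schema defined above, is:

SELECT date, open, high, low, close, volume, adjclose
FROM apple.marketdata;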

When we click on the OK button, the data is fetched into the Qlikview in-memory data model (in our case 7,000+ rows):

We can start processing and visualizing the data within Qlikview. First we are going to create a table view by right-clicking and selecting New Sheet Object:

Let us then define another visualization object, a Line Chart:

And then a Bar Chart – so we get the following dashboard presenting the data that was loaded from the Google BigQuery backend into the Qlikview in-memory column store:

Qlikview BigQuery Extension Object

As mentioned before, not all the data will necessarily fit into memory – even though Qlikview is very strong at compressing data, we are talking about massive datasets; that is what big data is all about. In this case the Qlikview BigQuery Extension Object comes to the rescue. We need to download it from the Qlikview market and install it.

As it is a web-based solution using JavaScript (the Google JavaScript API), we need a Google client ID and client secret for web applications; these can be created the same way as described above for the Qlikview connector. The ‘javascript origins’ attribute needs to be set to http://qlikview.

Then we need to turn on WebView in Qlikview:

Now we are ready to create a new visualization object by right-clicking and selecting New Sheet Object:

Select Extension Objects, then BigQuery, and drag it onto the Qlikview window:

We need to define the project, dataset and table – bighadoop, apple, marketdata respectively:

Then we have to define the visualization type (Table in this case) and the select statement to fetch the data. Please note that we limited the data to 100 rows using the ‘select date, open, close, high, low from apple.marketdata limit 100’ SQL statement:

We can define various visualization objects, similarly to the BigQuery Connector scenario:
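Since each extension object sends its own SQL statement to BigQuery, aggregations can also be pushed down to the backend instead of being computed on the client; an illustrative summary query like the one below returns a single row no matter how large the underlying table is:

SELECT MIN(low) AS min_low, MAX(high) AS max_high, AVG(volume) AS avg_volume
FROM apple.marketdata;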

Qlikview on Mobile

QlikTech promotes a unified, HTML5-based approach for delivering BI solutions across platforms, so there is no need for additional layers to support data analytics and visualization on mobile devices. Qlikview Server is capable of recognizing mobile browsers and supports touch-screen functionality.

Reference: http://www.slideshare.net/QlikView_UK/qlikview-on-mobile

Conclusion

Qlikview has gained significant popularity among BI tools; Gartner positioned QlikTech in the Leaders quadrant of its Business Intelligence Platforms Magic Quadrant in 2012. It provides a highly interactive, easy-to-use graphical user interface for business users, and the technology partnership with Google to provide seamless integration with BigQuery can only further strengthen its position.

Heroku and Cassandra – Cassandra.io RESTful APIs


Introduction

Last time I wrote about Hadoop on Heroku, which is an add-on from Treasure Data – this time I am going to cover NoSQL on Heroku.
There are various datastore services – add-ons in Heroku terms – available, from MongoDB (MongoHQ) to CouchDB (Cloudant) to Cassandra (Cassandra.io). This post is devoted to Cassandra.io.

Cassandra.io

Cassandra.io is a hosted and managed Cassandra ring based on Apache Cassandra, made accessible via a RESTful API. As of writing this article, the Cassandra.io client helper libraries are available in Java, Ruby and PHP, and there is also an Objective-C version in private beta. The libraries can be downloaded from GitHub. I use the Java library in my tests.

Heroku – and the Cassandra.io add-on, too – is built on Amazon Elastic Compute Cloud (EC2) and is supported in all of Amazon’s locations. Note: the Cassandra.io add-on is in public beta now, which means only one option, called Test, is available – it is free.

Installing Cassandra.io add-on

To install the Cassandra.io add-on you just need to follow the standard way of adding an add-on to an application:

$ heroku login
Enter your Heroku credentials.
Email: address@example.com
Password (typing will be hidden): 
Authentication successful.

$ heroku create
Creating glacial-badlands-1234... done, stack is cedar
http://glacial-badlands-1234.herokuapp.com/ | git@heroku.com:glacial-badlands-1234.git

$ heroku addons:add cassandraio:test --app glacial-badlands-1234
Adding cassandraio:test on glacial-badlands-1234... done, v2 (free)
Use `heroku addons:docs cassandraio:test` to view documentation.

You can check the configuration via the Heroku admin console:

Then you need to clone the client helper libraries from GitHub:

$ git clone https://github.com/m2mIO/cassandraio-client-libraries.git

In case of the Java client library, you also need the Google gson library (gson-2.0.jar).

Writing Cassandra.io application

The Java RESTful API library has one simple configuration file called sdk.properties. It stores very few parameters – the API URL and the version. The original sdk.properties file cloned from GitHub has the wrong version (v0.1); it needs to be changed to 1. You can verify the required configuration parameters using the heroku config command.

$ heroku config --app glacial-badlands-1234
=== glacial-badlands-1234 Config Vars
CASSANDRAIO_URL: https://Token:AccountId@api.cassandra.io/1/welcome

You can check the same config parameters from the Heroku admin console, though the URL is misleading:

The sdk.properties file should look like this:

apiUrl = https://api.cassandra.io
version = 1

The Java code – CassandraIOTest.java – looks like this:

package io.cassandra.tests;

import java.util.ArrayList;
import java.util.List;

import io.cassandra.sdk.StatusMessageModel;
import io.cassandra.sdk.column.ColumnAPI;
import io.cassandra.sdk.columnfamily.ColumnFamilyAPI;
import io.cassandra.sdk.constants.APIConstants;
import io.cassandra.sdk.data.DataAPI;
import io.cassandra.sdk.data.DataBulkModel;
import io.cassandra.sdk.data.DataColumn;
import io.cassandra.sdk.data.DataMapModel;
import io.cassandra.sdk.data.DataRowkey;
import io.cassandra.sdk.keyspace.KeyspaceAPI;

public class CassandraIOTest {
    // credentials
	private static String TOKEN = "<Token>";
	private static String ACCOUNTID = "<AccountId>";

	// data 
	private static String KS = "AAPL";
	private static String CF = "MarketData";
	private static String COL1 = "Open";
	private static String COL2 = "Close";
	private static String COL3 = "High";
	private static String COL4 = "Low";
	private static String COL5 = "Volume";
	private static String COL6 = "AdjClose";
	private static String RK = "18-05-2012";

	public static void main(String[] args) {	
		try {
			StatusMessageModel sm;

			// Create Keyspace
			KeyspaceAPI keyspaceAPI = new KeyspaceAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);
			sm = keyspaceAPI.createKeyspace(KS);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
				+ sm.getError());

			// Create ColumnFamily
			ColumnFamilyAPI columnFamilyAPI = new ColumnFamilyAPI(APIConstants.API_URL, TOKEN,
					ACCOUNTID);
			sm = columnFamilyAPI.createColumnFamily(KS, CF,
					APIConstants.COMPARATOR_UTF8);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			// Add Columns (High, Low, Open, Close, Volume, AdjClose)
			ColumnAPI columnAPI = new ColumnAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);
			sm = columnAPI.upsertColumn(KS, CF, COL1,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL2,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL3,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL4,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL5,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
			sm = columnAPI.upsertColumn(KS, CF, COL6,
					APIConstants.COMPARATOR_UTF8, true);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			//Add Bulk Data
			DataAPI dataAPI = new DataAPI(APIConstants.API_URL, TOKEN, ACCOUNTID);

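			// Each DataColumn takes a column name, a value and an optional TTL as its
			// third constructor argument (12000 here); the first column uses the
			// two-argument form, which shows up as ttl:0 in the posted JSON below.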
			List<DataColumn> columns = new ArrayList<DataColumn>();
			DataColumn dc = new DataColumn(COL1, "533.96");
			columns.add(dc);
			dc = new DataColumn(COL2, "530.38", 12000);
			columns.add(dc);
			dc = new DataColumn(COL3, "543.41", 12000);
			columns.add(dc);
			dc = new DataColumn(COL4, "522.18", 12000);
			columns.add(dc);
			dc = new DataColumn(COL5, "26125200", 12000);
			columns.add(dc);
			dc = new DataColumn(COL6, "530.12", 12000);
			columns.add(dc);

			List<DataRowkey> rows = new ArrayList<DataRowkey>();
			DataRowkey row = new DataRowkey(RK, columns);
			rows.add(row);

			DataBulkModel dataBulk = new DataBulkModel(rows);

			sm = dataAPI.postBulkData(KS, CF, dataBulk);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());

			// Get Data
			DataMapModel dm = dataAPI.getData(KS, CF, RK, 0, null);
			System.out.println(dm.toString());			

			// Delete Keyspace
			sm = keyspaceAPI.deleteKeyspace(KS);
			System.out.println(sm.getMessage() + " | " + sm.getDetail() + " | "
					+ sm.getError());
		}
		catch(Exception e) {
			System.out.println(e.getMessage());
		}

	}
}

The runtime result is:

09-Sep-2012 22:59:18 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/keyspace/AAPL/
Success | Keyspace added successfully. | null
09-Sep-2012 22:59:21 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/columnfamily/AAPL/MarketData/UTF8Type/
Success | MarketData ColumnFamily created successfully | null
09-Sep-2012 22:59:24 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Open/UTF8Type/?isIndex=true
Failed | Unable to create Column: Open | Cassandra encountered an internal error processing this request: TApplicationError type: 6 message:Internal error processing system_update_column_family
09-Sep-2012 22:59:24 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Close/UTF8Type/?isIndex=true
Success | Close Column upserted successfully | null
09-Sep-2012 22:59:26 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/High/UTF8Type/?isIndex=true
Success | High Column upserted successfully | null
09-Sep-2012 22:59:27 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Low/UTF8Type/?isIndex=true
Success | Low Column upserted successfully | null
09-Sep-2012 22:59:29 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/Volume/UTF8Type/?isIndex=true
Success | Volume Column upserted successfully | null
09-Sep-2012 22:59:30 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/column/AAPL/MarketData/AdjClose/UTF8Type/?isIndex=true
Success | AdjClose Column upserted successfully | null
Posting JSON: {"rowkeys":[{"rowkey":"18-05-2012","columns":[{"columnname":"Open","columnvalue":"533.96","ttl":0},{"columnname":"Close","columnvalue":"530.38","ttl":12000},{"columnname":"High","columnvalue":"543.41","ttl":12000},{"columnname":"Low","columnvalue":"522.18","ttl":12000},{"columnname":"Volume","columnvalue":"26125200","ttl":12000},{"columnname":"AdjClose","columnvalue":"530.12","ttl":12000}]}]}
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/data/AAPL/MarketData/
Success | Bulk upload successfull. | null
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/data/AAPL/MarketData/18-05-2012/
{Volume=26125200, Open=533.96, Low=522.18, High=543.41, Close=530.38, AdjClose=530.12}
09-Sep-2012 22:59:32 io.cassandra.sdk.CassandraIoSDK constructAPIUrl
INFO: API URL: https://api.cassandra.io/1/keyspace/AAPL/
Success | Keyspace dropped successfully. | null

Analysis

Step 1./ The code creates a keyspace named AAPL using HTTP POST, url: https://api.cassandra.io/1/keyspace/AAPL/
It uses the KeyspaceAPI class with Token and AccountId as parameters for the constructor. Token is used as the username, while AccountId is the password. (Remember: these attributes can be retrieved using the heroku config command or via the Heroku admin console.)

Step 2./ Then the code creates a column family called MarketData. It uses ColumnFamilyAPI – with the credentials mentioned above – and the REST url is https://api.cassandra.io/1/columnfamily/AAPL/MarketData/UTF8Type/.

Step 3./ Then the code upserts the columns called Open, Close, High, Low, Volume and AdjClose. It uses ColumnAPI – same credentials as before – and the REST url is https://api.cassandra.io/1/column/AAPL/MarketData/Open/UTF8Type/?isIndex=true, where AAPL is the keyspace, MarketData is the column family and Open is the column.

Step 4./ Then the code prepares the data as name/value pairs (Open = “533.96”, Close = “530.38”, etc.), defines a rowkey (“18-05-2012”) and then uses the DataAPI postBulkData method to upload the data into Cassandra.io. The DataAPI credentials are the same as above.

Step 5./ The code then fetches the data using HTTP GET with url: https://api.cassandra.io/1/data/AAPL/MarketData/18-05-2012/. The returned row is printed as: {Volume=26125200, Open=533.96, Low=522.18, High=543.41, Close=530.38, AdjClose=530.12}

Step 6./ Finally the code destroys the keyspace using HTTP DELETE, url: https://api.cassandra.io/1/keyspace/AAPL/.

Summary

If you want to try out a robust, highly available Cassandra datastore without any upfront infrastructure investment and with an easy-to-use API, it is certainly worth having a closer look at Cassandra.io on Heroku. It takes only a few minutes to get started, and the APIs offer simple REST-based data management for Java, Ruby and PHP developers.