Monday, November 21, 2016

Smashing Cassandra 4.6 Into Pentaho BA 6.1

I am working on a POC with the Cassandra wide-column store and the Pentaho reporting suite. At the time of this POC, Pentaho support is behind the DataStax Cassandra release cycle, which is going to 5.0 in a few weeks. The thing I like about Pentaho is that it is designed to allow a huge amount of flexibility, so much that it can destroy your mind with the possibilities.

First off, I had to change my ETL in Pentaho DI and my report in Pentaho BA to use a generic database connector. In the ETL, the URL was the connection string URI to my Cassandra cluster, paired with a generic table output step. I changed my report to use the same generic database connection. I had no idea if this would work, and at first it failed utterly. But the error indicated a missing JDBC driver, so I started plugging in different JDBC drivers (BigSQL, DbSchema, and DataStax). I could get each to connect to my version of Cassandra, but they all threw cryptic errors like:
    Codec not found for requested operation: [int <-> java.lang.Long Dbschema]
    Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]

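The second error comes down to converting the driver's LocalDate into a timestamp, roughly like this:

    // Needs java.sql.Timestamp and com.datastax.driver.core.LocalDate.

    // Convert the driver's LocalDate to a JDBC Timestamp via epoch milliseconds.
    public Timestamp convertToDatabaseColumn(LocalDate ld) {
        return new Timestamp(ld.getMillisSinceEpoch());
    }

    // Render the value as a literal; a null value becomes the NULL keyword.
    @Override
    public String format(LocalDate value) {
        if (value == null) {
            return "NULL";
        }
        return convertToDatabaseColumn(value).toString();
    }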
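Both codec errors above are type mismatches, so here is a minimal sketch of the two conversions using only JDK types. It is not the driver's codec API: the class name CodecSketch and both method names are made up for illustration, java.time.LocalDate stands in for the driver's own LocalDate, and mapping a date to midnight UTC is an assumption.

```java
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of any Cassandra driver.
final class CodecSketch {

    // A date has no time of day, so this assumes midnight UTC when
    // turning it into a timestamp (milliseconds since the epoch).
    static Timestamp toTimestamp(LocalDate date) {
        if (date == null) {
            return null;
        }
        long millis = date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new Timestamp(millis);
    }

    // A CQL int is 32 bits wide, so a Long has to be range-checked.
    // Math.toIntExact throws ArithmeticException if the value does not fit.
    static Integer toCqlInt(Long value) {
        if (value == null) {
            return null;
        }
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        System.out.println(toTimestamp(LocalDate.of(2016, 11, 21)).getTime());
        System.out.println(toCqlInt(42L));
    }
}
```

In a real fix these conversions would live inside a custom codec registered with the driver, which is what the article under REFS walks through.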

If you want to make a generic database connection with a JDBC connector, you would use:
Build Command Required: & C:\tools\apache-maven-3.3.3\bin\mvn clean package site install
URL Structure: jdbc:cassandra://host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[keyspace][?options]]
URL Example: jdbc:cassandra://CassandraContactNode1,CassandraContactNode2,CassandraContactNode3/rti?consistency=LOCAL_QUORUM 
Java Driver Class: com.datastax.driver.core.Connection
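
To make the URL structure above concrete, here is a small sketch that assembles such a URL from a list of contact points, a keyspace, and options. The class CassandraJdbcUrl and its build method are hypothetical helpers, not part of any driver, and joining several options with "&" is an assumption, since the structure above only says "?options".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of any JDBC driver.
final class CassandraJdbcUrl {

    // Builds jdbc:cassandra://host1[:port1][,host2[:port2]...][/[keyspace][?options]]
    static String build(List<String> contactPoints, String keyspace, Map<String, String> options) {
        if (contactPoints == null || contactPoints.isEmpty()) {
            throw new IllegalArgumentException("at least one contact point is required");
        }
        StringBuilder url = new StringBuilder("jdbc:cassandra://");
        url.append(String.join(",", contactPoints));

        boolean hasKeyspace = keyspace != null && !keyspace.isEmpty();
        boolean hasOptions = options != null && !options.isEmpty();
        if (hasKeyspace || hasOptions) {
            url.append('/');
        }
        if (hasKeyspace) {
            url.append(keyspace);
        }
        if (hasOptions) {
            // Assumption: multiple options are separated by '&'.
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("consistency", "LOCAL_QUORUM");
        List<String> nodes = Arrays.asList(
                "CassandraContactNode1", "CassandraContactNode2", "CassandraContactNode3");
        // Reproduces the URL example above.
        System.out.println(build(nodes, "rti", options));
    }
}
```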

To find the connection class you need, open the folders on the class path until you find a connection class that lets you pass in multiple nodes for your cluster and a consistency level. For a recent Apache update to Cassandra, I downloaded the tarball and navigated to:

D:\downloads\apache-cassandra-3.0.7-src.tar.gz\apache-cassandra-3.0.7-src.tar\apache-cassandra-3.0.7-src\src\java\org\apache\cassandra\transport\

In there you will find a SimpleClient class that accepts a channel and version, a Client class that accepts a host, port, version, and SSL settings, and a Connection class that accepts a more complex Netty channel and the CQL version.
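
Clicking through folders by hand gets tedious, so here is a rough sketch of the same search done in code. Note the swap: it scans a compiled jar rather than the source tarball I browsed, and the class ConnectionClassFinder with its candidates and scanJar methods is a hypothetical helper, not something shipped with Cassandra or Pentaho.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class ConnectionClassFinder {

    // Turns entry names like "a/b/Foo.class" into "a.b.Foo" and keeps the
    // top-level classes whose simple name contains the keyword.
    static List<String> candidates(Stream<String> entryNames, String keyword) {
        return entryNames
                .filter(name -> name.endsWith(".class") && !name.contains("$"))
                .map(name -> name.substring(0, name.length() - ".class".length()).replace('/', '.'))
                .filter(name -> name.substring(name.lastIndexOf('.') + 1).contains(keyword))
                .sorted()
                .collect(Collectors.toList());
    }

    // Lists candidate classes inside a jar file on disk.
    static List<String> scanJar(String jarPath, String keyword) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return candidates(jar.stream().map(entry -> entry.getName()), keyword);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ConnectionClassFinder <jar-path> <keyword>");
            return;
        }
        scanJar(args[0], args[1]).forEach(System.out::println);
    }
}
```

Running it with a driver jar and the keyword Connection narrows the list down, but you still have to read each candidate to see whether it takes multiple nodes and a consistency level.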

REFS:
http://opensourceconnections.com/blog/2015/12/22/exploring-custom-typecodecs-in-the-cassandra-java-driver/

