All posts by tobias

Functions as Arguments Java vs Scala, Game Set Match Scala Wins!

This is how you would create a function that takes a function as an argument in Java:

import java.util.function.Function;

class Scratch {

    public static void doCallFunc(int num, Function<Integer,String> fn) {
        System.out.println( "Result : "+fn.apply( num ) );
    }

    public static void main(String[] args) {
        Function<Integer,String> myFunc = num -> "Value = " + num;
        System.out.println( myFunc.apply( 7 ) );
        doCallFunc( 7, myFunc );
    }
}

The Function<A,B> myFunc = num -> "Value = " + num;
Here:
A = the type of the argument, in this example an Integer
B = the type of the result/returned value, in this example a String

And for multiple parameters you need to create a functional interface of your own (java.util.function.BiFunction exists for exactly two parameters, but beyond that you are on your own), like this:


@FunctionalInterface
interface TwoParamFunction<A,B,C> {
    public C apply(A a, B b);
}

class Scratch {

    public static void doCallFunc2(int num, TwoParamFunction<Integer,String,String> fn) {
        System.out.println( "Result : "+fn.apply( num, "Value" ) );
    }

    public static void main(String[] args) {
        TwoParamFunction<Integer,String,String> myFunc2 = (num,str) -> str + " : " + num;
        doCallFunc2( 7, myFunc2 );

    }
}

Now with TWO (2) parameters instead, it looks a lot more complicated.
TwoParamFunction<A,B,C>
A = the type of the first parameter
B = the type of the second parameter
C = the type of the result/returned value

If we look at Scala, the code looks a lot simpler and much more intuitive:

def myFunc( num:Int ):String = {
 "Value = " + num
}

def doCallFunc( num:Int, fn:(Int)=>String ):Unit = {
 println("Result :"+fn(num))
}

doCallFunc(123,myFunc)

Here the declaration of the function parameter
fn:(Int)=>String
clearly spells out that the argument is an Int and the return type is a String.

And if we had two or more arguments in Scala, you have probably already guessed it:

def myFunc2( num:Int, str:String ):String = {
  str + num
}

def doCallFunc2( num:Int, fn:(Int,String)=>String ):Unit = {
  println("Result :"+fn(num,"Value = "))
}

doCallFunc2( 123, myFunc2 )

For the functions-as-arguments examples above, Scala wins all week!

Over and out !

Apache Cassandra Secondary Indices

How are Secondary Indices really stored ?

This is based on the article from Datastax found here; https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive

Let’s just create a simple table
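
For example (assuming the keyspace already exists and we are using it):

CREATE TABLE customer (
    id   int PRIMARY KEY,
    city text,
    name text
);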

Or visualized as a table :

Column | Type | Key
id     | int  | Primary Key
city   | text |
name   | text |

If we then create an index like this
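
For example (the index name is arbitrary):

CREATE INDEX customer_city_idx ON customer (city);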

Then this will result in just a "normal" table, just hidden, where the column we created the index for becomes the partition key, and the original table's partition key becomes the clustering key:

Column | Type | Key
city   | text | Primary Key
id     | int  | Clustering Key

With some data it would be like this for the “customer” table.

Id | Name            | City
1  | Italia Pizzeria | Kalmar
2  | Thai Silk       | Kalmar
3  | Royal Thai      | Stockholm
4  | Indian Corner   | Malmö

And the index, which is then itself a "table", would thus look like this:

City      | Id
Kalmar    | 1
Kalmar    | 2
Stockholm | 3
Malmö     | 4

When a cluster is used, the data of the source table is distributed over the nodes using the Murmur3 partitioner. The index table is also distributed, BUT its entries are always kept on the same node as the source table data they point to.

Print stacktraces for all threads on shutdown

If your microservice stops responding from time to time, and the only way out is to kill it with SIGINT or SIGTERM, then adding a shutdown hook might be the way to go. Do note that this will not work if you kill the process with SIGKILL (-9), because that will result in an unclean shutdown.

Some of this code is heavily influenced by Print all of the thread's information and stack traces : Exception « Development « Java Tutorial, but it has been translated into Scala and cleaned up a little.
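
A minimal sketch of such a hook (the object name is just an example):

import scala.collection.JavaConverters._

object StackTraceDump {

  // Register a JVM shutdown hook that dumps the stack trace of every live thread.
  // It runs on a clean shutdown (SIGINT/SIGTERM), but never on SIGKILL.
  def install(): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread {
      override def run(): Unit = {
        for ((thread, frames) <- Thread.getAllStackTraces.asScala) {
          println(s"Thread: ${thread.getName} (state: ${thread.getState})")
          frames.foreach(frame => println(s"    at $frame"))
        }
      }
    })
  }
}

Call StackTraceDump.install() early in main, and the dump is printed whenever the JVM shuts down cleanly.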

The output would look something like this

 

Apache Zeppelin, with Spark and Cassandra, the perfect tool

Zeppelin has become one of my favourite tools in my toolbox. I am heavily designing stuff for Cassandra and in Scala, and even though I love Cassandra there are times when things just get so complicated with the CQL command line, and creating a small project in IntelliJ just seems like too much hassle. Then using Zeppelin to try things out is just perfect. So this page is a How-To with some useful cookbook recipes.

Setting Up Zeppelin

I use Docker, where things are so much easier, and I pick v0.8.0 because I never got 0.8.2 to work for some reason.

Download and Start Cassandra
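
For example (image tag and container name are just what I happen to use):

docker pull cassandra:3.11
docker run --name my-cassandra -d cassandra:3.11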

 

Download and Start Zeppelin

Download Zeppelin image
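
For example:

docker pull apache/zeppelin:0.8.0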

Start Zeppelin on port 8080
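
For example:

docker run -d -p 8080:8080 --name my-zeppelin apache/zeppelin:0.8.0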

-p hp:cp
hp = host port, the port on your local machine
cp = container port, the port inside the container, which is what Zeppelin exposes

Go to localhost:8080 in your web browser and you should see something like this

Setup Zeppelin

Find out the IP address of Cassandra in your Docker network; as you can see from the inspect output, the IP address is 172.17.0.3.
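
For example:

docker inspect my-cassandra | grep IPAddress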

 

Set up IP address for Cassandra in the Spark Interpreter

Go to the section on “Spark”

Now add a property row that points the Spark Cassandra connector at the container, i.e. spark.cassandra.connection.host set to the IP address found above.

Now also edit the Dependencies

You can do this in many ways: either you specify the Maven coordinates with a version, OR you download the JAR file(s) to disk and copy them into the Docker container. I had to do the latter due to some issue with my network.

You need these two libraries :

Simply click on the JAR file and download the file, then copy it into the docker with

Setup IP address for Cassandra in Cassandra Interpreter

Create your first Notebook

Cookbook Recipes

Load Table into RDD and count rows

This is just to show how you load a table into an RDD; once it is in the RDD you can play around with it and do lots of stuff.
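
Something along these lines (the keyspace and table names, ks1.users, are just the examples I use throughout these recipes):

%spark
import com.datastax.spark.connector._

// Load the Cassandra table into an RDD and count the rows
val rdd = sc.cassandraTable("ks1", "users")
println(rdd.count)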

Show key spaces using the built in Cassandra interpreter using CQL
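
For example:

%cassandra
DESCRIBE KEYSPACES;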

The result :

Create Keyspace and Table using CQL
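
For example (keyspace and table names are just placeholders, and replication_factor 1 is only sensible for a single-node playground):

%cassandra
CREATE KEYSPACE IF NOT EXISTS ks1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS ks1.users (
    id   int PRIMARY KEY,
    name text,
    city text
);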

Insert data by hand using CQL
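
For example:

%cassandra
INSERT INTO ks1.users (id, name, city) VALUES (1, 'Tobias', 'Kalmar');
INSERT INTO ks1.users (id, name, city) VALUES (2, 'Anna', 'Stockholm');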

Fill the table with bogus data using Spark and Scala
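
Something like this (again assuming the ks1.users table from above):

%spark
import com.datastax.spark.connector._

// Generate 1000 bogus rows and write them straight into Cassandra
val bogus = sc.parallelize(1 to 1000).map(i => (i, s"name-$i", s"city-${i % 10}"))
bogus.saveToCassandra("ks1", "users", SomeColumns("id", "name", "city"))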

 

Select data using CQL
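
For example:

%cassandra
SELECT * FROM ks1.users LIMIT 10;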

Create VIEW so that we can run SQL
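
Something like this (the view name is arbitrary):

%spark
// Expose the Cassandra table as a temporary view so it can be queried with SQL
val users = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks1", "table" -> "users"))
  .load()
users.createOrReplaceTempView("users")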

 

Run SQL, ohh sweet SQL 🙂
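
For example, against the view created above:

%spark
spark.sql("SELECT city, count(*) AS cnt FROM users GROUP BY city").show()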

By creating temporary views like this, we can also do joins if we would like to.

Obviously this is not how Cassandra was intended to be used, but the point here is more about being able to troubleshoot and tourist around in the data with ease, instead of setting up a project and doing the joins inside the code. Here we can really trial-and-error until we get what we want.

That was all for now

-Tobias

Remove the cardo-updater agent from OSX

I have the intercom from Cardo Systems, and it is really good.
BUT when I updated the firmware some time ago, it decided to install some software that takes port 8080, which is one of those really common ports used by a lot of applications out there. So it really becomes a problem…

Now I figured this out after using lsof

Then I got the PID, so now I could do a ps -ef, to figure out WHICH parent process started it.

Ohh PPID = 1 🙂 That is the launchd-process

 

OK so now we know it is the launchd process.
So first just find it in launchd

Now in order to unload it we need to find the path to the plist file

 

As you can see above the plist file is :

path = /Library/LaunchDaemons/com.cardosystems.cardo-updater.plist

Alright, so now we can unload it:
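
With the path from above, something like:

sudo launchctl unload /Library/LaunchDaemons/com.cardosystems.cardo-updater.plist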

And that is it !

-Tobias

SQL LIKE operation in Cassandra is possible in v3.4+

For a long time it has not been possible to do a SELECT * FROM table WHERE firstname LIKE 't%'; in Cassandra like you could in e.g. MySQL or any other relational database for that matter.

In Cassandra v3.4 this is now possible, BUT it requires a little extra work to do it right, and that is why I created this blog post, because I had trouble finding the information.

The solution is to create a separate index, and not the secondary indexes that Cassandra came with, but a different index, called a SASI index.

This is what I have
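
Roughly a table like this; only the firstname column really matters for the example:

CREATE TABLE bth.employee (
    id        int PRIMARY KEY,
    firstname text,
    lastname  text
);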

And the content of it looks like this

And now I would like to search for all the rows that have a first name starting with a 't'.

In SQL that would have been :

SELECT * FROM bth.employee WHERE firstname LIKE ‘t%’;

In fact we could have done that on any column …. but in Cassandra it would result in something like this:

In Cassandra we first have to decide on which columns this should be possible, by creating an index like this:
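
Something like this (the index name is up to you); the default SASI mode is PREFIX, which is exactly what a LIKE 't%' search needs:

CREATE CUSTOM INDEX employee_firstname_idx ON bth.employee (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex';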

And so you can now do the following
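
That is, the same query as the SQL above:

SELECT * FROM bth.employee WHERE firstname LIKE 't%';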

But what if you decide that you would like to know all the employees whose first name ends with an 's', so something like this:
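
In CQL that would be:

SELECT * FROM bth.employee WHERE firstname LIKE '%s';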

So to be able to search for a substring (a "contains" match) we have to change the index like this instead:
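
For example, by dropping it and recreating it with mode CONTAINS:

DROP INDEX bth.employee_firstname_idx;

CREATE CUSTOM INDEX employee_firstname_idx ON bth.employee (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};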

And now you can run that query again:
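
-- same query as before, now served by the CONTAINS index
SELECT * FROM bth.employee WHERE firstname LIKE '%s';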

You can read more about the SASI index here https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html

Enjoy!

-Tobias

UDF/User Defined Functions in Cassandra 3.x

I was just playing around with Cassandra WRITETIME and thought it was somewhat difficult to figure out the date/timestamp of a number like this (microseconds since the epoch): 1470645914253000.

So in my example it looked like this

So I figured why not create a UDF that would solve this for me

That turned out to be a little bit of a challenge …

I thought that I could do like this
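
Something in this spirit (the function name is just an example; the input is a writetime in microseconds):

CREATE FUNCTION micros2date(writetime bigint)
  RETURNS NULL ON NULL INPUT
  RETURNS timestamp
  LANGUAGE java
  AS 'return new Date(writetime / 1000);';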

BUT NO, YOU CAN NOT!!!

There are several WRONGS in here it turns out

  1. First off you have to turn on
    enable_user_defined_functions: true
    in the conf/cassandra.yaml file
  2. All classes have to be fully qualified, so Date would be java.util.Date, and so on…
  3. The division operator '/' cannot be used!!! However +, - and * work fine. Surely this must be a bug… this called for some thinking…

The error I got when trying to use the code above without fully qualified names was

And the reason, if I got it right, is that you cannot do imports in a UDF.

The error I got when trying to use the division ‘/’ operator was this:

The code that works looks roughly like this; using java.math.BigDecimal to solve it was perhaps a so-so solution, but it works:
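
-- movePointLeft(3) does the microseconds-to-milliseconds division without using '/'
CREATE OR REPLACE FUNCTION micros2date(writetime bigint)
  RETURNS NULL ON NULL INPUT
  RETURNS timestamp
  LANGUAGE java
  AS 'return new java.util.Date(new java.math.BigDecimal(writetime).movePointLeft(3).longValue());';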

So now my output in cqlsh looks like this:

That is a lot better !

Cassandra set the writetime explicitly with a PreparedStatement

This is a quick one: I wanted to set the writetime of a row explicitly when I populate the database for testing purposes. We use the writetime of a column to filter them out.

It required some looking around to find out how to do this… so I figured I'd write an article about it.

The timestamp will be set for ALL cells in this row (well, not the primary key, because it does not have a timestamp, but all the others).

The timestamp is given as microseconds since the epoch (the same unit that WRITETIME reports), so lots of digits :-).

A prepared statement would then look roughly like this (Scala code):
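
A rough sketch with the DataStax Java driver; keyspace, table and column names are made up for the example:

import com.datastax.driver.core.Cluster

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("test")

// The unnamed bind marker after USING TIMESTAMP gets the generated name "[timestamp]"
val insert = session.prepare(
  "INSERT INTO users (id, name) VALUES (?, ?) USING TIMESTAMP ?")

// writetime in microseconds since the epoch
val writetime = System.currentTimeMillis() * 1000L

val bound = insert.bind()
  .setInt(0, 1)            // id
  .setString(1, "tobias")  // name
  .setLong("[timestamp]", writetime)

session.execute(bound)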

TTL and TIMESTAMP can both be set like this, i.e. with [ttl] and [timestamp]

-Tobias

Apache SPARK and Cassandra and SQL

This is a short intro to start using Apache SPARK with Cassandra, running SQL on the Cassandra tables.

Note that I am not running a SPARK cluster, I am running "local"; to me this is really convenient, not having to run a SPARK server and workers for something so small. So for playing around with SPARK and Cassandra this is really good.

I am using Scala and SBT.

Something I was struggling hard with was getting the dependency versions right. It is crucial that you do not do what I did first and use version 1.5.2 of Spark with 1.5.0 of the Spark Cassandra Connector; this will NOT work. I constantly got exceptions with java.lang.NoSuchMethodException, and it is incredibly frustrating to try out version after version.

build.sbt
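
Roughly like this; the versions below are one pair that is documented to match (Spark 1.6.x with connector 1.6.x), so double-check against the connector's compatibility table:

name := "spark-cassandra-test"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "1.6.1",
  "org.apache.spark"   %% "spark-sql"                 % "1.6.1",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"
)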

A small Scala program to show how it works

SparkTest.scala
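
A minimal sketch; the keyspace and table (test.users) are just examples, and Cassandra is assumed to run on localhost:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.cassandra.CassandraSQLContext

object SparkTest {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SparkTest")
      .setMaster("local[*]")                               // no Spark cluster needed, run locally
      .set("spark.cassandra.connection.host", "127.0.0.1") // where Cassandra lives

    val sc = new SparkContext(conf)
    val csc = new CassandraSQLContext(sc)

    // Plain SQL straight against a Cassandra table
    val df = csc.sql("SELECT * FROM test.users")
    df.show()

    sc.stop()
  }
}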

The output…

 

 

SBT Good to know…

Dependency problems

I have been having some difficulties figuring out what depends on what. I found the following set of plugins, which I think can be really helpful:

https://github.com/jrudolph/sbt-dependency-graph

and

https://github.com/gilt/sbt-dependency-graph-sugar
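
Both are ordinary sbt plugins, so they go into a plugins file (per project or global), something like this; the version is the one the README shows, so check it:

// project/plugins.sbt (or ~/.sbt/0.13/plugins/plugins.sbt)
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")
// add the sugar plugin the same way, with the coordinates from its README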

Be sure to install Graphviz first; I used Homebrew on my Mac:

brew install graphviz

and I also had to create a config file

with the following content

The readme explains how to use it pretty well; simply start the sbt CLI.

It will give you a graph that looks something like this (it is in SVG format, so it is searchable!!!). Now you should see which package/jar is using which, and also where the different versions clash…

 

Show the class path for the run command
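
From the sbt shell it is something like:

show runtime:fullClasspath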