Apache Cassandra Secondary Indices

How are Secondary Indices really stored?

This is based on the DataStax article found here: https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive

Let’s create a simple table called “customer”, with an integer id as the primary key and two text columns, city and name.

Visualized as a table:

Column    Type    Key
id        int     Primary Key
city      text
name      text

If we then create an index on the city column, the result is just a “normal” table, only hidden. In it, the column we created the index for becomes the partition key, and the partition key of the original table becomes the clustering key.

Column    Type    Key
city      text    Partition Key
id        int     Clustering Key

With some data, the “customer” table would look like this:

Id    Name               City
1     Italia Pizzeria    Kalmar
2     Thai Silk          Kalmar
3     Royal Thai         Stockholm
4     Indian Corner      Malmö

And the index, which is itself a “table”, would then look like this:

City         Id
Kalmar       1
Kalmar       2
Stockholm    3
Malmö        4
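To make the relationship between the two tables concrete, here is a minimal in-memory sketch in Scala. It is only a model of the idea, not how Cassandra implements it; the names SecondaryIndexSketch, Customer, IndexEntry, buildIndex and findByCity are made up for illustration.

```scala
// Hypothetical in-memory model of the "customer" table and its hidden index table.
object SecondaryIndexSketch {
  // One row of the source table: id is the partition key.
  final case class Customer(id: Int, name: String, city: String)

  // One row of the hidden index table: city is the partition key, id the clustering key.
  final case class IndexEntry(city: String, id: Int)

  val customers: Seq[Customer] = Seq(
    Customer(1, "Italia Pizzeria", "Kalmar"),
    Customer(2, "Thai Silk", "Kalmar"),
    Customer(3, "Royal Thai", "Stockholm"),
    Customer(4, "Indian Corner", "Malmö")
  )

  // Derive the index rows: one (city, id) entry per source row.
  def buildIndex(rows: Seq[Customer]): Seq[IndexEntry] =
    rows.map(c => IndexEntry(c.city, c.id))

  // Query by the indexed column: find the ids in the index, then fetch the source rows by id.
  def findByCity(rows: Seq[Customer], city: String): Seq[Customer] = {
    val ids = buildIndex(rows).filter(_.city == city).map(_.id).toSet
    rows.filter(c => ids.contains(c.id))
  }

  def main(args: Array[String]): Unit = {
    buildIndex(customers).foreach(e => println(s"${e.city} ${e.id}"))
    println(findByCity(customers, "Kalmar").map(_.name).mkString(", "))
  }
}
```

Running main prints the four index rows shown above, followed by the names of the two customers in Kalmar.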

When a cluster is used, the data of the source table is distributed over the nodes using the Murmur3 algorithm. The index table is also distributed, BUT each index entry is stored on the same node as the source-table data it points to.
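The co-location can be sketched in Scala as well. This is a simplified model under loud assumptions: placement uses the 32-bit MurmurHash3 from the Scala standard library and a plain modulo over a hypothetical node count, whereas real Cassandra uses 64-bit Murmur3 tokens on a token ring. The names LocalIndexSketch, nodeFor, localIndexes and idsForCity are made up for illustration.

```scala
import scala.util.hashing.MurmurHash3

object LocalIndexSketch {
  final case class Customer(id: Int, name: String, city: String)

  // Hypothetical placement: hash the partition key (id) and map it onto one of nodeCount nodes.
  // Real Cassandra uses 64-bit Murmur3 tokens and a token ring, not a modulo.
  def nodeFor(id: Int, nodeCount: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(id.toString), nodeCount)

  // Each node builds an index (city -> ids) over only the rows it stores itself.
  def localIndexes(rows: Seq[Customer], nodeCount: Int): Map[Int, Map[String, Seq[Int]]] =
    rows.groupBy(c => nodeFor(c.id, nodeCount)).map { case (node, local) =>
      node -> local.groupBy(_.city).map { case (city, cs) => city -> cs.map(_.id) }
    }

  // A lookup by city alone has to consult the local index of every node and merge the results.
  def idsForCity(indexes: Map[Int, Map[String, Seq[Int]]], city: String): Seq[Int] =
    indexes.values.flatMap(_.getOrElse(city, Seq.empty[Int])).toSeq.sorted

  def main(args: Array[String]): Unit = {
    val customers = Seq(
      Customer(1, "Italia Pizzeria", "Kalmar"),
      Customer(2, "Thai Silk", "Kalmar"),
      Customer(3, "Royal Thai", "Stockholm"),
      Customer(4, "Indian Corner", "Malmö")
    )
    val indexes = localIndexes(customers, nodeCount = 3)
    indexes.foreach { case (node, index) => println(s"node $node: $index") }
    println(idsForCity(indexes, "Kalmar"))
  }
}
```

The point of the model is that the index entries for one city may end up on several nodes, because each entry follows the source row it points to rather than the hash of the city.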


Print stacktraces for all threads on shutdown

If your microservice stops responding from time to time, and the only way out is to kill it with SIGINT or SIGTERM, then adding a shutdown hook that prints the stack traces of all threads might be the way to go. Do note that this will not work if you kill the process with SIGKILL (-9), because that results in an unclean shutdown where no shutdown hooks are run.

Some of this code is heavily influenced by “Print all of the thread’s information and stack traces” (Exception « Development « Java Tutorial), but it has been translated into Scala and cleaned up a little.
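A minimal sketch of such a hook could look like the following. It is an illustration under assumptions, not necessarily the exact code this post refers to; the names ThreadDumpOnShutdown, formatThread, dumpAllThreads and install are made up. It only relies on Thread.getAllStackTraces from the JDK and sys.addShutdownHook from the Scala standard library.

```scala
object ThreadDumpOnShutdown {
  // Render one thread and its stack trace as text.
  def formatThread(thread: Thread, stack: Array[StackTraceElement]): String = {
    val header =
      s"Thread ${thread.getName} (state=${thread.getState}, daemon=${thread.isDaemon})"
    (header +: stack.toSeq.map(frame => s"    at $frame")).mkString("\n")
  }

  // Collect a dump of every live thread in the JVM.
  def dumpAllThreads(): String = {
    val sb = new StringBuilder
    val it = Thread.getAllStackTraces.entrySet.iterator
    while (it.hasNext) {
      val entry = it.next()
      sb.append(formatThread(entry.getKey, entry.getValue)).append("\n\n")
    }
    sb.toString
  }

  // Register the hook. It runs on normal exit and on SIGINT/SIGTERM, but not on SIGKILL.
  def install(): Unit = {
    sys.addShutdownHook {
      System.err.println("=== Thread dump at shutdown ===")
      System.err.println(dumpAllThreads())
    }
    ()
  }

  def main(args: Array[String]): Unit = {
    install()
    println("Running; stop the process with Ctrl-C (SIGINT) to see the thread dump.")
    Thread.sleep(60000)
  }
}
```

Calling install() once at startup is enough. If the service hangs, the dump printed during shutdown shows where each thread was stuck at that moment.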

The output would look something like this

 


Apache Zeppelin, with Spark and Cassandra, the perfect tool

Zeppelin has become one of the favourite tools in my toolbox. I design a lot of stuff for Cassandra and in Scala, and even though I love Cassandra, there are times when things just get too complicated for the CQL command line, while creating a small project in IntelliJ seems like too much hassle. Then using Zeppelin to try things out is just perfect. So this page is a How-To with some useful Cookbook recipes.

Setting Up Zeppelin

I use Docker, where things are so much easier, and I picked v0.8.0 because I never got 0.8.2 to work for some reason.

Download and Start Cassandra

 

Download and Start Zeppelin

Download Zeppelin image

Start Zeppelin on port 8080

The -p option has the form hp:cp, where
hp = host port, the port on your local machine
cp = container port, the port inside the Docker container, which is the one Zeppelin exposes

Go to localhost:8080 in your web browser and you should see something like this

Setup Zeppelin

Find out the IP address of Cassandra in your Docker network. As you can see from the inspect output, the IP address is 172.17.0.3.

 

Set up IP address for Cassandra in the Spark Interpreter

Go to the section on “Spark”

Now add a row that says

Now also edit the Dependencies

You can do this in several ways: either specify the Maven coordinates with a version, OR download the JAR file(s) to disk and copy them into the Docker container. I had to do the latter due to some issue with my network.

You need these two libraries:

Simply click on the JAR file to download it, then copy it into the Docker container with

Set up IP address for Cassandra in the Cassandra Interpreter

Create your first Notebook