Category Archives: Cassandra

Apache Cassandra Secondary Indices

How are Secondary Indices really stored ?

This is based on the article from Datastax found here; https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive

Let’s just create a simple table

Or visualized as a table :

ColumnTypeKey
idintPrimary Key
citytext
nametext

If we then create an index like this

Then this will result in just “normal” table, just hidden , and here the column we created the index for becomes the Partition Key, and the original table Partition Key becomes the clustering key

ColumnTypeKey
citytextPrimary Key
idintClustering Key

With some data it would be like this for the “customer” table.

IdNameCity
1Italia PizzeriaKalmar
2Thai SilkKalmar
3Royal ThaiStockholm
4Indian CornerMalmö

And the index which then is a “table” would thus be like this

CityId
Kalmar1
Kalmar2
Stockholm3
Malmö4

When a cluster is used, the index then the data of the source table is distributed over the nodes, using the murmor3 algorithm. Now the index table is also distributed, BUT together on the same node with the data of the source table.

UDF/User Defined Functions in Cassandra 3.x

I was just playing around with Cassandra WRITETIME and thought it was somewhat difficult to figure out the date / timestamp of a number like this (microseconds since EPOC) 1470645914253000.

So in my example it looked like this

So I figured why not create a UDF that would solve this for me

That turned out to be a little bit of a challenge …

I thought that I could do like this

BUT NO, YOU CAN NOT!!!

There are several WRONGS in here it turns out

  1. First off you have to turn on
    enable_user_defined_functions: true
    in the conf/cassandra.yaml file
  2. All classes has to be fully qualified, so Date would be java.util.Date, and so on…
  3. The division operator ‘/’ can not be used !!! however +,- and * works fine. surely this must be a bug … this called for some thinking…

The error I got when trying to use the code above without fully qualified names was

And the reason, if I got it right, is that you can not do imports.

The error I got when trying to use the division ‘/’ operator was this:

The code that works looks like this, using java.math.BigDecimal to solve it was perhaps a so-so solution, but it works:

So now my output in cqlsh.sh looks like this now

That is a lot better !

Cassandra set the writetime explicitly with a PreparedStatement

This is a quick one, I wanted to set the writetime of a row explicitly when I populate the database for testing purposes. We use the writetime of a column to filter them out.

It required some looking around to find out how to do this…. so I figured I write an article about it.

The timestamp will be set for ALL cells in this row (well not the primary key, cause it does not have a timestamp, but the others).

The timestamp is given as milliseconds since EPOC, so lots of digits :-).

A prepared statement would then look like this (Scala code)

TTL and TIMESTAMP can both be set like this, i.e. with [ttl] and [timestamp]

-Tobias