Category Archives: Cassandra

Apache Cassandra Secondary Indices

August 16, 2020 | Cassandra | tobias

How are Secondary Indices really stored?

This is based on the DataStax article found here: https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive

Let’s just create a simple table:

CREATE TABLE customer (
    id int PRIMARY KEY,
    city text,
    name text
);

Or visualized as a table:

Column	Type	Key
id	int	Primary Key
city	text
name	text

If we then create an index like this

CREATE INDEX customer_city_idx ON customer (city);

This results in just a “normal” table, only hidden. In it, the column we created the index for becomes the partition key, and the partition key of the original table becomes the clustering key:

Column	Type	Key
city	text	Partition Key
id	int	Clustering Key

With some data it would be like this for the “customer” table.

Id	Name	City
1	Italia Pizzeria	Kalmar
2	Thai Silk	Kalmar
3	Royal Thai	Stockholm
4	Indian Corner	Malmö

And the index, which is itself a “table”, would thus look like this:

City	Id
Kalmar	1
Kalmar	2
Stockholm	3
Malmö	4

When a cluster is used, the data of the source table is distributed over the nodes using the Murmur3 partitioner. The index table is also distributed, BUT not by its own partition key: each index entry is stored on the same node as the source-table row it points to.
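To make the "same node" point concrete, here is a small toy model in plain Java. It is not Cassandra code: the class LocalIndexSketch, the three-node "cluster" and the modulo-based ownerOf (a stand-in for Murmur3) are all made up for illustration. It only shows that index entries live next to the rows they point to, so a lookup by city may have to consult several nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only, NOT Cassandra code. All names here are hypothetical.
class LocalIndexSketch {

    // One node holds a slice of the customer table plus its own hidden index table.
    static final class Node {
        final Map<Integer, String[]> customers = new HashMap<>();         // id -> {name, city}
        final Map<String, TreeSet<Integer>> cityIndex = new HashMap<>();  // city -> sorted ids

        void insert(int id, String name, String city) {
            customers.put(id, new String[] {name, city});
            // The index entry is written on the SAME node as the row it points to.
            cityIndex.computeIfAbsent(city, c -> new TreeSet<>()).add(id);
        }
    }

    final List<Node> nodes = new ArrayList<>();

    LocalIndexSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    // Stand-in for the Murmur3 partitioner (assumption: a simple modulo on the id).
    Node ownerOf(int id) {
        return nodes.get(Math.floorMod(id, nodes.size()));
    }

    void insert(int id, String name, String city) {
        ownerOf(id).insert(id, name, city);
    }

    // The city value does not tell us which node to ask, so this toy asks all of them.
    List<Integer> idsInCity(String city) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (Node node : nodes) {
            ids.addAll(node.cityIndex.getOrDefault(city, new TreeSet<>()));
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        LocalIndexSketch cluster = new LocalIndexSketch(3);
        cluster.insert(1, "Italia Pizzeria", "Kalmar");
        cluster.insert(2, "Thai Silk", "Kalmar");
        cluster.insert(3, "Royal Thai", "Stockholm");
        cluster.insert(4, "Indian Corner", "Malmö");

        // The two Kalmar rows end up on different nodes in this toy, each with its own index entry.
        System.out.println(cluster.ownerOf(1) == cluster.ownerOf(2)); // false
        System.out.println(cluster.idsInCity("Kalmar"));              // [1, 2]
    }
}
```

In this toy the two Kalmar rows land on different nodes, which is why a query on city alone cannot be answered by a single node the way a query on id can.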

UDF/User Defined Functions in Cassandra 3.x

August 11, 2016 | Cassandra, JAVA | tobias

I was just playing around with Cassandra WRITETIME and thought it was somewhat difficult to figure out the date / timestamp from a number like 1470645914253000 (microseconds since the epoch).

So in my example it looked like this

cqlsh:bth> select id, writetime(dateofbirth) from bth.employee;

 id | writetime(dateofbirth)
----+------------------------
  1 |       1470645914253000
  2 |       1470645977177000
  7 |       1470948508799001
  3 |       1470645977178000

(4 rows)
cqlsh:bth>
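As a quick sanity check outside Cassandra, such a number can be decoded in a few lines of plain Java. This is only a sketch: the class and method names (WritetimeDecode, fromWritetime) are made up, and the result is printed in UTC rather than in local time.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Cassandra or any driver.
class WritetimeDecode {

    // WRITETIME values are microseconds since the epoch.
    static Instant fromWritetime(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        System.out.println(fromWritetime(1470645914253000L)); // 2016-08-08T08:45:14.253Z
        System.out.println(fromWritetime(1470948508799001L)); // 2016-08-11T20:48:28.799001Z
    }
}
```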

So I figured, why not create a UDF that would solve this for me?

That turned out to be a little bit of a challenge…

I thought I could do it like this:

CREATE FUNCTION bth.ts2date ( input bigint )
	RETURNS NULL ON NULL INPUT
    RETURNS text 
    LANGUAGE java
    AS $$
    	if( input > 0L ) {
    		long ms = input / 1000L;
    		Date date=new Date(ms);
    		SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
    		return sdf.format(date);
    	} else return null;
    $$;

BUT NO, YOU CAN NOT!!!

It turns out there are several things WRONG here:

First off, you have to turn on
enable_user_defined_functions: true
in the conf/cassandra.yaml file.
All classes have to be fully qualified, so Date would be java.util.Date, and so on…
The division operator ‘/’ can not be used!!! However, +, - and * work fine. Surely this must be a bug… this called for some thinking…

The error I got when trying to use the code above without fully qualified names was

cqlsh> CREATE FUNCTION ts2date ( input bigint )
   ... RETURNS NULL ON NULL INPUT
   ...     RETURNS text 
   ...     LANGUAGE java
   ...     AS $$
   ...     if( input != null ) {
   ...     Date date=new Date(mills);
InvalidRequest: code=2200 [Invalid query] message="Functions must be fully qualified with a keyspace name if a keyspace is not set for the session"

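Note that this particular message is about the function name: the session had no keyspace set (the prompt is cqlsh>, not cqlsh:bth>), so the function has to be written as bth.ts2date. The class names have to be fully qualified for a separate reason: if I got it right, you can not do imports in a UDF body.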

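The error I got when trying to use the division ‘/’ operator was this (the “Invalid syntax” message appears to come from cqlsh’s own parser, so this may be a cqlsh problem rather than a restriction in the Java code itself):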

cqlsh:bth> CREATE FUNCTION bth.ts2date ( input bigint )
       ... RETURNS NULL ON NULL INPUT
       ...     RETURNS text 
       ...     LANGUAGE java
       ...     AS $$
       ...     if( input > 0L ) {
       ...     java.util.Date date=new java.util.Date(input/1000);
Invalid syntax at line 7, char 49
      java.util.Date date=new java.util.Date(input/1000);
                                                  ^
cqlsh:bth>

The code that works looks like this. Using java.math.BigDecimal to get around the division was perhaps a so-so solution, but it works:

CREATE FUNCTION bth.ts2date ( input bigint ) 
RETURNS NULL ON NULL INPUT     
RETURNS text      
LANGUAGE java     
AS $$     
	if( input > 0L ) {     
		java.math.BigDecimal t = java.math.BigDecimal.valueOf(1000L);     
		java.math.BigDecimal inp = java.math.BigDecimal.valueOf(input);     
		java.math.BigDecimal mst = inp.divide(t); 
		long ms = mst.longValue();     
		java.util.Date date=new java.util.Date(ms);     
		java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");     
		return sdf.format(date);     
	} else return null;     
$$;
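The body of the function can also be tried out as ordinary Java before it is pasted into CQL. The sketch below is one way to do that, under a few assumptions: the class Ts2DateCheck and the extra zone parameter are my own additions (the UDF itself uses the default time zone of the JVM it runs in), and GMT+02:00 is chosen only because it reproduces the local times in the cqlsh output in this post.

```java
import java.math.BigDecimal;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical test harness for the UDF body; not needed by Cassandra itself.
class Ts2DateCheck {

    // Same logic as the UDF, plus an explicit time zone to make the output repeatable.
    static String ts2date(long input, TimeZone zone) {
        if (input <= 0L) {
            return null;
        }
        // Dividing by 1000 always has an exact decimal result, so divide() does not throw here.
        long ms = BigDecimal.valueOf(input).divide(BigDecimal.valueOf(1000L)).longValue();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        sdf.setTimeZone(zone);
        return sdf.format(new Date(ms));
    }

    public static void main(String[] args) {
        long micros = 1470645914253000L;

        // The BigDecimal detour gives the same milliseconds as ordinary long division.
        long viaBigDecimal = BigDecimal.valueOf(micros).divide(BigDecimal.valueOf(1000L)).longValue();
        System.out.println(viaBigDecimal == micros / 1000L); // true

        System.out.println(ts2date(micros, TimeZone.getTimeZone("UTC")));       // 2016-08-08 08:45:14,253
        System.out.println(ts2date(micros, TimeZone.getTimeZone("GMT+02:00"))); // 2016-08-08 10:45:14,253
    }
}
```

Since SimpleDateFormat formats in the default time zone unless told otherwise, the text the UDF returns depends on the time zone of the Cassandra node that executes it.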

So now my output in cqlsh looks like this:

cqlsh:bth> select id, ts2date(writetime(dateofbirth)) from bth.employee;

 id | bth.ts2date(writetime(dateofbirth))
----+-------------------------------------
  1 |             2016-08-08 10:45:14,253
  2 |             2016-08-08 10:46:17,177
  7 |             2016-08-11 22:48:28,799
  3 |             2016-08-08 10:46:17,178

(4 rows)
cqlsh:bth>

That is a lot better!

Cassandra set the writetime explicitly with a PreparedStatement

March 16, 2016 | Cassandra, JAVA, Scala | tobias

This is a quick one. I wanted to set the writetime of a row explicitly when populating the database for testing purposes, since we use the writetime of a column to filter rows out.

It required some looking around to find out how to do this… so I figured I would write an article about it.

INSERT INTO invoices (invoice_id,amount,tax,description) VALUES ('333',10,3,'ice-cream') USING TIMESTAMP 1458134077121;

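The timestamp will be set for ALL cells in this row (well, not the primary key, because it does not have a timestamp, but the others).

The timestamp is given as a number since the epoch, so lots of digits :-). Note that Cassandra interprets it as microseconds (the same unit WRITETIME returns), so the 13-digit millisecond value used here would need to be multiplied by 1000 to represent a realistic write time.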
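Since Cassandra counts write timestamps in microseconds, a value taken from a millisecond clock has to be scaled before it is passed to USING TIMESTAMP. A minimal sketch of that conversion in plain Java follows; the class and method names are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: turns epoch milliseconds into the microseconds Cassandra expects.
class UsingTimestampMicros {

    static long toMicros(long epochMillis) {
        return TimeUnit.MILLISECONDS.toMicros(epochMillis);
    }

    public static void main(String[] args) {
        long millis = 1458134077121L;          // the value used in the INSERT above
        System.out.println(toMicros(millis));  // 1458134077121000

        // For "now", System.currentTimeMillis() can be converted the same way.
        System.out.println(toMicros(System.currentTimeMillis()));
    }
}
```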

A prepared statement would then look like this (Scala code)

val cql = "INSERT INTO invoices (invoice_id,amount,tax,description) VALUES (?,?,?,?) USING TIMESTAMP ?;"
val stmt = session.prepare( cql )
val bs = stmt.bind()
bs.setString("invoice_id", "333" )
bs.setLong("amount", 10L )
bs.setLong("tax", 3L )
bs.setString("description","ice-cream")
bs.setLong("[timestamp]", 1458134077121L )
val result = session.execute( bs )

TTL and TIMESTAMP can both be set like this, i.e. with the marker names [ttl] and [timestamp].

-Tobias

tsoft.se

Tobias – With a Passion For Software Development
