Monthly Archives: August 2020

Apache Cassandra Secondary Indices

How are Secondary Indices really stored ?

This is based on the article from Datastax found here; https://www.datastax.com/blog/2016/04/cassandra-native-secondary-index-deep-dive

Let’s just create a simple table

Or visualized as a table :

ColumnTypeKey
idintPrimary Key
citytext
nametext

If we then create an index like this

Then this will result in just “normal” table, just hidden , and here the column we created the index for becomes the Partition Key, and the original table Partition Key becomes the clustering key

ColumnTypeKey
citytextPrimary Key
idintClustering Key

With some data it would be like this for the “customer” table.

IdNameCity
1Italia PizzeriaKalmar
2Thai SilkKalmar
3Royal ThaiStockholm
4Indian CornerMalmö

And the index which then is a “table” would thus be like this

CityId
Kalmar1
Kalmar2
Stockholm3
Malmö4

When a cluster is used, the index then the data of the source table is distributed over the nodes, using the murmor3 algorithm. Now the index table is also distributed, BUT together on the same node with the data of the source table.