Cassandra Interview Questions

Industry interest in schema-less databases is growing, and NoSQL is expanding quickly as a result. To help you prepare for your interviews, here are some interview questions on Cassandra, a NoSQL database. Salaries for NoSQL database developers also tend to be high, so this is a field worth preparing for. Let’s have a look:

A snitch determines which data centers and racks the nodes belong to. It gives Cassandra information about the network topology so that requests are routed efficiently and the replication strategy can place replicas across data centers and racks. There are several snitches, including:

  • SimpleSnitch
  • PropertyFileSnitch
  • Ec2Snitch
  • CloudstackSnitch
  • Dynamic snitching
  • RackInferringSnitch
  • GossipingPropertyFileSnitch
  • GoogleCloudSnitch
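
As a rough illustration of what a property-file style snitch does conceptually, the sketch below maps node addresses to a data center and rack. The addresses, the `TOPOLOGY` table, and the function names are all hypothetical; this is not Cassandra's implementation.

```python
# Hypothetical topology table, similar in spirit to what a
# property-file based snitch reads: node address -> (data center, rack).
TOPOLOGY = {
    "10.0.0.1": ("DC1", "RAC1"),
    "10.0.0.2": ("DC1", "RAC2"),
    "10.0.1.1": ("DC2", "RAC1"),
}
DEFAULT_LOCATION = ("DC1", "RAC1")  # assumed fallback for unknown nodes


def locate(address):
    """Return the (data center, rack) pair for a node address."""
    return TOPOLOGY.get(address, DEFAULT_LOCATION)


def nodes_by_datacenter(addresses):
    """Group node addresses by the data center they belong to."""
    groups = {}
    for address in addresses:
        datacenter, _rack = locate(address)
        groups.setdefault(datacenter, []).append(address)
    return groups


print(nodes_by_datacenter(["10.0.0.1", "10.0.0.2", "10.0.1.1"]))
```

A replication strategy can use this kind of grouping to avoid placing every replica in the same data center or rack.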

There are several logging levels, described below:

  1. ALL: Turns on all levels, including custom levels.
  2. DEBUG: Designates fine-grained informational events that are most useful for debugging an application.
  3. WARN: Designates potentially harmful situations.
  4. INFO: Designates informational messages that highlight the progress of the application.
  5. ERROR: Designates error events.

These are some questions that will help you crack your interview. Of course, you should also prepare thoroughly in this field to land a well-paid job.

Cassandra writes its logs to the system.log and debug.log files in the logging directory. Changing the logging level is the simplest way to see what is happening in the database. The level can be configured programmatically or manually.

    • Single primary key

In this case, only one column is used as the primary key. This column is also called the partition key, and it determines how the data is partitioned. The partition key is what spreads the data across the nodes.

    • Compound Primary Key

In this case, the data is partitioned and then grouped. For example, with a primary key made of race_name and race_position, race_name is the partition key and race_position is the clustering key. The former decides which partition the data belongs to, and the latter decides how the data is ordered within that partition.
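
The effect of a compound primary key can be sketched in a few lines of Python. The rows and the `store_rows` helper below are hypothetical; only the column names race_name and race_position come from the answer above. The sketch groups rows by partition key and sorts each group by clustering key.

```python
from collections import defaultdict

# Hypothetical rows: (race_name, race_position, rider)
rows = [
    ("Race A", 2, "rider-2"),
    ("Race B", 1, "rider-3"),
    ("Race A", 1, "rider-1"),
]


def store_rows(rows):
    """Group rows by partition key, then order each group by clustering key."""
    partitions = defaultdict(list)
    for race_name, race_position, rider in rows:
        # race_name (partition key) decides which partition the row goes to.
        partitions[race_name].append((race_position, rider))
    for partition in partitions.values():
        # race_position (clustering key) decides the order inside the partition.
        partition.sort()
    return dict(partitions)


table = store_rows(rows)
print(table["Race A"])
```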

Murmur3Partitioner: This is the default partitioner. It is faster than RandomPartitioner and distributes data evenly across the cluster. It uses 64-bit hash values in the range -2^63 to 2^63-1.

RandomPartitioner: Before Cassandra 1.2, RandomPartitioner was the default. It can be used together with vnodes, and it also distributes data evenly. It applies an MD5 hash to the partition key, giving values in the range 0 to 2^127-1.

ByteOrderedPartitioner: This partitioner keeps rows ordered by key. It uses the raw byte array value of the row key to decide which nodes store each row.
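
The difference between hashing and byte ordering can be sketched as follows. This is a simplified illustration, not Cassandra's code: `md5_token` is a hypothetical helper, and the way the digest is reduced to a non-negative integer is an assumption. Murmur3 is left out because it is not in the Python standard library; only its token range is shown.

```python
import hashlib

# Token range of the Murmur3 partitioner: 64-bit signed values.
MURMUR3_MIN = -2**63
MURMUR3_MAX = 2**63 - 1


def md5_token(partition_key: bytes) -> int:
    """MD5-based token in the style of RandomPartitioner (simplified)."""
    digest = hashlib.md5(partition_key).digest()  # 16 bytes = 128 bits
    # Assumption: read the digest as a signed integer and take its absolute value.
    return abs(int.from_bytes(digest, "big", signed=True))


def byte_order(keys):
    """Byte-ordered placement: keys are compared by their raw bytes."""
    return sorted(keys)


print(md5_token(b"user-1"))
print(byte_order([b"b", b"a\xff", b"a"]))
```

Hashing scatters neighbouring keys across the token range, which is why the hash-based partitioners spread data evenly, while byte ordering keeps neighbouring keys next to each other.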

Yes, it is possible to add or delete column families in a working cluster, but there are some precautions the client has to follow first:

  • First, make sure the commit log is clear, which can be done with 'nodetool drain'.
  • No data should be left in the commit log. For this, Cassandra has to be shut down.
  • Lastly, delete the SSTable files for the removed CFs.

Cassandra supports the following write consistency levels:

  • ALL: The strongest consistency. A write must be written to the commit log and memtable on all replica nodes in the cluster.
  • EACH_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes in every data center.
  • LOCAL_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes, but only in the same data center.
  • ONE: A write must be written to the commit log and memtable of at least one replica node.
  • TWO: A write must be written to the commit log and memtable of at least two replica nodes.
  • THREE: Same as above, but with at least three replica nodes.
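
The number of replica acknowledgements each write consistency level needs can be worked out with a small helper. This is a sketch under stated assumptions: `required_acks` and the replication factors are hypothetical, and a quorum is taken to be floor(replication factor / 2) + 1.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(level, rf_per_dc, local_dc):
    """Replica acknowledgements a write needs at a given consistency level.

    rf_per_dc maps a data center name to its replication factor.
    """
    if level == "ALL":
        return sum(rf_per_dc.values())
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    raise ValueError(f"unknown consistency level: {level}")


# Hypothetical cluster: two data centers with three replicas each.
rf = {"DC1": 3, "DC2": 3}
for level in ["ALL", "EACH_QUORUM", "LOCAL_QUORUM", "ONE", "TWO", "THREE"]:
    print(level, required_acks(level, rf, "DC1"))
```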

Cassandra stores data as bytes. When the user or client specifies a validator, Cassandra makes sure that those bytes are encoded as required. A comparator then orders the columns according to the ordering defined for that encoding.

Composites have a specific encoding and are laid out in bytes. Each component is stored as a two-byte length, followed by the byte-encoded component, followed by a termination bit.
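
The layout described above can be sketched with Python's `struct` module. The helper names are hypothetical, and the terminator is modeled here as a single zero byte, which is an assumption made for illustration.

```python
import struct

END_OF_COMPONENT = b"\x00"  # assumed terminator written after every component


def encode_composite(components):
    """Encode each component as: two-byte length, value bytes, terminator."""
    out = bytearray()
    for value in components:
        out += struct.pack(">H", len(value))  # two-byte big-endian length
        out += value                          # byte-encoded component
        out += END_OF_COMPONENT               # terminator
    return bytes(out)


def decode_composite(data):
    """Reverse of encode_composite: split the bytes back into components."""
    components = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        components.append(data[offset:offset + length])
        offset += length + 1  # skip the value and its terminator
    return components


encoded = encode_composite([b"ab", b"c"])
print(encoded)
print(decode_composite(encoded))
```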

As the name suggests, a memtable lives in memory. It is the in-memory structure to which Cassandra writes data. Its content is stored as key/column pairs, and the data is sorted by key. Each column family has its own memtable, and column data is retrieved from it by key.
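
A toy version of this write path might look like the sketch below. The `ColumnFamilyStore` class and its method names are hypothetical; the point is only that a write is recorded in a commit log and in an in-memory structure keyed by row key.

```python
class ColumnFamilyStore:
    """Toy write path for one column family (a sketch, not Cassandra's code)."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory structure: key -> {column: value}

    def write(self, key, column, value):
        self.commit_log.append((key, column, value))
        self.memtable.setdefault(key, {})[column] = value

    def read(self, key):
        """Retrieve the column data for a key from memory."""
        return self.memtable.get(key, {})

    def sorted_keys(self):
        """Keys in sorted order, as they would be written out on flush."""
        return sorted(self.memtable)


cf = ColumnFamilyStore()
cf.write("user-2", "name", "B")
cf.write("user-1", "name", "A")
cf.write("user-1", "city", "X")
print(cf.read("user-1"))
print(cf.sorted_keys())
```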

CQL collections let clients store multiple values in a single variable. Cassandra offers the following CQL collection types:

  • List: Used when the order of the data has to be maintained and a value may be stored multiple times.
  • Set: Used to store a group of elements that are returned in sorted order.
  • Map: Used to store key-value pairs.
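
Python's built-in types give a rough analogy for the three collection types. This is only an analogy with made-up values, not CQL: a Python set is unordered, so it is sorted here to mimic the sorted order in which set elements are returned.

```python
# List: keeps insertion order and allows the same value more than once.
events = ["login", "view", "login"]

# Set: unique elements; sorted here to mimic sorted return order.
tags = sorted({"red", "blue", "red"})

# Map: key-value pairs.
scores = {"alice": 10, "bob": 7}

print(events)
print(tags)
print(scores["alice"])
```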

The CAP theorem matters when planning a scaling strategy. CAP stands for Consistency, Availability and Partition tolerance. The theorem states that a distributed system such as Cassandra cannot provide all three properties at once: it has to choose two of them and sacrifice the third.

These three properties are:

  • Consistency: Guarantees that a read returns the most recent write.
  • Availability: Guarantees a reasonable response within a minimum amount of time.
  • Partition tolerance: Guarantees that the system keeps working when network partitions occur.

SSTable stands for Sorted String Table. It is an important file format in Cassandra that accepts regularly written memtables. SSTables are stored on disk and exist for every Cassandra table. A key feature of an SSTable is immutability: once the data is written, it cannot be changed, which keeps the data files stable. In addition, Cassandra creates three separate files: the bloom filter, the partition summary and the partition index.
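
The relationship between memtables and SSTables can be sketched as below. The functions are hypothetical and greatly simplified: an SSTable is modeled as an immutable, key-sorted tuple, and the bloom filter, partition summary and partition index are left out.

```python
import bisect


def flush(memtable):
    """Turn an in-memory dict into an immutable, key-sorted tuple of pairs."""
    return tuple(sorted(memtable.items()))


def lookup(sstable, key):
    """Binary search for a key in one sorted SSTable."""
    keys = [k for k, _ in sstable]  # rebuilt on every call to keep the sketch short
    index = bisect.bisect_left(keys, key)
    if index < len(keys) and keys[index] == key:
        return sstable[index][1]
    return None


def read(sstables, key):
    """Check the newest SSTable first, since older files are never rewritten."""
    for sstable in reversed(sstables):
        value = lookup(sstable, key)
        if value is not None:
            return value
    return None


older = flush({"b": "old-b", "a": "old-a"})
newer = flush({"a": "new-a"})
print(older)
print(read([older, newer], "a"))
```

Because a flushed table is never modified, an update ends up in a newer SSTable, and the read has to prefer the newest copy.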

A column family, as the name suggests, is a structure that can hold a large number of rows. Each row is a set of key-value pairs, where the key is the column name and the value is the column data. You can compare it to a HashMap in Java. A column family is very flexible: one row can have a hundred columns while another has just 2. There is no restriction on the list of columns.
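
Since the answer compares a column family to a hash map, a nested dictionary is a natural way to picture it. The row keys, column names and the `column_names` helper below are made up for illustration.

```python
# Hypothetical column family: row key -> {column name: column value}.
users = {
    "row-1": {"name": "A", "email": "a@example.com", "city": "X"},
    "row-2": {"name": "B"},  # a row does not need the same columns as others
}


def column_names(column_family, row_key):
    """Return the column names present in one row."""
    return sorted(column_family[row_key])


print(column_names(users, "row-1"))
print(column_names(users, "row-2"))
```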

These are all basic components of Cassandra. A node is a single machine running Cassandra. A cluster is a collection of many nodes that hold similar kinds of data. Data centers are useful when serving customers in different geographical locations: the nodes of a cluster can be grouped into different data centers.

A Cassandra super column is used to group similar data. Super columns are key-value pairs whose values are columns, so a super column is a grouping of columns. They follow this hierarchy:

Keyspace > column family > super column > column, a data structure that can be represented in JSON (JavaScript Object Notation).
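
The nesting can be pictured as a JSON-like structure. All names and values below are hypothetical, and the row key level between the column family and the super column is an assumption added so that the example can hold more than one row.

```python
import json

# keyspace -> column family -> row key -> super column -> column
keyspace = {
    "UserProfiles": {
        "user-1": {
            "address": {"city": "X", "zip": "00000"},
            "contact": {"email": "a@example.com"},
        }
    }
}


def get_column(keyspace, column_family, row_key, super_column, column):
    """Walk the hierarchy down to a single column value."""
    return keyspace[column_family][row_key][super_column][column]


print(json.dumps(keyspace, indent=2))
print(get_column(keyspace, "UserProfiles", "user-1", "address", "city"))
```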