Cassandra-Logo Cassandra Interview Questions

(click to view answers)

Cassandra is one of the most favoredNoSQL distributed database management systems by Apache. With open source technology, Cassandra is efficiently designed to store and manage large volumes of data without any failure. Highly scalable for Big Data models and originally designed by Facebook, Apache Cassandra is written in Java comprising flexible schemas.

Unlike traditional or any other database, Apache Cassandradelivers near real-time performance simplifying the work of Developers, Administrators, Data Analysts and Software Engineers.

• Instead of master-slave architecture, Cassandra is established on peer-to-peer architecture ensuring no failure.

• It also assures phenomenal flexibility as it allows insertion of multiple nodes to any Cassandra cluster in any datacenter. Further, any client can forward its request to any server.

• Cassandra facilitates extensible scalability and can be easily scaled up and scaled down as per the requirements. With a high throughput for read and write operations, this NoSQL application need not be restarted while scaling.

• Cassandra is also revered for its strong data replication capability as it allows data storage at multiple locations enabling users to retrieve data from another location if one node fails. Users have the option to set up the number of replicas they want to create.

• Shows brilliant performance when used for massive datasets and thus, the most preferable NoSQL DB by most organizations.

• Operates on column-oriented structure and thus, quickens and simplifies the process of slicing. Even data access and retrieval becomes more efficient with column-based data model.

• Further, Apache Cassandra supports schema-free/schema-optional data model, which un-necessitate the purpose of showing all the columns required by your application.

Tunable Consistency is a phenomenal characteristic that makes Cassandra a favored database choice of Developers, Analysts and Big data Architects. Consistency refers to the up-to-date and synchronized data rows on all their replicas. Cassandra’s Tunable Consistency allows users to select the consistency level best suited for their use cases. It supports two consistencies -Eventual and Consistency and Strong Consistency.

The former guarantees consistency when no new updates are made on a given data item, all accesses return the last updated value eventually. Systems with eventual consistency are known to have achieved replica convergence.

For Strong consistency, Cassandra supports the following condition:

R + W > N, where

N – Number of replicas

W – Number of nodes that need to agree for a successful write

R – Number of nodes that need to agree for a successful read

Cassandra performs the write function by applying two commits-first it writes to a commit log on disk and then commits to an in-memory structured known as memtable. Once the two commits are successful, the write is achieved. Writes are written in the table structure as SSTable (sorted string table). Cassandra offers speedier write performance.

DataStaxOpsCenter: internet-based management and monitoring solution for Cassandra cluster and DataStax. It is free to download and includes an additional Edition of OpsCenter

• SPM primarily administers Cassandra metrics and various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, zookeeper and other Big Data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, creating real-time graphs with zooming, anomaly detection and heartbeat alerting.

Similar to table, memtable is in-memory/write-back cache space consisting of content in key and column format. The data in memtable is sorted by key, and each ColumnFamily consist of a distinct memtable that retrieves column data via key. It stores the writes until it is full, and then flushed out.

SSTable expands to ‘Sorted String Table,’ which refers to an important data file in Cassandra and accepts regular written memtables. They are stored on disk and exist for each Cassandra table. Exhibiting immutability, SStables do not allow any further addition and removal of data items once written. For each SSTable, Cassandra creates three separate files like partition index, partition summary and a bloom filter.

Associated with SSTable, Bloom filter is an off-heap (off the Java heap to native memory) data structure to check whether there is any data available in the SSTable before performing any I/O disk operation.

With a strong requirement to scale systems when additional resources are needed, CAP Theorem plays a major role in maintaining the scaling strategy. It is an efficient way to handle scaling in distributed systems. Consistency Availability and Partition tolerance (CAP) theorem states that in distributed systems like Cassandra, users can enjoy only two out of these three characteristics. One of them needs to be sacrificed. Consistency guarantees the return of most recent write for the client, Availability returns a rational response within minimum time and in Partition Tolerance, the system will continue its operations when network partitions occur. The two options available are AP and CP.

While a node is a single machine running Cassandra, cluster is a collection of nodes that have similar type of data grouped together. DataCentersare useful components when serving customers in different geographical areas. You can group different nodes of a cluster into different data centers.

big data training chennai, hadoop training chennai.

Using CQL (Cassandra Query Language).Cqlsh is used for interacting with database.

Cassandra Data Model consists of four main components:

Cluster : Made up of multiple nodes and keyspaces

Keyspace : A namespace to group multiple column families, especially one per partition

Column : It consists of a column name, value and timestamp

ColumnFamily : multiple columns with row key reference.

CQL is Cassandra Query language to access and query the Apache distributed database. It consists of a CQL parser that incites all the implementation details to the server. The syntax of CQL is similar to SQL but it does not alter the Cassandra data model.

Compaction refers to a maintenance process in Cassandra , in which, the SSTables are reorganized for data optimization of data structure son the disk. The compaction process is useful during interactive with memtable. There are two type sof compaction in Cassandra:

Minor compaction : started automatically when a new sstable is created. Here, Cassandra condenses all the equally sized sstables into one.

Major compaction : It is triggered manually using nodetool. Compacts all sstables of a ColumnFamily into one.

Unlike relational databases, Cassandra does not support ACID transactions.

Cqlsh expands to Cassandra Query language Shell that configures the CQL interactive terminal. It is a Python-base command-line prompt used on Linux or Windows and exequte CQL commands like ASSUME, CAPTURE, CONSITENCY, COPY, DESCRIBE and many others. With cqlsh, users can define a schema, insert data and execute a query.

Cassandra Super Column is a unique element consisting of similar collections of data. They are actually key-value pairs with values as columns. It is a sorted array of columns, and they follow a hierarchy when in action: keystore> column family> super column> column data structure in JSON.

Similar to row keys, super column data entries contains no independent values but are used to collect other columns. It is interesting to note that super column keys appearing in different rows do not necessarily match and will not ever.

ALL: Highly consistent. A write must be written to commitlog and memtable on all replica nodes in the cluster

EACH_QUORUM: A write must be written to commitlog and memtable on quorum of replica nodes in all data centers.

LOCAL_QUORUM: A write must be written to commitlog and memtable on quorum of replica nodes in the same center.

ONE: A write must be written to commitlog and memtableof at least one replica node.

TWO, Three: Same as One but at least two and three replica nodes, respectively

LOCAL_ONE: A write must be written for at least one replica node in the local data center

SERIAL: Linearizable Consistency to prevent unconditional updates

LOCAL_SERIAL: Same as Serial but restricted to local data center

machine learning in training,machine learning,machine learning course content

Both elements work on the principle of tuple having name and value. However, the former‘s value is a string while the value in latter is a Map of Columns with different data types.

Unlike Columns, Super Columns do not contain the third component of timestamp.

As the name suggests, ColumnFamily refers to a structure having infinite number of rows. That are referred by a key-value pair, where key is the name of the column and value represents the column data. It is much similar to a hashmap in java or dictionary in Python. Rememeber, the rows are not limited to a predefined list of Columns here. Also, the ColumnFamily is absolutely flexible with one row having 100 Columns while the other only 2 columns.

Source command is used to execute a file consisting of CQL statements.

Thrift is a legacy RPC protocol or API unified with a code generation tool for CQL. The purpose of using Thrift in Cassandra is to facilitate access to the DB across the programming language.

Tombstone is row marker indicating a column deletion. These marked columns are deleted during compaction. Tombstones are of great significance as Cassnadra supports eventual consistency, where the data must respond before any successful operation.

Since Cassandra is a Java application, it can successfully run on any Java-driven platform or Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on RedHat, CentOS, Debian and Ubuntu Linux platforms.

The default settings state that Cassandra uses 7000 ports for Cluster Management, 9160 for Thrift Clients, 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/

• Yes, but keeping in mind the following processes.

• Do not forget to clear the commitlog with ‘nodetool drain’

• Turn off Cassandra to check that there is no data left in commitlog

• Delete the sstable files for the removed CFs

ReplicationFactor is the measure of number of data copies existing. It is important to increase the replication factor to log into the cluster.

Yes, but it will require running repair to alter the replica count of existing data.

machine learning in training,machine learning,machine learning course content

Using get_range_slices. You can start iteration with the empty string and after each iteration, the last key read serves as the start key for next iteration.

Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The various factors responsible for using Cassandra are

• It is fault tolerant and consistent

• Gigabytes to petabytes scalabilities

• It is a column-oriented database

• No single point of failure

• No need for separate caching layer

• Flexible schema design

• It has flexible data storage, easy data distribution, and fast writes

• It supports ACID (Atomicity, Consistency, Isolation, and Durability)properties

• Multi-data center and cloud capable

• Data compression

In Cassandra, composite type allows to define key or a column name with a concatenation of data of different type. You can use two types of Composite Type

• Row Key

• Column Name

All data stored as bytes.When you specify validator, Cassandra ensures those bytes are encoded as per requirement.Then a comparator orders the column based on the ordering specific to the encoding.While composite are just byte arrays with a specific encoding, for each component it stores a two byte length followed by the byte encoded component followed by a termination bit.

A cluster is a container for keyspaces. Cassandra database is segmented over several machines that operate together. The cluster is the outermost container which arranges the nodes in a ring format and assigns data to them. These nodes have a replica which takes charge in case of data handling failure.

The other components of Cassandra are

• Node

• Data Center

• Cluster

• Commit log

• Mem-table

• SSTable

• Bloom Filter

In Cassandra, a keyspace is a namespace that determines data replication on nodes. A cluster consist of one keyspace per node.

Syntax for creating keyspace in Cassandra is

CREATE KEYSPACE <identifier> WITH <properties>

In Cassandra Column, basically there are three values

• Column Name

• Value

• Time Stamp

ALTER KEYSPACE can be used to change properties such as the number of replicas and the durable_write of a keyspace.

big data training chennai, hadoop training chennai.

There are various Cqlsh shell commands in Cassandra. Command “Capture”, captures the output of a command and adds it to a file while, command “Consistency” display the current consistency level or set a new consistency level.

While creating a table primary key is mandatory, it is made up of one or more columns of a table.

While adding a column you need to take care that the Column name is not conflicting with the existing column names .Table is not defined with compact storage option

Cassandra CQL collections help you to store multiple values in a single variable. In Cassandra, you can use CQL collections in following ways

LIST: It is used when the order of the data needs to be maintained, and a value is to be stored multiple times (holds the list of unique elements)

SET: It is used for group of elements to store and returned in sorted orders (holds repeating elements)

MAP: It is a data type used to store a key-value pair of elements

Cassandra concatenate changed data to commitlog.Commitlog acts as a crash recovery log for data.Until the changed data is concatenated to commitlog write operation will be never considered successful.Data will not be lost once commitlog is flushed out to file.

SSTables are immutable and cannot remove a row from SSTables. When a row needs to be deleted, Cassandra assigns the column value with a special value called Tombstone. When the data is read, the Tombstone value is considered as deleted.

The snitch is a configurable component of a Cassandra cluster used to define how the nodes are grouped together within the overall network topology (such as rack and data center groupings). Cassandra uses this information to route inter-node requests as efficiently as possible within the confines of the replica placement strategy. The snitch does not affect requests between the client application and Cassandra (it does not control which node a client connects to).

Simple Snitch- It has the strategy of placing the copy of the row on the next available node walking clockwise through the nodes.

Rack Inferring Snitch– It tries to place copies of rows of different racks in the data center. It will know about the rack and data center and will try to place copies in different racks and data centers. From the IP address, it can determine the data center address and the rack. So the IP address will have to be configured in such a way that the second unit of IP address will be used to identify the data center. The third unit identifies the rack.

There are two aspects to Capacity Planning- User data size and Usable Disk capacity. How can one estimate the user data size? If you have one terabyte of information to be stored, there can be some over heads above the one terabyte of data. There might also be some indexes that might take more time and space.

There is something called as Usable disk capacity. Sometimes, extra space is needed to carry out internal processes. Hence, all the disk space is not available to the user so the usable disk capacity has to be calculated first. Therefore, there are certain factors that you have to consider like the various operations happening on the disk, the various internal processes happening on Cassandra cluster etc.

Cassandra is the right choice when you need scalability and high availability without compromising on performance. Cassandra file system is an HDFS file system that is replaceable with your standard HDFS file. You can change the Hadoop configuration and explore and expose Cassandra’s file system as HDFS. With this file system, it is easy to get rid of the name nodes and data node daemons because Cassandra can take care of that.

machine learning in training,machine learning,machine learning course content

• The Cassandra file system is decentralized.

• It doesn’t have a single point of failure and has a replication facility.

• It is very similar to HDFS and this is an HDFS compatible system.

• Another important factor about Cassandra file system is that it can be used for indexing.

• Being indexed in an HDFS file system is very difficult since everything gets distributed on blocks but in the Cassandra file system, certainly, you can have the information index and hence, this provides a very unique advantage.

• One can have the index in the Cassandra file system and then the power of Hadoop could be used to traverse the data and do some smart scanning, instead of scanning all the data and finding out respective information.

OLTP is said to be more of an online transactional system or data storage system, where the user does lots of online transactions using the data store. It is also said to have more ad-hoc reads/writes happening on real time basis.

OLAP is more of an offline data store. It is accessed number of times in offline fashion. For example, Bulk log files are read and then written back to data files. Some of the common areas where OLAP is used are Log Jobs, Data mining Jobs, etc.

Cassandra is said to be more of OLTP, as it is real-time, whereas Hadoop is more of OLAP, since it is used for analytics and bulk writes.

In Cassandra, the communication between nodes is often like peer-to-peer communication, where every node talks to the other. If that’s the case, then all the nodes talk to one another, and there is a lot of communication happening. The Gossip Protocol is a method to resolve this communication chaos. In Cassandra, when one node talks to another, the node which is expected to respond, not only provides information about its status, but also provides information about the nodes that it had communicated with before. Through this process, there is a reduction in network log, more information is kept and efficiency of information gathering increases. The main feature of the protocol is to provide the latest information of any node respectively.

Failure Detection :

An important feature of Gossip Protocol is Failure Detection. Basically, when two nodes communicate with one another; for instance, Node A to Node B, then Node A sends a message ‘gossipdigestsynmessage’, which is very similar to TCP protocol to Node B. Here, Node B, once receives the message, sends an acknowledgement message ‘ack’, and then Node A responds with an acknowledgement message to Node B’s ‘ack’ message. This is known as the 3 way handshake.

If in case, the node goes down and does not send the ack message, then it will be a mark down. Even when the nodes are down, the other nodes will be periodically pinging and that is how the failure detection happens.

Starting Cassandra involves connecting to the machine where it is installed with the proper security credentials, and invoking the cassandra executable from the installation’s binary directory. An example of starting Cassandra on Mac could be:

sudo /Applications/Cassandra/apache-cassandra-1.1.1/bin/cassandra

•The basic command line interface (CLI) for logging into and executing commands against Cassandra is the cassandra-cli utility, which is found in the software installation’s bin directory.

•An example of logging into a local machine’s Cassandra installation using the CLI and the default Cassandra port might be:

Welcome to the Cassandra CLI.

Type ‘help;’ or ‘?’ for help.

Type ‘quit;’ or ‘exit;’ to quit.

Cassandra can be used in many different data management situations. Some of the most common use cases for Cassandra include:Serving as the operational/real-time/system-of-record datastore for Web or other online applications needing around-the-clock transactional input capabilities .Applications needing &ldquo;network independence&rdquo;, meaning systems that cannot worry about where data lives.

Cassandra is typically not the choice for transactional data that needs per-transaction commit/rollback capabilities. Note that Cassandra does have atomic transactional abilities on a per row/insert basis (but with no rollback capabilities).

The primary difference between Cassandra and Hadoop is that Cassandra targets real-time/operational data, while Hadoop has been designed for batch-based analytic work.

There are many different technical differences between Cassandra and Hadoop, including Cassandra’s underlying data structure (based on Google’s Bigtable), its fault-tolerant, peer-to-peer architecture, multi-data center capabilities, tunable data consistency, all nodes being the same (no concept of a namenode, etc.) and much more.

HBase is an open-source, column-oriented data store modeled after Google Bigtable, and is designed to offer Bigtable-like capabilities on top of data stored in Hadoop. However, while HBase shared the Bigtable design with Cassandra, its foundational architecture is much different.

A Cassandra cluster is much easier to setup and configure than a comparable HBase cluster. HBase’s reliance on the Hadoop namenode equates to there being a single point of failure in HBase, whereas with Cassandra, because all nodes are the same, there is no such issue

In internal performance tests conducted at DataStax (using the Yahoo Cloud Serving Benchmark &ndash; YCSB), Cassandra offered literally 5X better performance in writes and 4X better performance on reads than HBase.

MongoDB is a document-oriented database that is built upon a master-slave/sharding architecture. MongoDB is designed to store/manage collections of JSON-styled documents.

By contrast, Cassandra uses a peer-to-peer, write/read-anywhere styled architecture that is based on a combination of Google BigTable and Amazon Dynamo. This allows Cassandra to avoid the various complications and pitfalls of master/slave and sharding architectures. Moreover, Cassandra offers linear performance increases as new nodes are added to a cluster, scales to terabyte-petabyte data volumes, and has no single point of failure.

Cassandra has been built from the ground up to be a fault tolerant, peer-to-peer database that offers no single point of failure. Cassandra can automatically replicate data between nodes to offer data redundancy. It also offers built-in intelligence to replicate data between different physical server racks (so that if one rack goes down the data on other racks is safe) as well as between geographically dispersed data centers, and/or public Cloud providers and on-premises machines, which offers the strongest possible uptime and disaster recovery capabilities.

Automatically replicates data between nodes to offer data redundancy Offers built-in intelligence to replicate data between different physical server racks (so that if one rack goes down the data on other racks is safe).Easily replicates between geographically dispersed data centers.Leverages any combination of cloud and on-premise resources.

Cassandra does not use a master/slave architecture, but instead uses a peer-to-peer implementation, which avoids the pitfalls, latency problems, single point of failure issues, and performance headaches associated with master/slave setups.

Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. When you create a keyspace in Cassandra, you must decide the replica placement strategy: the number of replicas and how those replicas are distributed across nodes in the cluster. The replication strategy relies on the cluster-configured snitch to help it determine the physical location of nodes and their proximity to each other.

The total number of replicas across the cluster is often referred to as the relication factor. A replication factor of 1 means that there is only one copy of each row. A replication factor of 2 means two copies of each row. All replicas are equally important; there is no primary or master replica in terms of how read and write requests are handled.Replication options are defined when you create a keyspace in Cassandra. The snitch is configured per node.

Cassandra provides a number of options to partition your data across nodes in a cluster.

The RandomPartitioner is the default partitioning strategy for a Cassandra cluster. It uses a consistent hashing algorithm to determine which node will store a particular row. The end result is an even distribution of data across a cluster.

The ByteOrderedPartitioner ensures that row keys are stored in sorted order. It is not recommended for most use cases and can result in uneven distribution of data across a cluster.

machine learning in training,machine learning,machine learning course content

A seed node in Cassandra is a node that is contacted by other nodes when they first start up and join the cluster. A cluster can have multiple seed nodes. Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. When a node first starts, it contacts a seed node to bootstrap the gossip communication process. The seed node designation has no purpose other than bootstrapping new nodes joining the cluster. Seed nodes are not a single point of failure.

Cassandra is capable of offering linear performance benefits when new nodes are added to a cluster.

A new machine can be added to an existing cluster by installing the Cassandra software on the server and configuring the new node so that it knows (1) the name of the Cassandra cluster it is joining; (2) the seed node(s) it should obtain its data from; (3) the range of data that it is responsible for, which is done by assigning a token to the node.

Nodes can be removed from a Cassandra cluster by using the nodetool utility and issuing a decommission command. This can be done without affecting the overall operations or uptime of the cluster.

Cassandra can easily replicate data between different physical datacenters by creating a keyspace that uses the replication strategy currently termed NetworkTopologyStrategy. This strategy allows you to configure Cassandra to automatically replicate data to different data centers and even different racks within datacenters to protect against specific rack/physical hardware failures causing a cluster to go down. It can also replicate data between public Clouds and on-premises machines.

The main Cassandra configuration file is the cassandra.yaml file, which houses all the main options that control how Cassandra operates.

•Cassandra’s architecture make it perfect for full Cloud deployments as well as hybrid implementations that store some data in the Cloud and other data on premises.

•DataStax provides an Amazon AMI that allows you to quickly deploy a Cassandra cluster on EC2. See the online documentation for a step-by-step guide to installing a Cassandra cluster on Amazon.

Cassandra negates the need for extra software caching layers like memcached through its distributed architecture, fast write throughput capabilities, and internal memory caching structures.

•Cassandra is architected in a peer-to-peer fashion and uses a protocol called gossip to communicate with other nodes in a cluster. The gossip process runs every second to exchange information across the cluster.

•Gossip only includes information about the cluster itself (up/down, joining, leaving, version, schema, etc.) and does not manage the data. Data is transferred node-to-node using a message passing like protocol on a distinct port from what client applications connect to.

•The Cassandra partitioner turns a column family key into a token, the replication strategy picks the set of nodes responsible for that token (using information from the snitch) and Cassandra sends messages to those replicas with the request (read or write).

•The gossip protocol is used to determine the state of all nodes in a cluster and if a particular node has gone down.

•The gossip process tracks heartbeats from other nodes and uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, or other conditions that might affect perceived heartbeat rate before a node is actually marked as down.

•The configuration parameter phi_convict_threshold in the cassandra.yaml file is used to control Cassandra’s sensitivity of node failure detection. The default value is appropriate for most situations. However in Cloud environments, such as Amazon EC2, the value should be increased to 12 in order to account for network issues that sometimes occur on such platforms.

Yes, data compression is available with Cassandra 1.0 and above. The snappy compression algorithm from Google is used and is able to deliver fairly impressive storage savings, in some cases compressing raw data up to 80+% with no performance penalties for read/write operations. In fact, because of the reduction in physical I/O, compression actually increases performance in some use cases. Compression is enabled/disabled on a per-column family basis and is not enabled by default.

•Currently, the most common method for backing up data in Cassandra is using the snapshot function in the nodetool utility. This is an online operation and does not require any downtime or block any operations on the server.

•Snapshots are sent by default to a snapshots directory that is located in the Cassandra data directory (controlled via the data_file_directories in the cassandra.yaml file). Once taken, snapshots can be moved off-site to be protected.

•Incremental backups (i.e. data backed up since the last full snapshot) can be performed by setting the incremental_backups parameter in the cassandra.yaml file to &lsquo;true’. When incremental backup is enabled, Cassandra copies every flushed SSTable for each keyspace to a backup directory located under the Cassandra data directory. Restoring from an incremental backup involves first restoring from the last full snapshot and then copying each incremental file back into the Cassandra data directory. Eg.

Create a Cassandra snapshot for a single nodenodetool -h snapshot KEYSPACE_NAME

Create a cluster wide Cassandra snapshot

clustertool -h global_snapshot KEYSPACE_NAME

In general, restoring a Cassandra node is done by first following these procedures

step 1: Shut down the node that is to be restored

step 2: Clear the commit log by removing all the files in the commit log directory


rm /var/lib/cassandra/commitlog/*

step 3:

Remove the database files for all keyspaces


rm /var/lib/cassandra/data/keyspace1/*.db

Take care so as not to remove the snapshot directory for the keyspace

step 4:

Copy the latest snapshot directory contents for each keyspace to the keyspace’s data directory


cp -p /var/lib/cassandra/data/keyspace1/snapshots/56046198758643-snapshotkeyspace1/* /var/lib/cassandra/data/keyspace1

step 5:

Copy any incremental backups taken for each keyspace into the keyspace’s data directory

step 6:

Repeat steps 3-5 for each keyspace

step 7:

Restart the node

big data training chennai, hadoop training chennai.
div class=”panel-group” >

•Yes. First, data durability is fully supported in Cassandra so that any data written to a database cluster is first written to a commit log in the same fashion as nearly every popular RDBMS does.

•Second, Cassandra offers tunable data consistency so that a developer or administrator can choose how strong they wish consistency across nodes to be. The strongest form of consistency is to mandate that any data modifications be made to all nodes, with any unsuccessful attempt on a node resulting in a failed data operation. Cassandra provides consistency in the CAP sense in that all readers will see the same values.

•Other forms of tunable consistency involve having a quorum of nodes written to or just one node for the loosest form of consistency. Cassandra is very flexible and allows data consistency to be chosen on a per operation basis if needed so that very strong consistency can be used when desired, or very loose consistency can be utilized when the use case permits.