Cassandra Interview Questions
Cassandra is one of the most favored NoSQL distributed database management systems, maintained by Apache. An open-source technology, Cassandra is designed to store and manage large volumes of data without a single point of failure. Highly scalable for Big Data models and originally developed at Facebook, Apache Cassandra is written in Java and supports flexible schemas.
Unlike traditional databases, Apache Cassandra delivers near real-time performance, simplifying the work of Developers, Administrators, Data Analysts and Software Engineers.
• Instead of a master-slave architecture, Cassandra is built on a peer-to-peer architecture, ensuring no single point of failure.
• It also assures phenomenal flexibility, as it allows multiple nodes to be added to any Cassandra cluster in any data center. Further, any client can forward its request to any server.
• Cassandra facilitates extensible scalability and can easily be scaled up or down as requirements change. With high throughput for read and write operations, this NoSQL application does not need to be restarted while scaling.
• Cassandra is also revered for its strong data replication capability as it allows data storage at multiple locations enabling users to retrieve data from another location if one node fails. Users have the option to set up the number of replicas they want to create.
• Shows brilliant performance on massive datasets, and is thus the preferred NoSQL database of many organizations.
• Operates on a column-oriented structure, which quickens and simplifies slicing; data access and retrieval are also more efficient with a column-based data model.
• Further, Apache Cassandra supports a schema-free/schema-optional data model, removing the need to declare up front all the columns your application requires.
Tunable Consistency is a phenomenal characteristic that makes Cassandra a favored database choice of Developers, Analysts and Big Data Architects. Consistency refers to how up-to-date and synchronized the data rows are across all their replicas. Cassandra’s Tunable Consistency allows users to select the consistency level best suited to their use cases. It supports two consistency models: Eventual Consistency and Strong Consistency.
The former guarantees that, when no new updates are made on a given data item, all accesses eventually return the last updated value. Systems with eventual consistency are said to have achieved replica convergence.
For Strong consistency, Cassandra supports the following condition:
R + W > N, where
N – Number of replicas
W – Number of nodes that need to agree for a successful write
R – Number of nodes that need to agree for a successful read
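The condition above is easy to check mechanically. A minimal sketch in Python (the function name is illustrative, not a Cassandra API):

```python
def is_strongly_consistent(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """Strong consistency holds when R + W > N: every read set then
    overlaps every write set in at least one up-to-date replica."""
    return read_acks + write_acks > n_replicas

# QUORUM reads and writes against 3 replicas: 2 + 2 > 3, so reads see the latest write
assert is_strongly_consistent(3, 2, 2)
# ONE/ONE against 3 replicas: 1 + 1 > 3 is false, so only eventual consistency
assert not is_strongly_consistent(3, 1, 1)
```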
Cassandra performs a write in two commits: first it writes to a commit log on disk, and then commits to an in-memory structure known as a memtable. Once both commits are successful, the write is complete. Writes are later persisted on disk in a table structure called an SSTable (sorted string table). This is why Cassandra offers speedy write performance.
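The two commits described above can be sketched as a toy model (the class and field names here are illustrative, not Cassandra internals):

```python
from dataclasses import dataclass, field

@dataclass
class TinyNode:
    """Toy model of Cassandra's write path: append to a commit log,
    then update the in-memory memtable; flush to an 'SSTable' when full."""
    memtable_limit: int = 2
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. durable append to commit log
        self.memtable[key] = value             # 2. update in-memory memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # SSTables are written sorted by key and are immutable afterwards
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()

node = TinyNode()
node.write("b", 1)
node.write("a", 2)          # memtable hits its limit and flushes, sorted by key
assert node.sstables == [[("a", 2), ("b", 1)]]
```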
DataStax OpsCenter: a web-based management and monitoring solution for Cassandra clusters and DataStax. A free edition is available for download, alongside a paid Enterprise edition of OpsCenter.
• SPM primarily administers Cassandra metrics and various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, zookeeper and other Big Data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, creating real-time graphs with zooming, anomaly detection and heartbeat alerting.
A memtable is an in-memory write-back cache holding content in key and column format. The data in a memtable is sorted by key, and each ColumnFamily has a distinct memtable whose column data is retrieved via key. A memtable stores writes until it is full, and is then flushed to disk.
SSTable expands to ‘Sorted String Table’, an important data file in Cassandra that receives regularly flushed memtables. SSTables are stored on disk and exist for each Cassandra table. Being immutable, SSTables do not allow any further addition or removal of data items once written. For each SSTable, Cassandra creates three separate companion files: a partition index, a partition summary, and a bloom filter.
Associated with each SSTable, a bloom filter is an off-heap (off the Java heap, in native memory) data structure used to check whether an SSTable might contain the requested data before performing any disk I/O operation.
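The defining property (possible false positives, never false negatives) is what lets Cassandra skip a disk read safely. A minimal sketch, not Cassandra's actual implementation:

```python
import hashlib

class TinyBloomFilter:
    """Minimal bloom filter sketch: it may report false positives but
    never false negatives, which is exactly the property relied on to
    skip needless SSTable disk reads."""
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a big int used as a bit array

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = TinyBloomFilter()
bf.add("row-42")
assert bf.might_contain("row-42")   # keys that were added always hit
# absent keys usually miss, so the SSTable read can be skipped entirely
```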
With a strong requirement to scale systems when additional resources are needed, the CAP theorem plays a major role in shaping the scaling strategy for distributed systems. The Consistency, Availability, and Partition tolerance (CAP) theorem states that in distributed systems like Cassandra, users can enjoy only two out of these three characteristics; one must be sacrificed. Consistency guarantees that the client sees the most recent write, Availability returns a rational response within minimum time, and Partition Tolerance means the system continues operating when network partitions occur. In practice the two options available are AP and CP.
While a node is a single machine running Cassandra, a cluster is a collection of nodes holding similar types of data, grouped together. Data centers are useful components when serving customers in different geographical areas; you can group different nodes of a cluster into different data centers.
Using CQL (Cassandra Query Language). Cqlsh is used for interacting with the database.
Windows and Linux
Cassandra Data Model consists of four main components:
Cluster : made up of multiple nodes and keyspaces
Keyspace : a namespace to group multiple column families, typically one per application
Column : consists of a column name, a value and a timestamp
ColumnFamily : multiple columns with a row key reference.
CQL is the Cassandra Query Language, used to access and query the Apache distributed database. It consists of a CQL parser that pushes all the implementation details to the server. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model.
Compaction refers to a maintenance process in Cassandra in which SSTables are reorganized to optimize the data structures on disk. Compaction typically runs as memtables are flushed into new SSTables. There are two types of compaction in Cassandra:
Minor compaction : started automatically when a new SSTable is created. Here, Cassandra condenses all the equally sized SSTables into one.
Major compaction : triggered manually using nodetool; compacts all SSTables of a ColumnFamily into one.
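The essence of compaction, merging several sorted SSTables and keeping only the newest version of each key, can be sketched as:

```python
def compact(sstables):
    """Merge several sorted SSTables of (key, timestamp, value) rows into
    one, keeping only the newest version of each key - the core idea of
    compaction (real SSTable layout and tombstone purging are omitted)."""
    merged = {}
    for table in sstables:
        for key, ts, value in table:
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # result is a single table, again sorted by key
    return sorted((k, ts, v) for k, (ts, v) in merged.items())

old = [("a", 1, "v1"), ("b", 1, "v1")]
new = [("a", 2, "v2")]
assert compact([old, new]) == [("a", 2, "v2"), ("b", 1, "v1")]
```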
Unlike relational databases, Cassandra does not support ACID transactions.
Cqlsh expands to Cassandra Query Language Shell, which configures the CQL interactive terminal. It is a Python-based command-line prompt used on Linux or Windows that executes CQL commands like ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE and many others. With cqlsh, users can define a schema, insert data and execute a query.
A Cassandra Super Column is a unique element consisting of similar collections of data. Super columns are actually key-value pairs whose values are columns: a sorted array of columns following the hierarchy keystore > column family > super column > column data structure, as in JSON.
Similar to row keys, super column data entries contain no independent values but are used to collect other columns. Note that super column keys appearing in different rows do not necessarily match.
• ALL: Highly consistent. A write must be written to commitlog and memtable on all replica nodes in the cluster
• EACH_QUORUM: A write must be written to commitlog and memtable on a quorum of replica nodes in each data center.
• LOCAL_QUORUM: A write must be written to commitlog and memtable on a quorum of replica nodes in the same data center.
• ONE: A write must be written to commitlog and memtable of at least one replica node.
• TWO, THREE: Same as ONE, but for at least two and three replica nodes, respectively
• LOCAL_ONE: A write must be written for at least one replica node in the local data center
• SERIAL: Linearizable Consistency to prevent unconditional updates
• LOCAL_SERIAL: Same as Serial but restricted to local data center
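Assuming each data center holds the same number of replicas (a simplifying assumption for illustration), the acknowledgement counts implied by these levels can be sketched as:

```python
def required_acks(level: str, rf_per_dc: int, data_centers: int = 1) -> int:
    """How many replica acknowledgements each write consistency level
    needs, assuming `rf_per_dc` replicas in every data center
    (a simplification; SERIAL levels are omitted)."""
    total = rf_per_dc * data_centers
    local_quorum = rf_per_dc // 2 + 1
    return {
        "ONE": 1, "TWO": 2, "THREE": 3,
        "LOCAL_ONE": 1,
        "LOCAL_QUORUM": local_quorum,
        "EACH_QUORUM": local_quorum * data_centers,
        "QUORUM": total // 2 + 1,
        "ALL": total,
    }[level]

assert required_acks("LOCAL_QUORUM", 3) == 2          # quorum of 3 replicas
assert required_acks("EACH_QUORUM", 3, data_centers=2) == 4
assert required_acks("ALL", 3, data_centers=2) == 6   # every replica everywhere
```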
Both elements work on the principle of a tuple having a name and a value. However, a Column’s value is a string, while a Super Column’s value is a map of Columns with different data types.
Unlike Columns, Super Columns do not contain the third component, a timestamp.
As the name suggests, a ColumnFamily is a structure with an unbounded number of rows, each referenced by a key-value pair where the key is the name of the column and the value is the column data. It is much like a HashMap in Java or a dictionary in Python. Remember, the rows are not limited to a predefined list of columns here; a ColumnFamily is absolutely flexible, with one row having 100 columns while another has only 2.
Source command is used to execute a file consisting of CQL statements.
Thrift is a legacy RPC protocol/API combined with a code-generation tool. The purpose of using Thrift in Cassandra is to facilitate access to the database across programming languages.
A tombstone is a row marker indicating a column deletion. These marked columns are removed during compaction. Tombstones are of great significance because Cassandra supports eventual consistency: the deletion marker must propagate to all replicas so that a deleted value is not resurrected.
Since Cassandra is a Java application, it can successfully run on any Java-driven platform or Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on RedHat, CentOS, Debian and Ubuntu Linux platforms.
By default, Cassandra uses port 7000 for cluster management, 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh
• Yes, but keep the following steps in mind:
• Do not forget to clear the commitlog with ‘nodetool drain’
• Turn off Cassandra to check that there is no data left in commitlog
• Delete the sstable files for the removed CFs
ReplicationFactor is the measure of how many copies of the data exist in the cluster. A replication factor of 1 keeps a single copy; increasing it improves fault tolerance at the cost of extra storage.
Yes, but it will require running repair to alter the replica count of existing data.
Using get_range_slices. Start the iteration with the empty string; after each iteration, the last key read serves as the start key for the next iteration.
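The pattern can be sketched with a stand-in fetch_page function playing the role of the real get_range_slices call:

```python
def page_all_rows(fetch_page, page_size=2):
    """Iterate all row keys when the server returns at most `page_size`
    keys per call: start from the empty string and reuse the last key
    read as the start key of the next call, skipping its duplicate."""
    start, rows = "", []
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        # the first key of every page after the first repeats the previous last key
        rows.extend(k for k in page if k != start)
        if len(page) < page_size:
            break
        start = page[-1]
    return rows

DATA = sorted(["apple", "banana", "cherry", "damson", "elder"])

def fetch_page(start, limit):
    # stand-in for get_range_slices: keys >= start, at most `limit` of them
    return [k for k in DATA if k >= start][:limit]

assert page_all_rows(fetch_page) == DATA
```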
Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The various factors that favor using Cassandra are
• It is fault tolerant and consistent
• Gigabyte-to-petabyte scalability
• It is a column-oriented database
• No single point of failure
• No need for separate caching layer
• Flexible schema design
• It has flexible data storage, easy data distribution, and fast writes
• It provides atomic, isolated, and durable writes at the row level (full ACID transactions are not supported)
• Multi-data center and cloud capable
• Data compression
In Cassandra, a composite type allows you to define a key or a column name as a concatenation of data of different types. You can use two kinds of Composite Type
• Row Key
• Column Name
All data is stored as bytes. When you specify a validator, Cassandra ensures those bytes are encoded as required; a comparator then orders the columns based on the ordering specific to that encoding. Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a termination bit.
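That layout can be sketched directly (here a zero byte stands in for the end-of-component bit, purely for illustration):

```python
import struct

def encode_composite(components: list) -> bytes:
    """Sketch of composite encoding: for each component, a two-byte
    big-endian length, the component's bytes, then an end-of-component
    byte (a zero byte here, as a stand-in for the termination bit)."""
    out = b""
    for comp in components:
        out += struct.pack(">H", len(comp)) + comp + b"\x00"
    return out

encoded = encode_composite([b"us", b"2023"])
# two components: len 2 + "us" + end byte, then len 4 + "2023" + end byte
assert encoded == b"\x00\x02us\x00" + b"\x00\x042023\x00"
```

Because the length prefix precedes each component, byte-wise comparison of two encodings orders composites component by component, which is what the comparator relies on.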
A cluster is a container for keyspaces. Cassandra database is segmented over several machines that operate together. The cluster is the outermost container which arranges the nodes in a ring format and assigns data to them. These nodes have a replica which takes charge in case of data handling failure.
The other components of Cassandra are
• Data Center
• Commit log
• Bloom Filter
In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster can contain one or more keyspaces.
Syntax for creating keyspace in Cassandra is
CREATE KEYSPACE <identifier> WITH <properties>
In a Cassandra Column, there are basically three values
• Column Name
• Value
• Time Stamp
ALTER KEYSPACE can be used to change properties such as the number of replicas and the durable_write of a keyspace.
There are various cqlsh shell commands in Cassandra. The command “Capture” captures the output of a command and adds it to a file, while the command “Consistency” displays the current consistency level or sets a new consistency level.
While creating a table, a primary key is mandatory; it is made up of one or more columns of the table.
While adding a column, you need to take care that the column name does not conflict with existing column names, and that the table is not defined with the compact storage option.
Cassandra CQL collections help you to store multiple values in a single variable. In Cassandra, you can use CQL collections in following ways
LIST: It is used when the order of the data needs to be maintained and a value may be stored multiple times (holds an ordered list of elements, duplicates allowed)
SET: It is used for a group of elements to store and return in sorted order (holds unique, non-repeating elements)
MAP: It is a data type used to store a key-value pair of elements
Cassandra appends changed data to the commitlog. The commitlog acts as a crash-recovery log for the data: a write operation is never considered successful until the changed data is appended to the commitlog. Data will not be lost once the commitlog is flushed out to file.
SSTables are immutable, so a row cannot be removed from an SSTable directly. When a row needs to be deleted, Cassandra assigns the column a special marker value called a tombstone. When the data is read, the tombstone value is treated as deleted.
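A toy read-path sketch showing how the newest tombstone shadows earlier values until compaction physically removes them:

```python
TOMBSTONE = "<tombstone>"  # illustrative marker, not Cassandra's on-disk form

def read_row(versions):
    """Read-path sketch: `versions` are (timestamp, value) pairs gathered
    from several immutable SSTables; the newest wins, and a tombstone
    value means the row reads as deleted."""
    ts, value = max(versions, key=lambda v: v[0])
    return None if value == TOMBSTONE else value

history = [(1, "hello"), (2, TOMBSTONE)]   # a write, then a delete
assert read_row(history) is None           # row reads as deleted
assert read_row([(1, "hello")]) == "hello"
```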
The snitch is a configurable component of a Cassandra cluster used to define how the nodes are grouped together within the overall network topology (such as rack and data center groupings). Cassandra uses this information to route inter-node requests as efficiently as possible within the confines of the replica placement strategy. The snitch does not affect requests between the client application and Cassandra (it does not control which node a client connects to).
Simple Snitch – places the copy of a row on the next available node, walking clockwise through the nodes.
Rack Inferring Snitch – tries to place copies of rows in different racks and data centers. It infers the rack and data center of each node from its IP address, so addresses must be assigned such that the second octet of the IP address identifies the data center and the third octet identifies the rack.
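The octet convention can be sketched as follows (the DC/RACK naming is illustrative):

```python
def infer_topology(ip: str):
    """RackInferringSnitch-style lookup: the second octet of the IP
    names the data center and the third octet names the rack."""
    octets = ip.split(".")
    return f"DC{octets[1]}", f"RACK{octets[2]}"

assert infer_topology("10.1.2.7") == ("DC1", "RACK2")
# 10.1.2.7 and 10.1.3.9 share a data center but sit in different racks
assert infer_topology("10.1.3.9")[0] == "DC1"
```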
There are two aspects to capacity planning: user data size and usable disk capacity. How can one estimate the user data size? If you have one terabyte of information to store, there will be some overhead above that one terabyte of data, and indexes may take additional time and space.
Then there is usable disk capacity. Extra space is needed to carry out internal processes, so not all of the disk space is available to the user, and the usable disk capacity has to be calculated first. You therefore have to consider factors like the various operations happening on the disk and the internal processes running on the Cassandra cluster.
Cassandra is the right choice when you need scalability and high availability without compromising on performance. Cassandra file system is an HDFS file system that is replaceable with your standard HDFS file. You can change the Hadoop configuration and explore and expose Cassandra’s file system as HDFS. With this file system, it is easy to get rid of the name nodes and data node daemons because Cassandra can take care of that.
• The Cassandra file system is decentralized.
• It doesn’t have a single point of failure and has a replication facility.
• It is very similar to HDFS and this is an HDFS compatible system.
• Another important factor about Cassandra file system is that it can be used for indexing.
• Indexing in a plain HDFS file system is very difficult since everything gets distributed in blocks, but in the Cassandra file system you can certainly have the information indexed, which provides a unique advantage.
• One can have the index in the Cassandra file system and then use the power of Hadoop to traverse the data and do some smart scanning, instead of scanning all the data to find the respective information.
OLTP is said to be more of an online transactional system or data storage system, where the user does lots of online transactions using the data store. It is also said to have more ad-hoc reads/writes happening on real time basis.
OLAP is more of an offline data store. It is accessed a number of times in an offline fashion. For example, bulk log files are read and then written back to data files. Some of the common areas where OLAP is used are log jobs, data mining jobs, etc.
Cassandra is said to be more of OLTP, as it is real-time, whereas Hadoop is more of OLAP, since it is used for analytics and bulk writes.
In Cassandra, communication between nodes is peer-to-peer: every node talks to the others, so a lot of communication happens. The Gossip Protocol is a method to resolve this communication chaos. In Cassandra, when one node talks to another, the node which responds provides information not only about its own status, but also about the nodes it has communicated with before. Through this process there is a reduction in network load, more information is kept, and the efficiency of information gathering increases. The main feature of the protocol is to provide the latest information about every node.
Failure Detection:
An important feature of the Gossip Protocol is failure detection. When two nodes communicate, for instance Node A with Node B, Node A sends a ‘GossipDigestSyn’ message (very similar to the TCP handshake) to Node B. Once Node B receives the message, it sends an acknowledgement (‘ack’) message, and Node A then responds with an acknowledgement of Node B’s ‘ack’ message. This is known as the 3-way handshake.
If a node goes down and does not send the ack message, it is marked down. Even when nodes are down, the other nodes keep pinging them periodically, and that is how failure detection happens.
Starting Cassandra involves connecting to the machine where it is installed with the proper security credentials, and invoking the cassandra executable from the installation’s bin directory.
•The basic command line interface (CLI) for logging into and executing commands against Cassandra is the cassandra-cli utility, which is found in the software installation’s bin directory.
•An example of logging into a local machine’s Cassandra installation using the CLI and the default Cassandra port might be:
Welcome to the Cassandra CLI.
Type ‘help;’ or ‘?’ for help.
Type ‘quit;’ or ‘exit;’ to quit.
Cassandra can be used in many different data management situations. Some of the most common use cases include: serving as the operational/real-time/system-of-record datastore for web or other online applications needing around-the-clock transactional input capabilities; and applications needing “network independence”, meaning systems that cannot worry about where data lives.
Cassandra is typically not the choice for transactional data that needs per-transaction commit/rollback capabilities. Note that Cassandra does have atomic transactional abilities on a per row/insert basis (but with no rollback capabilities).
The primary difference between Cassandra and Hadoop is that Cassandra targets real-time/operational data, while Hadoop has been designed for batch-based analytic work.
There are many different technical differences between Cassandra and Hadoop, including Cassandra’s underlying data structure (based on Google’s Bigtable), its fault-tolerant, peer-to-peer architecture, multi-data center capabilities, tunable data consistency, all nodes being the same (no concept of a namenode, etc.) and much more.
HBase is an open-source, column-oriented data store modeled after Google Bigtable, and is designed to offer Bigtable-like capabilities on top of data stored in Hadoop. However, while HBase shares the Bigtable design with Cassandra, its foundational architecture is much different.
A Cassandra cluster is much easier to set up and configure than a comparable HBase cluster. HBase’s reliance on the Hadoop namenode equates to a single point of failure in HBase, whereas with Cassandra, because all nodes are the same, there is no such issue.
In internal performance tests conducted at DataStax (using the Yahoo Cloud Serving Benchmark – YCSB), Cassandra offered literally 5X better performance in writes and 4X better performance on reads than HBase.
MongoDB is a document-oriented database that is built upon a master-slave/sharding architecture. MongoDB is designed to store/manage collections of JSON-styled documents.
By contrast, Cassandra uses a peer-to-peer, write/read-anywhere styled architecture that is based on a combination of Google BigTable and Amazon Dynamo. This allows Cassandra to avoid the various complications and pitfalls of master/slave and sharding architectures. Moreover, Cassandra offers linear performance increases as new nodes are added to a cluster, scales to terabyte-petabyte data volumes, and has no single point of failure.
Cassandra has been built from the ground up to be a fault tolerant, peer-to-peer database that offers no single point of failure. Cassandra can automatically replicate data between nodes to offer data redundancy. It also offers built-in intelligence to replicate data between different physical server racks (so that if one rack goes down the data on other racks is safe) as well as between geographically dispersed data centers, and/or public Cloud providers and on-premises machines, which offers the strongest possible uptime and disaster recovery capabilities.
Automatically replicates data between nodes to offer data redundancy. Offers built-in intelligence to replicate data between different physical server racks (so that if one rack goes down the data on other racks is safe). Easily replicates between geographically dispersed data centers. Leverages any combination of cloud and on-premise resources.
Cassandra does not use a master/slave architecture, but instead uses a peer-to-peer implementation, which avoids the pitfalls, latency problems, single point of failure issues, and performance headaches associated with master/slave setups.
Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. When you create a keyspace in Cassandra, you must decide the replica placement strategy: the number of replicas and how those replicas are distributed across nodes in the cluster. The replication strategy relies on the cluster-configured snitch to help it determine the physical location of nodes and their proximity to each other.
The total number of replicas across the cluster is often referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row; a replication factor of 2 means two copies of each row. All replicas are equally important; there is no primary or master replica in terms of how read and write requests are handled. Replication options are defined when you create a keyspace in Cassandra. The snitch is configured per node.
Cassandra provides a number of options to partition your data across nodes in a cluster.
The RandomPartitioner is the default partitioning strategy for a Cassandra cluster. It uses a consistent hashing algorithm to determine which node will store a particular row. The end result is an even distribution of data across a cluster.
The ByteOrderedPartitioner ensures that row keys are stored in sorted order. It is not recommended for most use cases and can result in uneven distribution of data across a cluster.
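The hash-a-key-onto-a-ring idea behind RandomPartitioner can be sketched as follows; the tiny 0-99 token space and the node names are illustrative only:

```python
import bisect
import hashlib

class TinyRing:
    """Sketch of RandomPartitioner-style placement: hash the row key
    onto a token ring and walk clockwise to the first node token."""
    def __init__(self, node_tokens: dict):
        # node_tokens maps an integer token to a node name
        self.tokens = sorted(node_tokens)
        self.nodes = node_tokens

    def _token(self, row_key: str) -> int:
        # consistent hash into a tiny 0-99 token space (illustrative)
        return int(hashlib.md5(row_key.encode()).hexdigest(), 16) % 100

    def node_for(self, row_key: str) -> str:
        token = self._token(row_key)
        idx = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return self.nodes[self.tokens[idx]]

ring = TinyRing({25: "node-a", 50: "node-b", 75: "node-c", 100: "node-d"})
owner = ring.node_for("some-row")
assert owner in {"node-a", "node-b", "node-c", "node-d"}
# the same key always hashes to the same node, and keys spread evenly
```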
A seed node in Cassandra is a node that is contacted by other nodes when they first start up and join the cluster. A cluster can have multiple seed nodes. Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. When a node first starts, it contacts a seed node to bootstrap the gossip communication process. The seed node designation has no purpose other than bootstrapping new nodes joining the cluster. Seed nodes are not a single point of failure.
Cassandra is capable of offering linear performance benefits when new nodes are added to a cluster.
A new machine can be added to an existing cluster by installing the Cassandra software on the server and configuring the new node so that it knows (1) the name of the Cassandra cluster it is joining; (2) the seed node(s) it should obtain its data from; (3) the range of data that it is responsible for, which is done by assigning a token to the node.
Nodes can be removed from a Cassandra cluster by using the nodetool utility and issuing a decommission command. This can be done without affecting the overall operations or uptime of the cluster.
Cassandra can easily replicate data between different physical datacenters by creating a keyspace that uses the replication strategy currently termed NetworkTopologyStrategy. This strategy allows you to configure Cassandra to automatically replicate data to different data centers and even different racks within datacenters to protect against specific rack/physical hardware failures causing a cluster to go down. It can also replicate data between public Clouds and on-premises machines.
The main Cassandra configuration file is the cassandra.yaml file, which houses all the main options that control how Cassandra operates.
• Cassandra’s architecture makes it perfect for full cloud deployments as well as hybrid implementations that store some data in the cloud and other data on premises.
•DataStax provides an Amazon AMI that allows you to quickly deploy a Cassandra cluster on EC2. See the online documentation for a step-by-step guide to installing a Cassandra cluster on Amazon.
Cassandra negates the need for extra software caching layers like memcached through its distributed architecture, fast write throughput capabilities, and internal memory caching structures.
•Cassandra is architected in a peer-to-peer fashion and uses a protocol called gossip to communicate with other nodes in a cluster. The gossip process runs every second to exchange information across the cluster.
•Gossip only includes information about the cluster itself (up/down, joining, leaving, version, schema, etc.) and does not manage the data. Data is transferred node-to-node using a message passing like protocol on a distinct port from what client applications connect to.
•The Cassandra partitioner turns a column family key into a token, the replication strategy picks the set of nodes responsible for that token (using information from the snitch) and Cassandra sends messages to those replicas with the request (read or write).
•The gossip protocol is used to determine the state of all nodes in a cluster and if a particular node has gone down.
•The gossip process tracks heartbeats from other nodes and uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, or other conditions that might affect perceived heartbeat rate before a node is actually marked as down.
•The configuration parameter phi_convict_threshold in the cassandra.yaml file is used to control Cassandra’s sensitivity of node failure detection. The default value is appropriate for most situations. However in Cloud environments, such as Amazon EC2, the value should be increased to 12 in order to account for network issues that sometimes occur on such platforms.
Yes, data compression is available with Cassandra 1.0 and above. The snappy compression algorithm from Google is used and is able to deliver fairly impressive storage savings, in some cases compressing raw data up to 80+% with no performance penalties for read/write operations. In fact, because of the reduction in physical I/O, compression actually increases performance in some use cases. Compression is enabled/disabled on a per-column family basis and is not enabled by default.
•Currently, the most common method for backing up data in Cassandra is using the snapshot function in the nodetool utility. This is an online operation and does not require any downtime or block any operations on the server.
•Snapshots are sent by default to a snapshots directory that is located in the Cassandra data directory (controlled via the data_file_directories in the cassandra.yaml file). Once taken, snapshots can be moved off-site to be protected.
• Incremental backups (i.e. data backed up since the last full snapshot) can be performed by setting the incremental_backups parameter in the cassandra.yaml file to ‘true’. When incremental backup is enabled, Cassandra copies every flushed SSTable for each keyspace to a backup directory located under the Cassandra data directory. Restoring from an incremental backup involves first restoring from the last full snapshot and then copying each incremental file back into the Cassandra data directory. For example:
Create a Cassandra snapshot for a single node:
nodetool -h 10.10.10.1 snapshot KEYSPACE_NAME
Create a cluster-wide Cassandra snapshot:
clustertool -h 10.10.10.1 global_snapshot KEYSPACE_NAME
In general, restoring a Cassandra node is done by following these steps:
Step 1: Shut down the node that is to be restored
Step 2: Clear the commit log by removing all the files in the commit log directory
Step 3: Remove the database files for all keyspaces, taking care not to remove the snapshot directory for the keyspace
Step 4: Copy the latest snapshot directory contents for each keyspace to the keyspace’s data directory, for example:
cp -p /var/lib/cassandra/data/keyspace1/snapshots/56046198758643-snapshotkeyspace1/* /var/lib/cassandra/data/keyspace1
Step 5: Copy any incremental backups taken for each keyspace into the keyspace’s data directory
Step 6: Repeat steps 3-5 for each keyspace
Step 7: Restart the node
•Yes. First, data durability is fully supported in Cassandra so that any data written to a database cluster is first written to a commit log in the same fashion as nearly every popular RDBMS does.
•Second, Cassandra offers tunable data consistency so that a developer or administrator can choose how strong they wish consistency across nodes to be. The strongest form of consistency is to mandate that any data modifications be made to all nodes, with any unsuccessful attempt on a node resulting in a failed data operation. Cassandra provides consistency in the CAP sense in that all readers will see the same values.
•Other forms of tunable consistency involve having a quorum of nodes written to or just one node for the loosest form of consistency. Cassandra is very flexible and allows data consistency to be chosen on a per operation basis if needed so that very strong consistency can be used when desired, or very loose consistency can be utilized when the use case permits.
•In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replicas. Cassandra offers a number of built-in features to ensure data consistency:
•Hinted Handoff Writes: Writes are always sent to all replicas for the specified row regardless of the consistency level specified by the client. If a node happens to be down at the time of a write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online again. Hinted handoff preserves data consistency through short, transient node outages.
•Read Repair: Read operations trigger consistency across all replicas for a requested row using a process called read repair. For reads, there are two types of read requests that a coordinator node can send to a replica: a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the read consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. Read repair requests ensure that the requested row is made consistent on all replicas.
•Anti-Entropy Node Repair: For data that is not read frequently, or to update data on a node that has been down for an extended period, the node repair process (also referred to as anti-entropy repair) ensures that all data on a replica is made consistent. Node repair (using the nodetool utility) should be run routinely as part of regular cluster maintenance operations.
•Most RDBMSs have an unload utility that allows data to be unloaded to flat files. Once in flat-file format, the sstableloader utility can be used to load the data into Cassandra column families.
•Some developers write programs to connect to both an RDBMS and Cassandra and move data in that way.
Read operations trigger consistency checks across all replicas for a requested row using a process called read repair. For reads, there are two types of read requests that a coordinator node can send to a replica: a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the read consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. Read repair requests ensure that the requested row is made consistent on all replicas. Read repair is an optional feature and can be configured per column family.
There are a number of CQL (Cassandra Query Language) drivers and native client libraries available for almost all popular development languages (e.g. Java, Ruby, etc.).
•In a relational database, you must specify a data type for each column when you define a table. The data type constrains the values that can be inserted into that column. For example, if you have a column defined as an integer datatype, you would not be allowed to insert character data into that column.
•In Cassandra, you can specify a data type for the column name (called a comparator) as well as for row key and column values (called a validator). Column and row key data in Cassandra is always stored internally as hex byte arrays, but the comparators/validators are used to verify data on insert and translate data on retrieval.
•In the case of comparators (column names), the comparator also determines the sort order in which columns are stored.
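Why the comparator matters for sort order can be sketched in a few lines of Python (an illustration, not Cassandra internals): the same numeric column names sort differently when compared as UTF-8 bytes versus as 8-byte big-endian longs.

```python
import struct

# Column names 10, 2, 9 sorted two ways, illustrating why the comparator
# matters: BytesType/UTF8Type sort lexically by byte value, while
# LongType interprets the 8-byte big-endian encoding and sorts numerically.
names = [10, 2, 9]

as_utf8 = sorted(str(n).encode("utf-8") for n in names)  # byte-wise order
as_long = sorted(struct.pack(">q", n) for n in names)    # 8-byte big-endian

print([b.decode() for b in as_utf8])                 # ['10', '2', '9'] -- lexical
print([struct.unpack(">q", b)[0] for b in as_long])  # [2, 9, 10] -- numeric
```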
Cassandra comes with the following comparators and validators:
• BytesType: bytes (no validation)
• UTF8Type: UTF-8 encoded strings
• LexicalUUIDType: 128-bit UUID, compared by byte value
• TimeUUIDType: version 1 128-bit UUID, compared by timestamp
• LongType: 64-bit signed integer
* can only be used as a column validator, not valid as a row key validator or column name comparator
•Yes and No, depending on what you mean by “transactions”. Unlike relational databases, Cassandra does not offer fully ACID-compliant transactions. There is no locking or transactional dependencies when concurrently updating multiple rows or column families.
•But if by “transactions” you mean real-time data entry and retrieval, with durability and tunable consistency, then yes.
•Cassandra does not support transactions in the sense of bundling multiple row updates into one all-or-nothing operation. Nor does it roll back when a write succeeds on one replica, but fails on other replicas. It is possible in Cassandra to have a write operation report a failure to the client, but still actually persist the write to a replica.
•However, this does not mean that Cassandra cannot be used as an operational or real time data store. Data is very safe in Cassandra because writes in Cassandra are durable. All writes to a replica node are recorded both in memory and in a commit log before they are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to disk, the commit log is replayed on restart to recover any lost writes.
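The durability guarantee above can be sketched as a toy write path (a greatly simplified illustration, not Cassandra's actual implementation): append and fsync to a commit log before acknowledging the write, then replay the log after a crash to rebuild the in-memory table.

```python
import json
import os
import tempfile

# Toy write-ahead commit log: every write is flushed to disk before it
# is acknowledged, so an in-memory table lost in a crash can be rebuilt
# by replaying the log. Hypothetical sketch only.
log_path = os.path.join(tempfile.mkdtemp(), "commitlog")

def write(key, value, memtable):
    with open(log_path, "a") as log:
        log.write(json.dumps([key, value]) + "\n")
        log.flush()
        os.fsync(log.fileno())  # durable on disk before we acknowledge
    memtable[key] = value       # then apply to the in-memory table

def replay():
    """Rebuild the memtable from the commit log after a 'crash'."""
    recovered = {}
    with open(log_path) as log:
        for line in log:
            key, value = json.loads(line)
            recovered[key] = value
    return recovered

memtable = {}
write("user:1", "alice", memtable)
write("user:2", "bob", memtable)
memtable = None       # simulate losing the in-memory table in a crash
print(replay())       # {'user:1': 'alice', 'user:2': 'bob'}
```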
•Cassandra is a gossip-based distributed system. ListenAddress is also the 'contact me here' address, i.e., the address it tells other nodes to reach it at. Telling other nodes 'contact me on any of my addresses' is a bad idea; if different nodes in the cluster pick different addresses for you, Bad Things happen.
•If you don't want to manually specify an IP as ListenAddress for each node in your cluster (understandable!), leave it blank and Cassandra will use InetAddress.getLocalHost() to pick an address. Then it's up to you or your ops team to make things resolve correctly (/etc/hosts, DNS, etc.).
•One exception to this process is JMX, which by default binds to 0.0.0.0 (Java bug 6425769).
•This is a symptom of memory pressure, resulting in a storm of GC operations as the JVM frantically tries to free enough heap to continue to operate. Eventually, the server will crash from OutOfMemory; usually, but not always, it will be able to log this final error before the JVM terminates.
•You can increase the amount of memory the JVM uses, or decrease the insert threshold before Cassandra flushes its memtables. See MemtableThresholds for details.
•Setting your cache sizes too large can result in memory pressure.
• Yes, but it’s important that you do it correctly.
• Empty the commitlog with ‘nodetool drain.’
• Shutdown Cassandra and verify that there is no remaining data in the commitlog.
• Delete the sstable files (-Data.db, -Index.db, and -Filter.db) for any CFs removed, and rename the files for any CFs that were renamed.
• Make necessary changes to your storage-conf.xml.
• Start Cassandra back up and your edits should take effect.
• No, any node in the cluster will work; Cassandra nodes proxy your request as needed. This leaves the client with a number of options for end point selection:
• You can maintain a list of contact nodes (all or a subset of the nodes in the cluster), and configure your clients to choose among them.
• Use round-robin DNS and create a record that points to a set of contact nodes (recommended).
• Use the get_string_property('token map') RPC to obtain an up-to-date list of the nodes in the cluster and cycle through them.
• Deploy a load-balancer, proxy, etc.
• When using a higher-level client you should investigate which, if any, options are implemented by your higher-level client to help you distribute your requests across nodes in a cluster.
•Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees and in-place updates on disk. Instead, it uses a sstable/memtable model like Bigtable’s: writes to each ColumnFamily are grouped together in an in-memory structure before being flushed (sorted and written to disk). This means that writes cost no random I/O, compared to a b-tree system which not only has to seek to the data location to overwrite, but also may have to seek to read different levels of the index if it outgrows disk cache!
•The downside is that on a read, Cassandra has to (potentially) merge row fragments from multiple sstables on disk. We think this is a tradeoff worth making, first because scaling writes has always been harder than scaling reads, and second because as your data corpus grows Cassandra’s read disadvantage narrows vs b-tree systems that have to do multiple seeks against a large index. See MemtableSSTable for more details.
•This happens when you have the same token assigned to each node. Don’t do that.
•Most often this bites people who deploy by installing Cassandra on a VM (especially when using the Debian package, which auto-starts Cassandra after installation, thus generating and saving a token), then cloning that VM to other nodes.
•The easiest fix is to wipe the data and commitlog directories, thus making sure that each node will generate a random token on the next restart.
•Because get_range_slice says, ‘apply this predicate to the range of rows given,’ meaning, if the predicate result is empty, we have to include an empty result for that row key. It is perfectly valid to perform such a query returning empty column lists for some or all keys, even if no deletions have been performed.
•So to special case leaving out result entries for deletions, we would have to check the entire rest of the row to make sure there is no undeleted data anywhere else either (in which case leaving the key out would be an error).
•This is what we used to do with the old get_key_range method, but the performance hit turned out to be unacceptable.
• Yes, but it will require running repair to change the replica count of existing data.
• Alter the ReplicationFactor for the desired keyspace(s) using cassandra-cli.
• If you're reducing the ReplicationFactor: run 'nodetool cleanup' on the cluster to remove surplus replicated data. Cleanup runs on a per-node basis.
• If you're increasing the ReplicationFactor: run 'nodetool repair' to run an anti-entropy repair on the cluster. Repair runs on a per-replica-set basis. This is an intensive process that may result in adverse cluster performance. It's highly recommended to do rolling repairs, as an attempt to repair the entire cluster at once will most likely swamp it.
•Currently Cassandra isn't optimized specifically for large file or BLOB storage. However, files of around 64 MB and smaller can easily be stored in the database without splitting them into smaller chunks. This is primarily because Cassandra's public API is based on Thrift, which offers no streaming abilities; any value written or fetched has to fit in memory. Other, non-Thrift interfaces may solve this problem in the future, but there are currently no plans to change Thrift's behavior. When planning applications that require storing BLOBs, you should also consider these attributes of Cassandra:
•The main limitation on a column and super column size is that all the data for a single key and column must fit (on disk) on a single machine (node) in the cluster. Because keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound. This is an inherent limitation of the distribution model.
•When large columns are created and retrieved, that column's data is loaded into RAM, which can get resource-intensive quickly. Consider loading 200 rows whose columns each store a 10 MB image file into RAM: that small result set would consume about 2 GB of RAM. Clearly, as more and more large columns are loaded, RAM starts to get consumed quickly. This can be worked around, but it will take some upfront planning and testing to arrive at a workable solution for most applications. You can find more information regarding this behavior here: memtables, and a possible solution in 0.7 here: CASSANDRA-16.
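The back-of-the-envelope memory estimate above is easy to reproduce:

```python
# Rough memory estimate for reading many large columns at once:
# 200 rows, each holding one 10 MB image column.
rows = 200
column_bytes = 10 * 1024 * 1024           # 10 MB per column
total_gb = rows * column_bytes / 1024**3
print(round(total_gb, 2))                 # 1.95, i.e. "about 2 GB"
```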
•Please refer to the notes in the Cassandra limitations section for more information: Cassandra Limitations.
•Nodetool relies on JMX, which in turn relies on RMI, which in turn sets up its own listeners and connectors as needed on each end of the exchange. Normally all of this happens behind the scenes transparently, but incorrect name resolution for either the host connecting, or the one being connected to, can result in crossed wires and confusing exceptions.
•If you are not using DNS, then make sure that your /etc/hosts files are accurate on both ends. If that fails try passing the -Djava.rmi.server.hostname=$IP option to the JVM at startup (where $IP is the address of the interface you can reach from the remote machine).
• chiton, a GTK data browser.
• cassandra-gui, a Swing data browser.
• Cassandra Cluster Admin, a PHP-based web UI.
• Insert operation throws InvalidRequestException with message ‘A long is exactly 8 bytes’
You are probably using the LongType column sorter in your column family. LongType assumes that the numbers stored in column names are exactly 64 bits (8 bytes) long and in big-endian format.
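As an illustration (client libraries normally handle this encoding for you), Python's struct module shows the difference between a correctly sized 8-byte big-endian long and a 4-byte value that would trigger this error:

```python
import struct

# LongType column names must be exactly 8 bytes, big-endian, signed.
good = struct.pack(">q", 42)  # 8-byte big-endian long
bad = struct.pack(">i", 42)   # 4-byte int -- would trigger
                              # "A long is exactly 8 bytes"
print(len(good), len(bad))            # 8 4
print(struct.unpack(">q", good)[0])   # 42
```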
•As a special case, mutations against a single key are atomic but not isolated. Reads which occur during such a mutation may see part of the write before they see the whole thing. More generally, batch_mutate operations are not atomic. batch_mutate allows grouping operations on many keys into a single call in order to save on the cost of network round-trips. If batch_mutate fails in the middle of its list of mutations, no rollback occurs and the mutations that have already been applied stay applied. The client should typically retry the batch_mutate operation.
There is work being done to support more multi-tenant capabilities such as scheduling and auth. For more information, see MultiTenant.
Check if SELinux is on; if it is, turn it off.
Yes. For details, see ExtensibleAuth.
SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted.
•Cassandra uses mmap to do zero-copy reads. That is, we use the operating system’s virtual memory system to map the sstable data files into the Cassandra process’ address space. This will ‘use’ virtual memory; i.e. address space, and will be reported by tools like top accordingly, but on 64 bit systems virtual address space is effectively unlimited so you should not worry about that.
•What matters from the perspective of ‘memory use’ in the sense as it is normally meant, is the amount of data allocated on brk() or mmap’d /dev/zero, which represent real memory used. The key issue is that for a mmap’d file, there is never a need to retain the data resident in physical memory. Thus, whatever you do keep resident in physical memory is essentially just there as a cache, in the same way as normal I/O will cause the kernel page cache to retain data that you read/write.
•The difference between normal I/O and mmap() is that in the mmap() case the memory is actually mapped to the process, thus affecting the virtual size as reported by top. The main argument for using mmap() instead of standard I/O is the fact that reading entails just touching memory – in the case of the memory being resident, you just read it – you don’t even take a page fault (so no overhead in entering the kernel and doing a semi-context switch).
Updating a keyspace first takes a snapshot. This involves creating hardlinks to the existing SSTables, but Java has no native way to create hard links, so it must fork ‘ln’. When forking, there must be as much memory free as the parent process, even though the child isn’t going to use it all. Because Java is a large process, this is problematic. The solution is to install Java Native Access so it can create the hard links itself.
• The set of nodes (a single node, or several) responsible for any given piece of data is determined by:
• The row key (data is partitioned on row key)
• replication factor (decides how many nodes are in the replica set for a given row)
• The replication strategy (decides which nodes are part of said replica set)
• In the case of SimpleStrategy, replicas are placed on succeeding nodes in the ring. The first node is determined by the partitioner and the row key, and the remainder are placed on the succeeding nodes. In the case of NetworkTopologyStrategy, placement is affected by data-center and rack awareness, and will depend on how nodes in different racks or data centers are placed in the ring.
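SimpleStrategy's "succeeding nodes in the ring" placement can be sketched as follows. The token values and node names are made up for illustration; this is not a real partitioner.

```python
# Sketch of SimpleStrategy replica placement: the partitioner maps the
# row key to a token; the first replica is the node owning the next
# token on the ring, and the remaining replicas are the nodes that
# follow it, walking around the ring. Hypothetical tokens and names.
ring = [(0, "node-A"), (100, "node-B"), (200, "node-C"), (300, "node-D")]

def replicas(row_token: int, rf: int):
    # Find the first node whose token is >= the row's token (wrapping
    # back to the start of the ring if none is found).
    start = next((i for i, (t, _) in enumerate(ring) if t >= row_token), 0)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

print(replicas(150, 3))  # ['node-C', 'node-D', 'node-A']
print(replicas(350, 2))  # wraps around: ['node-A', 'node-B']
```

Note that the replica set depends only on the row key, the replication factor, and the strategy, matching the point above: it does not change with load or with which node the client contacts.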
• It is important to understand that Cassandra does not alter the replica set for a given row key based on changing characteristics like current load, which nodes are up or down, or which node your client happens to talk to.
You probably have one or more Column Families with very low throughput. These will typically not be flushed by crossing the throughput or operations thresholds, causing old commit segments to be retained until the memtable_flush_after_min threshold has been crossed. The default value for this threshold is 60 minutes and may be decreased via cassandra-cli by doing:
update column family XXX with memtable_flush_after=YY;
where YY is a number of minutes.
•Seeds are used during startup to discover the cluster.
•If you configure your nodes to refer to some node as a seed, nodes in your ring tend to send Gossip messages to seeds more often (refer to ArchitectureGossip for details) than to non-seeds. In other words, seeds act as hubs of the Gossip network. With seeds, each node can detect status changes of other nodes quickly.
•Seeds are also referred to by new nodes on bootstrap to learn about the other nodes in the ring. When you add a new node to the ring, you need to specify at least one live seed to contact. Once a node joins the ring, it learns about the other nodes, so it doesn't need a seed on subsequent boots.
•Newer versions of Cassandra persist the cluster topology, making seeds less important than they were in the 0.6.X series, where they were used on every startup.
•You can make a node a seed at any time. There is nothing special about seed nodes: if you list a node in the seed list, it is a seed.
•Recommended usage of seeds:
•pick two (or more) nodes per data center as seed nodes.
•sync the seed list to all your nodes.
If you are using a replicated CF on the ring, having only one seed in the ring does not mean a single point of failure. The ring can operate or boot without the seed. However, it will need more time to spread status changes of a node over the ring. It is recommended to have multiple seeds in a production system.
a) Replication takes the same data and copies it over multiple nodes. Sharding puts different data on different nodes
b) Sharding is particularly valuable for performance because it can improve both read and write performance. Using replication, particularly with caching, can greatly improve read performance but does little for applications that have a lot of writes. Sharding provides a way to horizontally scale writes.
The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure.
A NoSQL database (sometimes called Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple APIs, are eventually consistent, and can handle huge amounts of data.
• Document Stores (MongoDB, Couchbase)
• Key-Value Stores (Redis, Voldemort)
• Column Stores (Cassandra)
• Graph Stores (Neo4j, Giraph)
• Apache Hadoop: file storage, grid compute processing via MapReduce.
• Apache Hive: SQL-like interface on top of Hadoop.
• Apache HBase: column family storage built like BigTable.
• Apache Cassandra: column family storage built like BigTable, with Dynamo topology and consistency.
Node is the place where data is stored.
Data center is a collection of related nodes.
Cluster is a component that contains one or more data centers.
In this mode, write operations are handled in the background, asynchronously. It is the fastest way to write data, and the one that offers the least confidence that operations will succeed.
Kundera is an object-relational mapping (ORM) implementation for Cassandra written using Java annotations.
Secondary indexes are indexes built over column values. In other words, say you have a user table which contains each user's email. The primary index would be the user ID, so if you wanted to access a particular user's email, you could look them up by their ID. However, solving the inverse query (given an email, fetch the user ID) requires a secondary index.
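The distinction can be sketched with plain dictionaries (a conceptual illustration only, not Cassandra's on-disk structures): the primary index maps a user ID to the row, while a secondary index is an extra map from a column value back to the row keys.

```python
# Primary index: row key (user ID) -> row.
# Secondary index: column value (email) -> row keys.
users = {
    1: {"email": "alice@example.com", "town": "Springfield"},
    2: {"email": "bob@example.com", "town": "Springfield"},
}

# Build a secondary index over the email column.
email_index = {}
for user_id, row in users.items():
    email_index.setdefault(row["email"], []).append(user_id)

print(users[2]["email"])                 # primary lookup: 'bob@example.com'
print(email_index["alice@example.com"])  # inverse query: [1]
```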
Use a secondary index when you want to query on a column that isn't the primary key and isn't part of a composite key, and the column you want to query on has few unique values. For example, a Town column is a good choice for a secondary index because lots of people will be from the same town; date of birth, however, will not be such a good choice.
Avoid using secondary indexes on columns that contain a high count of unique values, or for queries that will produce few results.
A high availability system is one that is ready to serve any request at any time. High availability is usually achieved by adding redundancy. So, if one part fails, another part of the system can serve the request. To a client, it seems as if everything worked fine.
Cassandra is robust software. Nodes joining and leaving are automatically taken care of. With proper settings, Cassandra can be made failure-resistant. That means that if some of the servers fail, the data loss will be zero. So, you can deploy Cassandra over cheap commodity hardware or a cloud environment, where hardware or infrastructure failures may occur.
Hector is an open source project written in Java using the MIT license. It was one of the early Cassandra clients and is used in production at Outbrain. It wraps Thrift and offers JMX, connection pooling, and failover.