Cassandra: What it is and what it is not

Recently I had a chance to work with Cassandra. In short, we needed a distributed key-value store. Redis is great, but it doesn't let you have multiple geographically distributed writable servers, and Cassandra does. Here are a few points about Cassandra to keep in the back of your head while setting it up.

  • Contrary to what I said, Cassandra is not exactly a key-value store. It is closer to a nested, sorted map (usually called a wide-column store) that can behave like a key-value store.
  • CQL (Cassandra Query Language) distracts the user from the true nature of Cassandra. It makes one believe that Cassandra is like an RDBMS, while it is not. Think of it as key-value, where values can be extended further.
  • Several of the data structures Redis provides (lists, sets, maps, counters) can easily be modeled with Cassandra's collection and counter column types.
  • Throughput depends on the machine's configuration: the more CPU cores, the higher the throughput can be. Go through the other settings as well; best practices are explained in the comments of the configuration file, cassandra.yaml.
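  • Writes go first to the commit log and to an in-memory structure (the memtable). A memtable is flushed to disk as an immutable SSTable when it fills up (by default, the memory available to memtables is a fraction of the Java heap), and compaction later merges SSTables. A separate 10-day default, the table property gc_grace_seconds, controls how long deleted data (tombstones) is kept before compaction may purge it.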
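  • The fewer tombstones there are, the faster compaction runs; and the more promptly SSTables are compacted, the fewer of them (each one a disk read) a query has to check.
  • Avoid reading a row just to update it. INSERT and UPDATE are both upserts: each writes new cell values with a newer timestamp, and the newest value wins when the row is read or compacted, so an INSERT simply overwrites the existing row.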
  • To keep reads fast, read at consistency level ONE, which waits for a single replica. Be aware that this can return stale data if the latest write has not reached that replica yet.
  • Design the partition key so that data that belongs together is stored in the same partition, and therefore on the same replica nodes. IMO a partition key is analogous to a table in an RDBMS. A query for a single partition key at consistency level ONE then only needs to contact one node. See composite partition keys.
  • Latency is a real concern. Keep the number of reads per request as low as possible.
  • Schema design is very important. Work out the queries you need to run against a table before writing its schema. Unlike in an RDBMS, WHERE conditions cannot be on arbitrary columns: you must restrict the whole partition key and then clustering columns in the order they are declared (unless you use secondary indexes or ALLOW FILTERING).
  • If you are using PHP, use the Java client through a PHP/Java bridge instead of the native PDO driver. It provides almost 3x the read/write throughput per node.
  • IRC is a good place to get help if you run into issues.
  • If your nodes are EC2 instances, use the snitch built for EC2 (Ec2Snitch, or Ec2MultiRegionSnitch for clusters that span regions).
  • Version 2.0.9 is not compatible with 2.1.x within the same cluster.
  • In general, releases below x.x.5 are not production ready and have serious bugs. The current 2.1.1 is not suitable for a production environment.
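The consistency-level points above come down to a little arithmetic. Assuming a single data center with replication factor RF, QUORUM means floor(RF / 2) + 1 replicas, and a read is only guaranteed to see the latest acknowledged write when R + W > RF, where R is the number of replicas read and W the number that acknowledged the write. A minimal sketch; the helper names are hypothetical and not part of any Cassandra driver:

```javascript
// Number of replicas that make up a QUORUM for a given replication factor
// (single data center assumed)
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1;
}

// Hypothetical helper: R + W > RF guarantees that at least one replica
// that is read also acknowledged the write, so the read sees it.
function readsSeeLatestWrite(replicationFactor, readReplicas, writeReplicas) {
  return readReplicas + writeReplicas > replicationFactor;
}

const rf = 3;
console.log(quorum(rf));                                      // 2
console.log(readsSeeLatestWrite(rf, 1, 1));                   // false: ONE/ONE may be stale
console.log(readsSeeLatestWrite(rf, 1, rf));                  // true: write ALL, read ONE
console.log(readsSeeLatestWrite(rf, quorum(rf), quorum(rf))); // true: QUORUM/QUORUM
```

This is why fast ONE reads are usually paired with stronger writes when stale reads are not acceptable.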
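The point about inserts overwriting rows rests on last-write-wins: every cell carries a write timestamp, and when several versions of a cell exist (for example in different SSTables), the one with the newest timestamp wins. The toy model below illustrates only that merge rule; it ignores tombstones, TTLs and tie-breaking, and it is not Cassandra's actual code:

```javascript
// Toy model: a row is a map of column name -> { value, timestamp }.
// Merging two versions keeps, per column, the cell with the newer timestamp
// (last-write-wins). On equal timestamps this toy keeps the cell from rowA.
function mergeRows(rowA, rowB) {
  const merged = { ...rowA };
  for (const [column, cell] of Object.entries(rowB)) {
    const existing = merged[column];
    if (!existing || cell.timestamp > existing.timestamp) {
      merged[column] = cell;
    }
  }
  return merged;
}

// Two SSTables holding different versions of the same row
const sstable1 = {
  name: { value: 'Alice', timestamp: 100 },
  city: { value: 'Pune', timestamp: 100 },
};
const sstable2 = { city: { value: 'Delhi', timestamp: 200 } };

const row = mergeRows(sstable1, sstable2);
console.log(row.name.value); // Alice (written only once)
console.log(row.city.value); // Delhi (newer timestamp wins)
```

Because the timestamp decides, the order in which SSTables are merged does not change the result.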
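The rule about conditions on successive columns can be written down as a check. For a table declared with PRIMARY KEY ((partition columns), clustering columns...), a plain query (no secondary index, no ALLOW FILTERING) must restrict every partition-key column and may then restrict clustering columns only as a prefix, in declared order. The validator below is hypothetical and encodes only this simplified rule; Cassandra's real rules have more cases, such as range conditions being allowed only on the last restricted clustering column:

```javascript
// Hypothetical checker for the simplified rule:
//   1. every partition-key column must be restricted, and
//   2. restricted clustering columns must form a prefix of the declared order.
function isQueryAllowed(partitionKey, clusteringColumns, restrictedColumns) {
  const restricted = new Set(restrictedColumns);

  // Rule 1: the full partition key is needed to locate the partition
  if (!partitionKey.every((col) => restricted.has(col))) return false;

  // Rule 2: once a clustering column is skipped, none after it may be used
  let gapSeen = false;
  for (const col of clusteringColumns) {
    if (!restricted.has(col)) {
      gapSeen = true;
    } else if (gapSeen) {
      return false;
    }
  }

  // Columns outside the primary key cannot be restricted without an index
  const keyColumns = new Set([...partitionKey, ...clusteringColumns]);
  return restrictedColumns.every((col) => keyColumns.has(col));
}

// PRIMARY KEY ((user_id), year, month, day)
const pk = ['user_id'];
const ck = ['year', 'month', 'day'];
console.log(isQueryAllowed(pk, ck, ['user_id', 'year', 'month'])); // true
console.log(isQueryAllowed(pk, ck, ['user_id', 'month']));         // false: skips year
console.log(isQueryAllowed(pk, ck, ['year']));                     // false: no partition key
```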

JSON – JavaScript Object Notation

There was a time when we created and used XML for transporting and storing data, and it was good at it. But getting data out of it was a problem; it makes me feel nauseous. I say ‘was’ because I consider it the way our daddies used to transfer data. Now is the day of JSON.

JSON is cool and easy, and it takes less space than XML (though we are rarely bothered by that). You can use it with almost any language, and I love it. Frankly, thanks to JSON I gave up on XML before I ever learned it properly.

JSON looks like the arrays (and maps) we build in C/C++ or any other language. Since we are already familiar with arrays, JSON is easy to understand. It looks like this (keys and strings must use double quotes):

{"data": ["J", "S", "O", "N"], "this": {"is": "json"}}

Let’s use it with PHP and Javascript.

PHP:

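<?php
// Build an associative array (no "new" keyword for arrays in PHP)
$array = array('data' => array('J', 'S', 'O', 'N'), 'this' => 'json');
$json = json_encode($array);   // {"data":["J","S","O","N"],"this":"json"}
// json_decode() returns a stdClass object by default
$obj = json_decode($json);
echo $obj->data[0];
//output: J
echo $obj->this;
//output: json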

Javascript:

// A JavaScript object literal (not JSON text), so single quotes are fine here
var json = {'data' : ['J','S','O','N'], 'this' : { 'is' : 'json' } };
alert(json['data'][0]);
//output: J
alert(json.data[0]);
//output: J
alert(json.this.is);
//output: json
alert(json['this']['is']);
//output: json
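The PHP snippet round-trips data with json_encode() and json_decode(); JavaScript's built-in equivalents are JSON.stringify() and JSON.parse(). A minimal sketch using the same structure as the JSON text above, runnable in a browser console or Node.js:

```javascript
// The same structure as the JSON text shown earlier
const original = { data: ['J', 'S', 'O', 'N'], this: { is: 'json' } };

// Object -> JSON text (like PHP's json_encode)
const text = JSON.stringify(original);
console.log(text); // {"data":["J","S","O","N"],"this":{"is":"json"}}

// JSON text -> object (like PHP's json_decode)
const parsed = JSON.parse(text);
console.log(parsed.data[0]); // J
console.log(parsed.this.is); // json

// JSON.parse is strict: single-quoted keys and strings are rejected
try {
  JSON.parse("{'data': 1}");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```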

We have covered PHP and JavaScript here because they are mostly used for web development. Go to json.org for a better understanding and for using JSON with other languages.

jQuery post method for JSON in PHP files

jQuery is the most awesome thing I have come across so far. It makes things so easy; the example below shows why. Many sites, even Facebook and Google, send data in the background with AJAX POST requests like the ones jQuery's “post” method makes. You have probably noticed that when you log in to Gmail with a wrong username or password, the page doesn't redirect or reload. Instead, the values are posted with AJAX, the same thing jQuery's post method does, and the error from the response is displayed.

Required: jquery.js (I use jquery-1.6.js; download the latest version!)

Using:

file1.php

<script src="jquery.js"></script>
<script>
$(document).ready(function(){
  $.post('file2.php', {data : 'data'}, function(data, status){
    if(status == 'success'){
      alert(data.name); // your code
    }
  }, "json");
});
</script>

In file2.php:

<?php
// Reply with JSON only
header('Content-Type: application/json');
if(isset($_POST['data'])){
  echo json_encode(array('name' => $_POST['data']));
}

Explanation:

file1.php

  1. waits until the document is ready (the DOM has loaded) before running the code.
  2. calls jQuery's “post” method. The first argument is the URL of the file that handles the POST variables (file2.php in this example). The second is an object, a sort of associative array, in which the first data is the name of the POST variable and the second ‘data’ is its value. The third is the callback function, which runs when the request succeeds. It receives two arguments: “data”, the response from the file handling the post (explained below), and “status”, which holds the status of the response. “status” is optional; the code works without it.
  3. checks that status is “success”. Since the callback only runs for successful requests, this check is just a safeguard.
  4. if it is, alert shows the JSON value data.name, which is “data” because that is what file2.php sends back.
  5. “json”, the last argument of the post method, tells jQuery to expect JSON as the response and to parse it into an object.

Note:

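  • The POST variable name can also be written in quotes: ‘data’.
  • The status argument can be omitted.
  • Pass “json” as the last argument if you want the response parsed as JSON; without it, jQuery guesses the type from the response's Content-Type header.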
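For comparison, the same request can be made without jQuery using the fetch API that modern browsers (and Node.js 18+) provide. This is a sketch under that assumption; postForJson is a hypothetical helper name. Unlike the success callback of $.post, fetch also resolves for HTTP error responses, so the status is checked explicitly:

```javascript
// Hypothetical helper: POST form fields to `url` and parse the JSON reply,
// roughly what $.post(url, fields, callback, "json") does.
// Assumes a runtime with a global fetch (modern browsers, Node.js 18+).
async function postForJson(url, fields) {
  const response = await fetch(url, {
    method: 'POST',
    // A URLSearchParams body is sent as application/x-www-form-urlencoded,
    // the same encoding jQuery uses, so PHP sees the fields in $_POST
    body: new URLSearchParams(fields),
  });
  // fetch resolves even for 4xx/5xx responses, so check the status here
  if (!response.ok) {
    throw new Error('Request failed with HTTP status ' + response.status);
  }
  return response.json(); // like dataType "json": parse the body as JSON
}
```

From the page served by file1.php it could be called as postForJson('file2.php', {data: 'data'}).then(function (data) { alert(data.name); });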

 

file2.php

  1. checks whether $_POST['data'] is set. The data index is the same variable name we sent from file1.php.
  2. creates an associative array with the key “name” and the value $_POST['data'], encodes it as JSON with json_encode and echoes it. file1.php receives this as data in the callback function and reads data.name, name being the key of the associative array.

Note:

  • This file should not contain any HTML.
  • This file should not echo anything other than JSON.
  • You can use include(), but the two points above must still hold.

I hope this helps if you run into trouble. Please also go through the jQuery website; it has pretty good tutorials.