Cassandra: What it is and what not

Recently I had a chance to work on the Cassandra. To explain the need in short, it was required to have a distributed key-value store. While Redis is great but it doesn’t let you have multiple geographically distributed writable servers but Cassandra does. Writing here few points about Cassandra and so that one can keep them in the back of the head while setting it up.

  • Contrary to what I said, Cassandra is not exactly key-value storage. It is more of a JSON format storage which can behave like key-value pair.
  • CQL – Cassandra Query Language distracts the user from the true nature of Cassandra. It makes one believe that Cassandra is like RDBMS while its not. One must think of it as key-value, where values can be extended further.
  • Data structures provided by Redis can be implemented easily.
  • Throughput are depended on configuration of machine. More the number of CPU, better the throughput can be. Go through other configurations as well, best practices are explained in configuration file.
  • Data is first written in memory(memtables) and on disc(sstables) during compaction, which depends on the settings. By default it is set to 10 days or Java heap size in memory whichever is reached first.
  • Lesser the tombstones, faster will be the compaction. This way, probability of reading sstables, which is disk read, go low.
  • Updates are waste of resource use insert instead, it will overwrite the existing row/document on compaction.
  • To keep the read fast use quorum as 1, that is whatever you are getting on selection is the truth.
  • Partitioning key should be designed as such to keep the data related to it on a single node. IMO partitioning key is analogous to tables in RDBMS. This way for a particular key and quorum equals to 1 will always be truth. See composite partitioning keys.
  • Latency is a thing to worry about. Try to keep reads as low as possible.
  • Schema structure is very important. It is important to learn about the queries one needs to make on the table before writing schema. Unlike RDBMS, condition has to be on successive columns instead of random columns.
  • If using PHP, use Java-client with php-java bridge instead of native PDO driver. It provides almost 3x read/write throughput per node.
  • IRC is good place to get help in case of issues.
  • If nodes are EC2 instances, snitch configured for EC2 is available for used.
  • Version 2.0.9 is not compatible with 2.1.x in a cluster.
  • In general, version below x.x.5 are not production ready and have serious bugs. Current 2.1.1 is not suitable for production environment.
Advertisements

PDO: PHP Data Object

There are different ways of connecting to database in PHP. The most general way in use is MySQL extension, which we have discussed here and even everywhere. Well there are two other ways of connecting to database. First one is MySQLi and other is PDO.

Now, question is what is the difference? So, the difference is MySQLi is MySQL improved. As name states, it is improved version of the MySQL extension. It uses both direct function which we use in MySQL extension and the object oriented way of implementing the same. So if I would be using MySQLi, I would choose OOP way. But MySQLi extension is only for MySQL. It is more secured if we use it in OOP way like it eliminates the threat of sql injection.

PDO is more generic because through it you can connect to different type of database by changing just the name of the database you want to connect. You can connect to MySQL, MS SQL, PostgreSQL, SQLite and many more. So it is more generalized, as you will not be needed to change the function call and other things. All you have to do is change the connector name, this is the term which is used for software that connects application to database. It is also based on the OOPs the concepts, which makes it easy to use.

PDO and MySQLi have very much the same way of implementation. Moreover MySQL extension is being maintained and can be deprecated in near future. While the other two are in active development.

Next time we’ll be learning implementation using PDO. Its very easy, meanwhile if you want to look it by yourself, here are the links. PDO and MySQLi.