Creating a tumblelog with Flask and Flask-CQLAlchemy

17 Jun 2015


Flask-CQLAlchemy is a Flask extension that acts as a bridge between the Cassandra Python driver’s cqlengine ORM and Flask.

Prerequisites

This tutorial assumes that you are familiar with Flask and Cassandra and have them installed and running. It is highly recommended that you have at least two Cassandra instances in your cluster (ccm is a very good option for setting this up). We will also be using virtualenvwrapper and pip.

Packages

We will install the following Flask extensions to help create this tumblelog:

  • Flask-CQLAlchemy - to provide integration with cqlengine
  • Flask-Script - for an easy-to-use development server

To install them, run this command inside the virtual environment:

(tumblelog)$ pip install flask-cqlalchemy flask-script

Creating the application

We will create a simple blog to get started. First create a project directory called tumblelog-project. In the tumblelog-project directory create a directory called tumblelog. This directory will contain the app itself. Inside the tumblelog folder create a file called __init__.py.

(tumblelog)$ mkdir tumblelog-project tumblelog-project/tumblelog
(tumblelog)$ cd tumblelog-project/tumblelog

Your folder structure should look something like this:

tumblelog-project/
    tumblelog/
        __init__.py

In the __init__.py file add the following lines:

from flask import Flask
app = Flask(__name__)


if __name__ == '__main__':
    app.run()

Now in the tumblelog-project folder create a file called manage.py and add these lines:

# application imports
# (the flask.ext.* import path was removed in Flask 1.0; import the package directly)
from flask_script import Manager, Server
from tumblelog import app


manager = Manager(app)

manager.add_command("runserver", Server(
    use_debugger=True,
    use_reloader=True,
    host='0.0.0.0')
)

if __name__ == '__main__':
    manager.run()

Now you can run the server with the following command:

$ python manage.py runserver

Since we don’t have any views defined yet, requesting http://localhost:5000 will only return a 404 Not Found page.

App Configuration

Now we need to add the CQLAlchemy configuration values and bind CQLAlchemy to the app. Create a file called config.py in the tumblelog folder. In the config file add the configuration values:

# Database options
CASSANDRA_HOSTS = ['127.0.0.1']
CASSANDRA_KEYSPACE = "tumblelog"
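
If your cluster has more than one node, you may not want to hard-code the host list. The following is a minimal sketch of a config.py that reads the values from environment variables instead. The variable names TUMBLELOG_CASSANDRA_HOSTS and TUMBLELOG_CASSANDRA_KEYSPACE and the helper hosts_from_env are made up for this example; they are not something Flask-CQLAlchemy looks for.

```python
import os


def hosts_from_env(value, default=("127.0.0.1",)):
    """Split a comma-separated host string into a list, or use the default."""
    if not value:
        return list(default)
    return [host.strip() for host in value.split(",") if host.strip()]


# Both environment variable names are hypothetical, chosen for this sketch.
CASSANDRA_HOSTS = hosts_from_env(os.environ.get("TUMBLELOG_CASSANDRA_HOSTS"))
CASSANDRA_KEYSPACE = os.environ.get("TUMBLELOG_CASSANDRA_KEYSPACE", "tumblelog")
```

With neither variable set, this falls back to the same values as the hard-coded config above.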

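Add the following lines to the __init__.py file, below the line that creates app:

# CQLAlchemy must be imported before it can be bound to the app
from flask_cqlalchemy import CQLAlchemy
# import through the package so this also resolves when started from manage.py
from tumblelog.config import CASSANDRA_HOSTS, CASSANDRA_KEYSPACE

app.config['CASSANDRA_HOSTS'] = CASSANDRA_HOSTS
app.config['CASSANDRA_KEYSPACE'] = CASSANDRA_KEYSPACE

db = CQLAlchemy(app)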

Defining models

Now let us define the models for the app. Create a file called models.py inside the tumblelog folder and add the following:

import datetime
import uuid
from flask import url_for
from tumblelog import db


class Post(db.Model):
    slug = db.columns.Text(primary_key=True, max_length=255, required=True)
    created_at = db.columns.DateTime(primary_key=True,
                                     default=datetime.datetime.now,
                                     required=True)
    title = db.columns.Text(max_length=255, required=True)
    body = db.columns.Text(required=True)

    def get_absolute_url(self):
        # url_for takes URL variables as keyword arguments
        return url_for('post', slug=self.slug)


class Comment(db.Model):
    slug = db.columns.Text(primary_key=True, required=True)
    created_at = db.columns.DateTime(primary_key=True,
                                     default=datetime.datetime.now,
                                     required=True)
    body = db.columns.Text(required=True)
    author = db.columns.Text(max_length=255, required=True)


class CommentCount(db.Model):
    slug = db.columns.Text(primary_key=True, required=True)
    comments = db.columns.Counter()

Modeling in Cassandra

Modeling in Cassandra is a little tricky if you come from an RDBMS background. Some major points to note about Cassandra are:

  1. It has no joins
  2. Writes are cheap, reads are not
  3. Filtering can be done only on primary keys or on indexed columns
  4. Indexes have a high cost on performance

This in effect forces us to set aside most RDBMS design principles and follow new ones, the main one being that duplication is good. In fact, it is generally recommended to avoid indexes and instead create separate tables whose primary keys are the columns you need to query by (remember: writes are cheap, reads are not).
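
As a rough illustration of the "duplicate instead of index" idea, the sketch below uses plain Python dictionaries to stand in for two Cassandra tables that hold the same post under different keys. The names posts_by_slug, posts_by_day and save_post are hypothetical and are not part of the tumblelog models; nothing here talks to Cassandra.

```python
import datetime

# Two in-memory "tables", each keyed the way one query needs it.
posts_by_slug = {}   # lookup: slug -> post
posts_by_day = {}    # lookup: publication date -> list of posts


def save_post(slug, title, body, created_at):
    """Write the same post to both tables (two cheap writes)."""
    post = {"slug": slug, "title": title, "body": body, "created_at": created_at}
    posts_by_slug[slug] = post
    posts_by_day.setdefault(created_at.date(), []).append(post)
    return post


save_post("hello-world", "Hello World!", "First post",
          datetime.datetime(2015, 6, 17, 9, 30))
save_post("second-post", "Second", "Another post",
          datetime.datetime(2015, 6, 17, 18, 0))

# Each read is a direct key lookup; no index or join is involved.
print(posts_by_slug["hello-world"]["title"])
print([p["slug"] for p in posts_by_day[datetime.date(2015, 6, 17)]])
```

The cost of the duplication is paid at write time, which is the cheap side in Cassandra.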

Modeling methodology

In our tumblelog, a post has:

  1. slug - The user-friendly URL fragment of the post. The slug is unique and is also used as a filter when querying.
  2. created_at - The time at which the post was published.
  3. title - The title of the post.
  4. body - The body of the post.
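
The tutorial types slugs in by hand, but they could also be derived from the title. Below is a minimal sketch of one way to do that with the standard library only. The slugify helper is hypothetical and is not provided by Flask-CQLAlchemy, and it does nothing to guarantee uniqueness.

```python
import re
import unicodedata


def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


print(slugify("Hello World!"))  # hello-world
```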

Since the slug is unique and is used as a filter, it is the natural choice for the first primary key column, which cqlengine treats as the partition key. In a list of posts we would ideally want the posts ordered by the time they were published, so created_at becomes the clustering key. Clustering keys are primary key columns that are not partition keys, so unless the model explicitly says otherwise, every primary key column after the first becomes a clustering key. Keep in mind that clustering order applies only among rows that share the same partition key.
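
To make the partition key and clustering key roles more concrete, here is a small in-memory sketch. It is only a mental model, not how Cassandra stores data: a dictionary maps each partition key (slug) to a list of rows kept sorted by the clustering key (created_at). The names table and insert are invented for the example.

```python
import bisect
import datetime
from collections import defaultdict

# partition key -> list of (clustering key, row), kept sorted by clustering key
table = defaultdict(list)


def insert(slug, created_at, row):
    """Place a row in its partition, ordered by the clustering key."""
    partition = table[slug]
    keys = [key for key, _ in partition]
    partition.insert(bisect.bisect_right(keys, created_at), (created_at, row))


# Rows arrive out of order and for different partitions.
insert("hello-world", datetime.datetime(2015, 6, 17, 12, 0), "second comment")
insert("hello-world", datetime.datetime(2015, 6, 17, 9, 0), "first comment")
insert("other-post", datetime.datetime(2015, 6, 18, 8, 0), "unrelated comment")

# Reading one partition returns its rows already ordered by created_at.
rows = [row for _, row in table["hello-world"]]
print(rows)  # ['first comment', 'second comment']
```

This is why the Comment model, where many rows share one slug, benefits most from created_at as a clustering key.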

A comment follows a similar model. However, when listing posts we would also like to know how many comments (or likes) each post has. Cassandra has a counter data type for this, but it comes with caveats: counters must live in a dedicated table, and apart from the primary key columns that table may contain only counter columns.

This means adding a new model (and, in effect, a new table) for comment counts. The table has only two columns: the slug of the parent post and the number of comments on that post.
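
One more thing worth knowing about counters is that they are changed by increments and decrements rather than set to a value. The sketch below only builds the CQL text for such an update so you can see its shape. The helper counter_update_cql is hypothetical, and the table name comment_count is an assumption about how the model is mapped; check the actual table name in your keyspace. The %s marker is the placeholder style the Cassandra Python driver uses for non-prepared statements.

```python
def counter_update_cql(table, counter_column, key_column, delta=1):
    """Return a parameterised CQL statement that adjusts a counter by delta."""
    sign = "+" if delta >= 0 else "-"
    return (
        f"UPDATE {table} SET {counter_column} = {counter_column} {sign} {abs(delta)} "
        f"WHERE {key_column} = %s"
    )


# The table name is assumed for illustration; the slug would be bound to %s.
statement = counter_update_cql("comment_count", "comments", "slug")
print(statement)
# UPDATE comment_count SET comments = comments + 1 WHERE slug = %s
```

When you go through the model instead, cqlengine takes care of sending the change for you.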

Adding Data

Before we add views, we can check if our model holds up and works according to Cassandra and cqlengine rules. Run this command:

$ python manage.py shell

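You can create the tables and add the first post like this (if your Flask-CQLAlchemy release has no create_all() method, the project README documents db.sync_db() for the same step):

>>> from tumblelog import db
>>> from tumblelog.models import Post, Comment, CommentCount
>>> db.create_all()
>>> post = Post(
...     slug="hello-world",
...     title="Hello World!",
...     body="This is my first post using Flask and Flask-CQLAlchemy"
... )
>>> post.save()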

You can add a comment to this post like this:

>>> comment = Comment(
...     slug="hello-world",
...     body="This is my first comment",
...     author="John Doe"
... )
>>> comment.save()
>>> count = CommentCount(
...     slug="hello-world",
... )
>>> count.comments += 1
>>> count.save()