Indexing with Elasticsearch and Django

So, every decent webapp needs a search feature? Okay, here we go.

All starts with downloading elasticsearch
After extracting start it with

bin/elasticsearch -f

The -f paramter gives you a little output, especially the port and host. By standard this would be localhost:9200.

So let’s get to the Django bit.
First thing to check is whether the model object you want to index for search has one or more foreign key fields.
If so, you might not want to index the ids (it is very unlikely that some user would search for an id).
So what to do? Since data is passed to elasticsearch as a JSON object we will use djangos built in serializer to convert our model object into a JSON object and then pass that on. The serializer provides an option to use something called natural keys, which is called by adding

use_natural_keys = True

to the serializers.serialize(‘json’, modelObject) as a third element. The successfully use this, the model which the foreign key field references has to be extended by a method natural_key.

As an example let’s say, we got to model classes one is product which has a foreign key field manufacturer which references a model of said name:

Manufacturer
    name
    address
    website...

Product
    prod_id
    name
    manufacturer <- there it is, a foreign key to the above
    price...

So if we want to index products for search we may want the manufacturer field to be a name (or a name and address combination etc.). Therefore we define a method “natural_key” in the Manufacturer class i.e.:

def natural_key(self):
  return (self.name)

Thus when serializing a Product the “unsearchable” ID is converted to the manufacturer’s name.

The general idea now is to pass the object as an serialized string to a function that then does the indexing on its own. Doing something ike this:

...
new_product = Product(...)
new_product.save()
myIndexModule.add_to_index(serializers.serialize('json', [new_product], use_natural_keys=True))

So, now to the indexing itself. I use pyelasticsearch for no special reason except that its documentation seemed decent.
The indexer is located in a module since I wanted it to be separated from the rest of the application and it is pretty short.

from pyelasticsearch import ElasticSearch
import json

ES = ElasticSearch('http://localhost:9200')

def add_to_index(string):
    deserialized = json.loads(string)
    for element in deserialized:
        element_id=element["pk"]
        name = element["model"].split('.')[1] <- (this is to get rid of the module prefix but this is just cosmetics)
        index = name + "-index"
        element_type = name
        data = element["fields"]
        ES.index(index, element_type, data, id=element_id)

That’s it. One could certainly do more sophisticated stuff (like plural for the index and singular for the element type and than do something clever about irregular plurals…) but it does the job.

Now let’s use ElasticSearc as a datastore for an application.

But why should we do this. Let’s assume we have an application with a member and a non-member area. Members can do stuff on a database and non-members can not. Since you want to keep the database load from user that do not add anything to your service to a minimum to provide a snappy experience for your members you don’t want them to clog the connection with database requests and decide to let ElasticSearch handle that.
And anyway, it’s just for fun 🙂

So the idea is to make an ajax call to elasticsearch and show a list of the last ten products added to the index to the user. In one of your views for non-members you put a javascript function like this:

$.getJSON('http://localhost:9200/product-index/_search?sort=added&order=asc&from=0&size=10', function(response){....})

and in the function you can now start to play around with the fields like

$.each(response.hits.hits, function(i, item){
     item._source.name
     ...
}

and present them to the users.

Advertisements

Custom authentication in Django

After fiddling with Djangos auth-app for a while I decided t rather have my own (I know, why should one do this? Answer: To learn).
It consists of several steps:

  1. registration
  2. activation
  3. adding a password
  4. login

First I created an app for user-management

 $python manage.py startapp user_management    

This gave me the structure to work with.
First I created the usermodel:

 from django.db import models    
 import bcrypt    

 class User(models.Model):

    email = models.CharField(max_length=100, unique=True)
    firstname = models.CharField(max_length=30)
    lastname = models.CharField(max_length=30)
    password = models.CharField(max_length=128)
    last_login = models.DateTimeField(auto_now=True)
    registered_at = models.DateTimeField(auto_now_add=True)
    core_member = models.BooleanField()
    activation_key = models.CharField(max_length=50, null=True)    

The idea here was to have email as username and to have that unique. I don’t consider usernameshis is a good choice for logins but rather a feature for profiles, but that depends on one’s taste I think.

The registration view is pretty straight forward . I create a RegistrationForm object with fields for email, first and last name.
The activation_key is simply a string of randomly chosen ASCII characters and digits.
Activation itself is just creating a link, sending it and comparing the random part of the link and the stored string. If they match is_active is set to True and the user can set his/her password. For passwords I normally store bcrypt hashes in the database (NEVER! store plaintext passwords in a database!). This is quite simple and can be done by following this description.

The function for setting the password goes into the model. For this to work I use a classmethod. As the name suggests, this is a method bound to the class, not an instance of said class which allows to get objects as in “cls.objects.get()” which is the classmethod’s equivalent to self.something in instance methods.

@classmethod
def set_password(cls, user_id, plain_pass):    
    secret = bcrypt.hashpw(plain_pass, bcrypt.gensalt())
    user = cls.objects.get(pk=user_id)
    user.password = secret
    user.save()
    return True

The login process itself is done via another classmethod which I named authenticate:

@classmethod
def authenticate(cls, email, password, request):
    user = cls.objects.get(email__exact=email)
    if bcrypt.hashpw(password, user.password) == user.password:
        request.session['user_id'] = user.id
        user.save() # this is to get last_login updated
        return user
    else:
        return None

(In order for this to work you have to enable the session middleware and the session app in settings.py.)

So, a quick rundown.

Since I use email as an unique identifier for the login the function expects an email address which is used to find the person to authenticate, the plaintext password (e.g. as given from a inputfield) and the request object to make use of a session. (I use database session handling for development but there are alternatives described in the django docs.)

The bcrypt function returns True if given plaintext password hashed and the stored hash match False if not.

After haveing checkd that the user has given the right credentials I’m going to store the user_id in the session which allows me to get the full set of user information should I need it.

I save the user to trigger the auto_now function of the user model in which updates the last_login field to the actual time.

Now with

User.authenticate(email, password, request) 

the user is logged in.

Datamapper – Padrino – warden

I took a break from coding, but was still looking for a useful set of tools for developing web applications. And I think I found a solution that fits my needs (small core but extensible, modular, reasonable features, usable documentation or active user groups at least).

The goal was to create a backend that would output json objects that could be processed in an independent frontend.

First step was to generate a project following the guide

padrino g project -d datamapper -a mysql -e none

I set renderer (-e option) to none because I am using rabl for templating the json output.
For authentication I chose warden. So I added these to the Gemfile

gem 'warden'
gem 'rabl'

Then turned to the app/app.rb and added

use Warden::Manager do |manager|
manager.default_strategies :password
manager.failure_app = myApp
end

Warden::Manager.serialize_into_session do |user|
user.id
end

Warden::Manager.serialize_from_session do |id|
User.get(id)
end

For creating the model I used the padrino generator again, since the user model is pretty straight forward (extend as needed)

padrino g model User username:string password:string email:string

After setting up config/database.rb you can create the database by using

padrino rake dm:create

To have some entries in the database to work with I costumize the db/seeds.rb which is mentioned in the padrino blog tutorial

Having done this warden should be in the system but is not working yet, since we have to define at least one strategy:

For now I like to use a common username/password login, which is already defined as default in manager.default_strategies. (You could add others if you wanted to, look at the warden-wiki for details)

Warden::Strategies.add(:password) do
def valid?
... code goes here ...
end
def authenticate!
... code goes here ...
? success!(user) : fail!("Invalid")
end
end

So in valid? you would define the requirements that have to be met to go on with the authentication process. In this case checking params[“username”] && params[“password”] would make sense.
After creating a usable authentication! function request to a controller can be authenticated via adding env[‘warden’].authenticate! before the login controller code.
If authentication was successful you can add env[‘warden’].authenticated? to following controllers and get the user (or what you decided to return for success) by calling env[‘warden’].user.

I tested this with curl, since the frontend is intended to be independent. I put the login process in a post route, so
after starting padrino

curl -d "username=...&password=..." localhost:3000/login

gave me the defined output of a successful login.

One pitfall when testing a subsequent controller with curl is that in contrast to a browser you have to add the cookie information. In order to get it you could call

curl -vvv -d "username=...&password=..." localhost:3000/login

and can extract the rack.session=… …; and call the controller with

curl --cookie "rack.session=... ...;" localhost:3000/subsequent_controller

Sinatra, Mustache and Heroku – change of plans

Okay, I really like Mustache, it’s clean, it has a reasonably small set of options (conditionals, lists etc.) and has many implementations. One of them being javascript (Mustache.js).
So, i thought what about using Sinatra to generate JSON responses and serving templates, while the client is responsible for the representation. The benefit being a clean RESTful application and a deferred rendering process allowing to play with different options on caching JSON objects and exploring AJAX features.
And above all a fun project.
So, the basic idea:

  1. the root path “/” reads an index html file and hands it out
  2. navigation elements trigger to asynchronous AJAX requests (one for the template, one for the view)
  3. when both are finished mustache.js kicks in and renders the page (or parts of it)

app.rb

get "/" do
  response = File.open("index.html").read
  "#{response}"
end

index.html

...
var template = null;
var view = null;
var templateFinished = false;
var viewFinished = false;

function getTemplate(template){
  [...] xhRequest [...]
  if(request.readyState == 4){
            template = req.responseText;
            templateFinished = true;
            process();
        }
  }

function getView(view){
  [...] xhRequest [...]
  if(request.readyState == 4){
            view = req.responseText;
            viewFinished = true;
            process();
        }
  }

function process(){
  if(templateFinished == true && viewFinished == true){
	document.getElementById("content").innerHTML = Mustache.to_html(template,view);
    }
    else{
        return;
      }
}

So, in my case the rendered stuff replaces the content in the div with the id “content”. To elaborate on that one could pass the id of the element to be replaced as an option to the function and really play with this.