Git Hook, Line and Sinker

Self-hosting your git repositories is not a bad idea. In fact it is a great idea, and it’s pretty simple too.

First you make a new directory on an accessible machine; by convention its name ends in .git, something like /allmycode/repo1.git

Move into the directory and execute

 git init --bare --shared

Great, we got ourselves a shareable git repository. If you don’t want to be the only one working on that repository and have no intention of making it public either, you should create a user dedicated to git operations on the machine you serve your repositories from.
Let’s assume your specialized user is called “git”.

You can now add the SSH public keys of all parties that should have access to the repos via ssh-copy-id, which appends them to /home/git/.ssh/authorized_keys, and have nice passwordless access control.

Now we can start to work on the remote repository.
In your local working directory we run

git init

and provide the user information that is recorded with every commit

git config --global user.name "Your Name"

git config --global user.email your@email.sth

This was all local, so let’s add the information about the remote

git remote add origin git@server:/allmycode/repo1.git

This enables us to push to the remote with the shorter

git push origin master

It is completely viable to add differently labeled remote repositories e.g.

 git remote add github sth@github

and push a specialised branch (without passwords for example) there via

 git push github public

Nice, self-hosted remote repositories! You can start collaborating. And when you do, you might want to automate transferring the newest version to a testing server. You could do this with a cronjob and some copying, or you could use git’s very own hooks, to be specific a post-receive hook.
Connect to the remote repository and enter the directory hooks/. Here you find some nice samples, but we want something different. We want a post-receive hook, which means that every time somebody pushes changes to
the remote repository this action is called. So we create that hook:

touch post-receive

then we paste in

#!/bin/sh
GIT_WORK_TREE=/path/to/serverroot/ git checkout -f

and save. Make it executable and you have made a git hook. Congrats!
Since we have a user named git who owns all the repos on our remote machine, we must add him to the group that owns the webserver paths (www-data or similar) to make the checkout work.

Now every push to the remote repository should trigger a checkout which hopefully makes the newest version available on the webserver.
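A hook does not have to be a shell script; any executable file in hooks/ works. As a variation, here is a sketch of the same idea in Python, which only deploys when the master branch was pushed. The work-tree path is a placeholder, so adjust it to your server root:

```python
#!/usr/bin/env python
# Sketch of a post-receive hook in Python. git feeds the hook one
# "oldrev newrev refname" line per updated ref on stdin.
# WORK_TREE is a placeholder path -- point it at your server root.
import subprocess
import sys

WORK_TREE = "/path/to/serverroot/"

def pushed_branches(lines):
    """Return the branch names mentioned in post-receive stdin lines."""
    branches = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith("refs/heads/"):
            branches.append(parts[2][len("refs/heads/"):])
    return branches

if __name__ == "__main__" and not sys.stdin.isatty():
    if "master" in pushed_branches(sys.stdin.readlines()):
        subprocess.call(["git", "--work-tree", WORK_TREE,
                         "checkout", "-f", "master"])
```

The branch check means a push to an experimental branch no longer overwrites the server root.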

But let’s tweak things a little. Say we want to be notified whenever a commit has been pushed. Email and telephone are viable but time-consuming, and you don’t want to, and frankly should not have to, bother. I think Jabber is a great way of getting the information across without spamming the whole team. So I made a little script that sends a message to everybody who cares to give me their Jabber ID. You can get it via

git clone https://github.com/kodekitchen/punobo.git

If you add to the post-receive hook

 python /<path-to-repo>/pushbot.py "Something has been pushed."

not only will your testing/demo/development server automatically have been updated, but all listed members of the working group will be informed about it on Jabber.


Business Card with Latex

So, I needed some business cards for a meeting, but I rarely ever need more than 8 or so at a time (yes, I’m aware that they look less classy, but having some printed would have taken too long), so I decided to make some with LaTeX.
So, here is what I have done:

First I declared a XeTeX preamble to be able to use the fonts from my Linux system and not have to bother with encoding:

 \documentclass[a4paper,11pt]{article}
 \usepackage[cm-default]{fontspec}
 \usepackage{xunicode}
 \usepackage{xltxtra}
 \usepackage{graphicx}
 \setmainfont[Mapping=tex-text]{Ubuntu}
 \setsansfont[Mapping=tex-text]{Ubuntu}
 \setmonofont[Mapping=tex-text]{Cantarell}

Next I got rid of all elements that come with the article document class by default, and redefined the width and height of the paper to match an A4 sheet, along with some other dimensions.

 \pagestyle{empty}
 \setlength{\unitlength}{1mm}
 \setlength{\paperheight}{297mm}
 \setlength{\paperwidth}{210mm}
 \setlength{\oddsidemargin}{-7mm}
 \setlength{\topmargin}{32mm}
 \setlength{\textheight}{280mm}

After that I declared all text elements that should be on the card.

 \newcommand{\bcname}{Caspar David Dzikus}
 \newcommand{\bctitleA}{KodeKitchen Writer}
 \newcommand{\bctitleB}{}
 \newcommand{\bccontactA}{555-555-5555}
 \newcommand{\bccontactB}{caspar@kodekitchen.com}
 \newcommand{\bccontactC}{http://kodekitchen.com}
 \newcommand{\bcsub}{coding and stuff}

The document itself is pretty straightforward: the card is a picture which is repeated ten times (five rows, two columns) inside another picture. To help cut the cards, marks are placed in the corners of each card (which is 80 × 50 mm).

 \begin{document}
 \begin{picture}(170,209)(0,0)
 \multiput(0,0)(0,50){5}{
    \multiput(0,0)(80,0){2}{
       \begin{picture}(80,50)(0,0)
       % marks for cutting
       \put(-1,0){\line(1,0){2}}
       \put(0,49){\line(0,1){2}}
       \put(-1,50){\line(1,0){2}}
       \put(0,-1){\line(0,1){2}}
       \put(80,49){\line(0,1){2}}
       \put(80,-1){\line(0,1){2}}
       \put(79,0){\line(1,0){2}}
       \put(79,50){\line(1,0){2}}

      \put(13,39.5){\textsf{\LARGE\bcname}}
      \put(13,34){\textsf{\scriptsize\bctitleA}}
      \put(13,31){\textsf{\scriptsize\bctitleB}}
      \put(13,24){\texttt{\normalsize\bccontactA}}
      \put(13,19){\texttt{\normalsize\bccontactB}}
      \put(13,14){\texttt{\normalsize\bccontactC}}
      \put(55,8){\textsf{\scriptsize\bcsub}}

      \end{picture}
      }
 }
 \end{picture}

 \end{document}

And this is what you get

Indexing with Elasticsearch and Django

So, every decent webapp needs a search feature? Okay, here we go.

It all starts with downloading Elasticsearch.
After extracting it, start it with

bin/elasticsearch -f

The -f parameter keeps it in the foreground and gives you a little output, especially the port and host. By default this would be localhost:9200.

So let’s get to the Django bit.
First thing to check is whether the model object you want to index for search has one or more foreign key fields.
If so, you might not want to index the ids (it is very unlikely that some user would search for an id).
So what to do? Since data is passed to Elasticsearch as a JSON object, we will use Django’s built-in serializer to convert our model object into a JSON object and then pass that on. The serializer provides an option called natural keys, enabled by adding

use_natural_keys = True

as a keyword argument to serializers.serialize(‘json’, modelObject). To successfully use this, the model which the foreign key field references has to be extended by a method natural_key.

As an example let’s say we have two model classes: a Product with a foreign key field manufacturer which references a model of that name:

Manufacturer
    name
    address
    website...

Product
    prod_id
    name
    manufacturer <- there it is, a foreign key to the above
    price...

So if we want to index products for search, we may want the manufacturer field to be a name (or a name-and-address combination, etc.). Therefore we define a method “natural_key” in the Manufacturer class, i.e.:

def natural_key(self):
  return (self.name,)  # Django expects a tuple here

Thus when serializing a Product the “unsearchable” ID is converted to the manufacturer’s name.

The general idea now is to pass the object as a serialized string to a function that then does the indexing on its own, like this:

...
new_product = Product(...)
new_product.save()
myIndexModule.add_to_index(serializers.serialize('json', [new_product], use_natural_keys=True))

So, now to the indexing itself. I use pyelasticsearch for no special reason except that its documentation seemed decent.
The indexer is located in a module since I wanted it to be separated from the rest of the application and it is pretty short.

from pyelasticsearch import ElasticSearch
import json

ES = ElasticSearch('http://localhost:9200')

def add_to_index(string):
    deserialized = json.loads(string)
    for element in deserialized:
        element_id=element["pk"]
        name = element["model"].split('.')[1]  # strip the app prefix; just cosmetics
        index = name + "-index"
        element_type = name
        data = element["fields"]
        ES.index(index, element_type, data, id=element_id)

That’s it. One could certainly do more sophisticated stuff (like using the plural for the index and the singular for the element type, and then doing something clever about irregular plurals…) but it does the job.
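To make the parsing step concrete, here is a self-contained sketch (outside Django, with the Elasticsearch call left out) that runs the same logic on a hand-written stand-in for the serializer’s output; the model name and field values are invented for illustration:

```python
import json

# Hand-written stand-in for what serializers.serialize('json', [product],
# use_natural_keys=True) emits -- "shop.product" and the field values
# are made up for this example.
serialized = json.dumps([{
    "pk": 7,
    "model": "shop.product",
    "fields": {"name": "Widget", "manufacturer": "Acme", "price": "9.99"},
}])

def parse_for_index(string):
    """Extract (index, element_type, id, data) the way add_to_index does."""
    result = []
    for element in json.loads(string):
        name = element["model"].split('.')[1]  # strip the app prefix
        result.append((name + "-index", name, element["pk"], element["fields"]))
    return result

print(parse_for_index(serialized))
```

Note how the manufacturer field arrives as the natural key “Acme” instead of a numeric id, which is exactly what makes the document searchable.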

Now let’s use Elasticsearch as a datastore for an application.

But why should we do this? Let’s assume we have an application with a member and a non-member area. Members can do stuff on a database and non-members cannot. You want to keep the database load from users who do not add anything to your service to a minimum, to provide a snappy experience for your members; so instead of letting them clog the connection with database requests, you let Elasticsearch handle that.
And anyway, it’s just for fun 🙂

So the idea is to make an Ajax call to Elasticsearch and show the user a list of the last ten products added to the index. In one of your views for non-members you put a JavaScript call like this:

$.getJSON('http://localhost:9200/product-index/_search?sort=added&order=asc&from=0&size=10', function(response){....})

and in the function you can now start to play around with the fields like

$.each(response.hits.hits, function(i, item){
     item._source.name
     ...
});

and present them to the users.

Custom authentication in Django

After fiddling with Django’s auth app for a while I decided to rather have my own (I know, why should one do this? Answer: to learn).
It consists of several steps:

  1. registration
  2. activation
  3. adding a password
  4. login

First I created an app for user-management

 $ python manage.py startapp user_management

This gave me the structure to work with.
First I created the user model:

 from django.db import models    
 import bcrypt    

 class User(models.Model):

    email = models.CharField(max_length=100, unique=True)
    firstname = models.CharField(max_length=30)
    lastname = models.CharField(max_length=30)
    password = models.CharField(max_length=128)
    last_login = models.DateTimeField(auto_now=True)
    registered_at = models.DateTimeField(auto_now_add=True)
    core_member = models.BooleanField()
    activation_key = models.CharField(max_length=50, null=True)    

The idea here was to have the email address as username and to have it unique. I don’t consider usernames a good choice for logins but rather a feature for profiles, but that depends on one’s taste, I think.

The registration view is pretty straightforward. I create a RegistrationForm object with fields for email, first and last name.
The activation_key is simply a string of randomly chosen ASCII characters and digits.
Activation itself is just creating a link, sending it, and comparing the random part of the link with the stored string. If they match, the account is activated and the user can set his/her password. For passwords I store bcrypt hashes in the database (NEVER store plaintext passwords in a database!). This is quite simple to do with the bcrypt library.
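Generating such a key needs nothing beyond the standard library. A minimal sketch, where the length of 50 matches the activation_key field above and SystemRandom draws from OS entropy so the key is not predictable:

```python
import random
import string

def make_activation_key(length=50):
    """Random string of ASCII letters and digits, e.g. for activation links."""
    rng = random.SystemRandom()  # OS entropy, not the default PRNG
    chars = string.ascii_letters + string.digits
    return ''.join(rng.choice(chars) for _ in range(length))
```

The random part of the activation link and the stored activation_key are then the same string, so activation is a plain comparison.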

The function for setting the password goes into the model. For this I use a classmethod. As the name suggests, this is a method bound to the class, not to an instance of it, which allows getting objects as in “cls.objects.get()”; cls is the classmethod’s equivalent of self in instance methods.

@classmethod
def set_password(cls, user_id, plain_pass):    
    secret = bcrypt.hashpw(plain_pass, bcrypt.gensalt())
    user = cls.objects.get(pk=user_id)
    user.password = secret
    user.save()
    return True

The login process itself is done via another classmethod which I named authenticate:

@classmethod
def authenticate(cls, email, password, request):
    user = cls.objects.get(email__exact=email)
    if bcrypt.hashpw(password, user.password) == user.password:
        request.session['user_id'] = user.id
        user.save() # this is to get last_login updated
        return user
    else:
        return None

(In order for this to work you have to enable the session middleware and the session app in settings.py.)

So, a quick rundown.

Since I use the email address as a unique identifier for the login, the function expects an email address, which is used to find the person to authenticate, the plaintext password (e.g. as given from an input field), and the request object to make use of a session. (I use database session handling for development, but there are alternatives described in the Django docs.)

The comparison succeeds if the given plaintext password, hashed with the salt embedded in the stored hash, matches the stored hash, and fails if not.

After having checked that the user has given the right credentials, I store the user_id in the session, which allows me to get the full set of user information should I need it.

I save the user to trigger the auto_now option of the user model, which updates the last_login field to the current time.

Now with

User.authenticate(email, password, request) 

the user is logged in.


Lots of code on here was written with the music of this artist 🙂

Listen/purchase: Into The Trees by Zoe Keating

Setting up my own flavour of Django

Okay, so I started doing stuff in Python and of course started playing around with Django. Being used to padrinorb‘s convenient generators, I had to figure out how to get to my preferred setup. This is what I do:

  1. Run

    django-admin.py startproject projectname

  2. In settings.py add

    import os.path

    and add

    os.path.join(os.path.dirname(__file__), 'templates')

    to TEMPLATE_DIRS

  3. Make dir templates/ in the project folder
  4. Make dir views/ in the project folder
  5. Add an __init__.py file
  6. Import your views in __init__.py, e.g.

    from index import hello

    if you have a view file called index.py containing a function hello()

  7. In templates I put subdirs for all sites and a base.html which holds the frame for all sites.
  8. Now in urls.py import all views via

    from views import *

So, this gives me a view and a template dir as well as a frame for the sites.

Now that I have the views and templates going, I would like to have a separate dir for static content. Django’s static dir is simply /static, which is fine by me, but making a directory named static and putting stuff in it won’t do. You have to put


STATICFILES_DIRS = (os.path.join(os.path.dirname(__file__), 'static/'),)

into settings.py. After putting


{% load staticfiles %}

into base.html, you can insert static files like CSS, images and so on by putting


{% static 'foo/bar.ext' %}

into the template.
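Putting the settings.py pieces from this post together, the relevant fragment looks roughly like this; the paths are relative to the directory containing settings.py:

```python
import os.path

# Both directories live next to settings.py.
PROJECT_DIR = os.path.dirname(__file__)

TEMPLATE_DIRS = (
    os.path.join(PROJECT_DIR, 'templates'),
)

STATICFILES_DIRS = (
    os.path.join(PROJECT_DIR, 'static/'),
)
```

Note the trailing commas: Django expects tuples here, and a one-element tuple without the comma is just a parenthesized string.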

Legacy code and the “SuperProgrammer”

I started an online Python course some days ago and part of the assignment is to peer-evaluate other people’s code. The task was to print a message on the screen. Yes, I know, a boring task.
There I came upon something like this:

string = "dxlxrxoxWx xoxlxlxexH"
string = string[::-2]
print string

And this, in three lines, is the essence of the problems I’ve encountered over the years with big complex projects and legacy code. Remarkably, it seems to be a trap each project’s “Super-Programmer” falls into…
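(For the curious: the trick is an extended slice that walks the string backwards in steps of two. A sketch of how such a string is built and then decoded:)

```python
# Build an obfuscated string by reversing the message and stuffing a
# filler character after every character, then undo it with [::-2],
# which reads the string backwards in steps of two.
msg = "Hello World"
obfuscated = ''.join(c + 'x' for c in reversed(msg))[:-1]
print(obfuscated)        # dxlxrxoxWx xoxlxlxexH
print(obfuscated[::-2])  # Hello World
```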

1. Show-off programming
It’s okay to be proud of one’s knowledge but, come on, this is about the job, not your ego.

2. The code is the documentation
NO, definitely not. Code is just a small part of any bigger or more complex project. There usually are configuration, directory structures, external dependencies (libraries) etc. Put that information somewhere it can be seen: the init file, a readme, a getting-started text file, but don’t just assume.

3. Don’t oversmart
You found this very cool, super cryptic looking function that does an unexpected thing… Yeah, probably use something that can be understood right away instead, or at least leave a comment about what it does.

4. Modularize to death
Especially in Ruby (but in other languages as well) I found many people building modules around simple functions, metaprogramming things to bits and doing stuff they found in years-old posts somewhere.
Those techniques are all good and useful at times, but not every function is predestined to be reused in another project, so why not declare it a helper function?

In short:
1. Write code that can be read by an average coder, not just by the “Super-Programmer”. Projects or company dev teams seldom have an even knowledge distribution. (And in most cases you don’t even want that.)
2. Documentation!
3. Comments!
4. Put your ego aside. I rarely think stuff like “Oh my, he/she came up with a fancy solution”; mostly it is along the lines of “WTF! Why didn’t he use the obvious solution?” So if there is a reason for doing it differently, go back to point 2 or 3.

Ruby Shortcuts

I sometimes stumble upon little snippets of ruby code that can replace longer loops. Like this one:

array.reject!(&:empty?)

which eliminates all empty strings from an array of strings (it does not work for Fixnum, which has no empty?), and is quite handy after a “split” operation.

Another one is this:

array.inject(:+)

which sums up all elements in an array and also works for strings.

Automating virtualization with veewee and vagrant

In order to consolidate my development environment, a friend (thank you @bascht) mentioned veewee to me. Veewee automates and simplifies vagrant basebox creation. It comes with a plethora of templates for operating systems and versions. I use it to quickly build Ubuntu server boxes on VirtualBox.

I started with a standard template (vagrant basebox define 'appbox' 'ubuntu-12.10-server-amd64') and customized the postinstall.sh, including replacing the ruby install with rvm. This leads to problems when using chef, which I don’t necessarily use, so I can live with the inconvenience rather than with having two ruby setups.

I keep my own definitions for veewee in a separate directory to be able to version control them. Basebox creation itself is done by a little script. This setup requires git and VirtualBox to be installed. I clone veewee parallel to the directory holding my definitions.


cd ..
BASE=$(pwd)

echo "\n\nBuilding appbox and exporting\n\n"
if [ ! -d $BASE/veewee/definitions ]; then
mkdir $BASE/veewee/definitions/
fi
cp -r $BASE/my-devenv/appbox/ $BASE/veewee/definitions/appbox/
cp -r $BASE/my-devenv/databasebox/ $BASE/veewee/definitions/databasebox/
cd $BASE/veewee/
veewee vbox build 'appbox' --force
veewee vbox export 'appbox' --force
echo "\n\nDone building and exporting appbox\n\n"
mv $BASE/veewee/appbox.box $BASE/my-devenv/

veewee vbox build 'databasebox' --force
veewee vbox export 'databasebox' --force
echo "\n\nDone building and exporting databasebox\n\n"
mv $BASE/veewee/databasebox.box $BASE/my-devenv/

cd $BASE/my-devenv/
veewee alias delete vagrant # eradicate the "Gemfile could not be found" error
source ~/.zshrc

After building the two boxes I add them with

vagrant box add 'appbox' 'appbox.box'

and databasebox accordingly.

My Vagrantfile looks like this:

Vagrant::Config.run do |config|

config.vm.define :app do |app_config|
app_config.vm.box = "appbox"
app_config.vm.network :hostonly, "10.10.1.2" #10.10.1.1 is the host
app_config.vm.provision :shell, :path => "provisions/common_base.sh"
app_config.vm.provision :shell, :path => "provisions/app_base.sh"
end

config.vm.define :db do |db_config|
db_config.vm.box = "databasebox"
db_config.vm.network :hostonly, "10.10.1.3"
db_config.vm.provision :shell, :path => "provisions/common_base.sh"
db_config.vm.provision :shell, :path => "provisions/db_base.sh"
end

end

My provisioner is shell, but a »gem install chef« would enable you to use chef subsequently for provisioning. (Since the rvm installation requires closing and restarting the shell, or logging out and back in, I found this to be more reliable.)

Now a simple ‘vagrant up’ starts both boxes and pointing the browser at 10.10.1.2 is answered by the nginx in appbox. My application is configured to use the db server at 10.10.1.3