Automated Deployment with EC2 and Bitbucket

First, I am going to split this into several parts to keep the wordpress editing process manageable.

1. I describe how I bring up an EC2 instance with boto3
2. I describe the OAuth process against the Bitbucket REST API and the transfer of a deploy key
3. I bring it all together and wrap it up

So, to start, let’s create an EC2 instance:


import boto3

# AMI, INSTANCE_TYPE, AWS_REGION, AWS_ACCESS_KEY and AWS_SECRET_KEY
# are configuration constants defined elsewhere
def start_ec2_app():
    c = get_client()
    res = get_resource()

    create_app_sg(c, 'app')

    # create a key pair and save the private key (the keys/ directory must exist)
    keypair = c.create_key_pair(KeyName='app_key')
    with open('keys/' + keypair['KeyName'] + '.pem', 'w+') as keyfile:
        keyfile.write(keypair['KeyMaterial'])

    inst = res.create_instances(
        ImageId=AMI,
        KeyName='app_key',
        InstanceType=INSTANCE_TYPE,
        SecurityGroups=['app'],
        MinCount=1,
        MaxCount=1
    )

    # let's wait for the instance
    running_waiter = c.get_waiter("instance_running")
    running_waiter.wait(InstanceIds=[inst[0].id])

    instances = res.instances.filter(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
    i = 0
    for instance in instances:
        tags = [{"Key": 'instanceName', "Value": 'app_%s' % i}]
        print(instance.public_ip_address + ' ' + instance.public_dns_name)
        c.create_tags(
            Resources=[instance.id],
            Tags=tags)
        i += 1  # without this, every instance would be tagged 'app_0'

    allow_ssh(c, 'app')

def allow_ssh(c, name):
    sg = c.describe_security_groups(
        Filters=[{'Name': 'group-name', 'Values': [name]}])
    group = sg['SecurityGroups'][0].get('GroupId')
    c.authorize_security_group_ingress(
        IpProtocol="tcp",
        CidrIp="0.0.0.0/0",
        FromPort=22,
        ToPort=22,
        GroupId=group)

def get_client():
    return boto3.client(
        'ec2',
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_KEY,
    )

def get_resource():
    return boto3.resource(
        'ec2',
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_KEY,
    )

def create_app_sg(c, name):
    sg = c.describe_security_groups(
        Filters=[{'Name': 'group-name', 'Values': [name]}])
    if not sg['SecurityGroups']:
        c.create_security_group(
            GroupName=name,
            Description='%s Security Group' % name)

start_ec2_app()

Okay, okay… let’s go slowly:
a) we need a client to work with (or at least I prefer a client; you could use a resource with some fiddling), so we create one
b) this client now creates a security group that gets the fabulous name ‘app’
c) we create a pair of keys and save the private key
d) we run an instance (a single one in this case)
e) we get the id of the instance so we can
f) wait until the instance is running
g) then we retrieve the public IP and the DNS name, tag the instance and open up port 22 for SSH
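
Once the instance is up, you can check that everything worked by connecting with the saved private key (the login user depends on the AMI, e.g. ec2-user on Amazon Linux; the IP is the one printed above):

ssh -i keys/app_key.pem ec2-user@<public-ip>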

So, this is done. Off to bitbucket and oauth next…


Flask migrations with alembic without flask-migrate

The starting point for this setup was that I wanted to be able to automigrate and to keep my models in a folder. And since flask-migrate uses a manager, I opted out of that and used plain alembic with flask-sqlalchemy.

So, this is my Flask setup:

app.py
alembic.ini
models/
      shared_model.py
      ... the models ...
alembic/
      env.py
      ... versions/ and rest of alembic ...

So where is the magic happening?
Actually (spoilers!) there is no magic. Maybe I’m just a little daft for not getting this right sooner. So here is what I do.
In my shared_model.py I put my db declaration like this:


from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()

and the rest of the code I like to use in all models.
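
For illustration, a model file in models/ might then look something like this (a hypothetical User model, just to show the shared import):

# models/user.py -- hypothetical example, any model works the same way
from models.shared_model import db

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)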

In my app.py I initialize my app via a function:


from flask import Flask
from models.shared_model import db

app = Flask(__name__)

def start_app():
    # ... configuration magic ...
    db.init_app(app)
    return app

if __name__ == "__main__":
    app = start_app()
    app.run()

Finally, the adjustments in alembic/env.py, somewhere underneath the MetaData:


from app import start_app
app = start_app()
from models.shared_model import db
db.init_app(app)
config.set_main_option("sqlalchemy.url", app.config["SQLALCHEMY_DATABASE_URI"])
target_metadata = db.metadata

And that’s it. Blueprints, autogenerate and all the fancies 🙂
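
With this in place, migrations work with the plain alembic commands, for example:

alembic revision --autogenerate -m "add user table"
alembic upgrade head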

flask, alembic and blueprints

For some time I could easily do without autogenerated migrations. Now I wanted them, and I wanted to use Flask and not Django. I started, very naively, by installing and importing flask-alembic or flask-migrate, but they all seemed (at that time) to support patterns that I didn’t want (e.g. manager, single models.py) or that I couldn’t understand. At some point I didn’t get migrations to work at all, or they were empty, or blueprints wouldn’t work or…

What I wanted was
* a folder “models” containing all models with a file for each model
* plain alembic
* a single start file with my setup and configs

After installing alembic via pip, migrations didn’t work, and even importing the models in env.py didn’t solve it; fiddling with target_metadata didn’t help, nor did several other solutions outlined on StackOverflow. So here is what worked for me:

My start/setup file (start.py in my case) has a function:


from flask import Flask
from models.shared_model import db

def start_app():
    app = Flask(__name__)
    # config stuff
    db.init_app(app)
    return app

and


if __name__ == "__main__":
    app = start_app()
    app.run()

The app is started by just running python start.py, without the need for a manager.

I created my shared_model that all models import:


from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()

This makes it easier since all the models just import this shared model, and I can also put some other stuff in here that I want to have access to in my models.

The last thing to do is to edit the alembic env.py:
1. Import the start_app function and start the app
2. Import the db from the shared model and initialize it
3. Configure and set target_metadata


from start import start_app
app = start_app()
from models.shared_model import db
db.init_app(app)
config.set_main_option("sqlalchemy.url", app.config["SQLALCHEMY_DATABASE_URI"])
target_metadata = db.metadata

That’s about it: models go into the models folder and can be used in blueprints, alembic revision --autogenerate produces more than “pass”, and the app starts as usual.
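
To give an idea of the payoff, here is a sketch of a blueprint pulling in a model from the models folder (the User model is an assumed example, not part of the setup above):

from flask import Blueprint, jsonify
from models.user import User

users = Blueprint('users', __name__)

@users.route('/')
def list_users():
    # the query works as usual once db.init_app(app) has run
    return jsonify(names=[u.name for u in User.query.all()])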

Rereading “Implementing Lean Software Development”

After some time I reread Mary and Tom Poppendieck’s book and came to a paragraph that had previously not caught my attention (or I forgot that it had). In my defence, it was in a passage about Google (which is not of particular interest to me) underneath a prominent insert. Nonetheless, it struck a nerve with me now, since there have been lots of discussions about the Death of Agile for quite a while:

“This is also the time [feasibility phase] for systems design, a critically important discipline that many companies seem to do without. […] It should neither be done by an amateur nor by arms-length experts. Rather it should be done by seasoned designers who know the systems design will evolve as the product emerges and who know how to make sure that evolution is taken into account so it can proceed smoothly.” (p.47)

So, there you go, one of the major pitfalls. From my own sad experience I can say that systems design is the barren wasteland in most companies building software products. They do Scrum, Waterfall, Unified Process, Kanban, XP, Lean, you name it, but on the systems design level decisions are still often made with arguments like:

  • [Big company] is using it and they have millions of users, so if it’s good enough for them…
  • Everybody knows that X is the best for Y.
  • You know X is really an expert in [programming language, tool, technique] so we take it.
  • Our old software was running for a long time with X, we should stick with it.
  • X is really around for a long time. At least we would know all the problems and could google them.

I’m sure everybody knows meetings where arguments like this come up (and many more equally sad ones). Normally not from the development team, but from management. But why?
Building a new product is a risky business; you can fail. This is scary. And scared people try to get safety wherever possible, thus forcing a technical solution onto the developers, because the developers can’t give a guarantee for the project’s success. And let’s be honest: when did the technical gibberish of developers ever beat the argument that f***book or whoever is using it? So the development team is left with screws and a hammer and bravely tries to use the impact of the hammer to make the screw rotate, because some other big company is using nails and hammers.

But what to do about this screwed up situation (pun fully intended)?

If you don’t have a seasoned designer on your team, it is easy: hire one, get a consultant, research, discuss.
If you have one: let her/him do the job and don’t try to get guarantees; there are none.

So my feeble take on systems design:

  1. Avoid big enterprise frameworks (unless you are big, as in massive).
  2. Microframeworks are more likely to be able to allow changes due to their pluggable nature.
  3. Define the platform you want to run the software on early and keep deployment in mind.
  4. Plan without css/js frameworks and add them if they provide a significant benefit.
  5. Enforce coding styles and guidelines early on. Otherwise changes and debugging will be a death by a thousand papercuts.
  6. Encourage learning and teamwork to improve code quality and knowledge of the system.
  7. Try your best to keep employees to avoid knowledge drain.
  8. Estimates are not deadlines! Cutting corners to keep a deadline that was an estimate will cost you dearly later.
  9. Automate early!

Hierarchical Structures in Python – i.e. folders

The method I found most appealing for dealing with hierarchical structures is a tree. I think it is pretty straightforward and easy to implement and customize.
First we need a class that defines the nodes of the tree.

class Node:
    def __init__(self, name):
        self.children_list = []
        self.name_str = name

    def add_child(self, node):
        self.children_list.append(node)

This is the basic version of the node class, and we are good to go. We have the two basic methods: create a new node, and store nodes as children, thus building a hierarchy.

So, we build a root node…

root = Node("/")

then we create a second one

first_child = Node("first_child")

and append it to root

root.add_child(first_child)

We could go on adding nodes (create node and append to other already existing node as child) to create a tree.

Now lets assume we have a nice tree and we want to return a json representation of the structure.
I (since I like to keep things separated) would make a new file and import the node file.
This new file might look something like this:

def get_tree(node, tree=None):
    # a mutable default argument ([]) would be shared between calls,
    # so we create a fresh list per top-level call instead
    if tree is None:
        tree = []
    tree.append(node.name_str)
    for child in node.children_list:
        subtree = []
        tree.append(subtree)
        get_tree(child, subtree)
    return tree

So there you go, the programmer’s best friend: recursion. This returns the tree as a nested list containing a name and, if the node has children, a list of children. This structure could then be pickled, dumped to JSON, returned or stored.
To associate content with a folder there are several options: you could have a list of content objects similar to children_list, or just store ids of the content.
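
A quick usage sketch tying the pieces together (the folder names are made up):

import json

root = Node("/")
docs = Node("docs")
root.add_child(docs)
docs.add_child(Node("notes.txt"))

print(json.dumps(get_tree(root)))
# -> ["/", ["docs", ["notes.txt"]]]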

Perl vs Python – Regex

tl;dr

Perl does not outperform Python when it comes to regexes; in fact, Perl’s speed drops significantly when the term to match is preceded by “.*”.

I am constantly told that Perl has much better regex performance than Python. When I ask people how they know, they answer with “everybody knows that” or “because it’s native”, or I am shown some obscure benchmarks which seem to test anything but regex performance (hardcoded regex vs. interpolated etc.). I wanted to know, and I wanted to fiddle around with performance analysis since I am dealing with Big-O lately. So, without putting an end to the discussion, and more as a base for discussions with colleagues and friends, here is what I did:

1. I took a large text (Moby Dick from archive.org)

2. I wrote very small programs in Perl and Python

3. I read in the whole file and measured the time (to be able to see whether one program takes longer to read than the other)

4. I ran the code with regex

5. I changed the regex and ran them again

6. I measured with Linux’s time

I am, however, not interested in absolute performance (which is machine dependent) but in relative performance.
Versions were:
perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
and
Python 2.7.6 (default, Jan 17 2014, 15:43:59) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin

The first two scripts were these:

import re

count = 0
with open('mobydick.txt', 'r') as f:
    data = f.read()


#!/usr/bin/perl -w
use utf8;
use strict;
use warnings;

my $string;

# read the whole file into one string
open FILE, "<", "mobydick.txt";
$string = join("", <FILE>);
close FILE;

Ran them both and got

python py_regex.py 0,02s user 0,02s system 53% cpu 0,069 total

perl pl_regex.pl 0,01s user 0,02s system 70% cpu 0,047 total

Pretty close. So, I don’t have to concern myself with reading speed in the next measurements.

Then I changed the code to include some regexes. I just counted how many times the word “Pequod” was used.

import re

count = 0
with open('mobydick.txt', 'r') as f:
    data = f.read()

m = re.findall('(Pequod)', data)

for find in m:
    print find
    count += 1

print "%d" % count


#!/usr/bin/perl -w
use utf8;
use strict;
use warnings;

my $count = 0;
my $string;

open FILE, "<", "mobydick.txt";
$string = join("", <FILE>);
close FILE;

my @m = $string =~ /(Pequod)/g;

foreach (@m) {
    print "$_\n";
    $count++;
}

print $count . "\n";

Ran them again and got:

Pequod
[...]
Pequod
66
python py_regex.py 0,02s user 0,01s system 89% cpu 0,033 total

And

Pequod
[...]
Pequod
66
perl pl_regex.pl 0,01s user 0,01s system 89% cpu 0,021 total

Okay, that was a little surprising, since in the discussions I had before, “outperforms” was a term used quite often.
Maybe it was just that the regex was simply not complex enough or something…

Change the regex and keep everything else.

m = re.findall('(.*Pequod.*)\s', data);

my @m = $string =~ /(.*Pequod.*)\s/g;

And run it again

the Pequod. Devil-Dam, I do not know the origin of ;
[...]
SLOWLY wading through the meadows of brit, the Pequod
66
python py_regex.py 0,07s user 0,01s system 95% cpu 0,082 total

Not too bad an increase.

the Pequod. Devil-Dam, I do not know the origin of ;
[...]
SLOWLY wading through the meadows of brit, the Pequod
66
perl pl_regex.pl 18,16s user 0,09s system 99% cpu 18,347 total

GOODNESS ME!!

This drop in speed seems to occur when the matching term is preceded by “.*”. This might be connected to the lack of variable-length look-behind, but that is just me speculating.
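
If you want to poke at the Python side without the shell, here is a minimal sketch (using timeit on the same file; everything else as above) that times the two patterns in isolation:

import re
import timeit

with open('mobydick.txt', 'r') as f:
    data = f.read()

# time one findall pass per pattern over the full text
for pattern in ['(Pequod)', r'(.*Pequod.*)\s']:
    t = timeit.timeit(lambda: re.findall(pattern, data), number=1)
    print('%s: %.3fs' % (pattern, t))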

But nonetheless, I wouldn’t consider Perl as a language for applications dealing with text, as I could never be sure not to be left with a regex that leads to performance issues in the system.

A sip from Flask

Lately I came to find Django a bit top-heavy for one of my projects, so I chose Flask as a lighter and smaller alternative.
After fiddling with the tutorials for a bit I wanted to have a setup with several modules. Surprisingly, that wasn’t as easy as expected: the snippets and examples showed several options and configurations and… So, this is what worked for me. It may not be the true gospel, but I wanted modules mounted at certain URLs, like mounted apps in Padrino.

This is what I came up with:

    + Project
      -- start.py
      + module1
         -- __init__.py
         -- app.py
      + module2
         -- __init__.py
         -- app.py

So module1 and module2 are two functional units which should answer to specific URL prefixes, and start.py is the file that runs the whole show.

I used Flask’s blueprints to get it all under one roof.

First let’s get the modules to behave like modules. In module1/app.py I added:

    from flask import Blueprint

    app1 = Blueprint('app1', __name__)

    # a minimal example route on the blueprint
    @app1.route('/')
    def index():
        return 'Hello from module1'

For module2, app.py looks similar, except that app1 is changed to app2.

So, now we have the blueprints, of which the project does not know yet. In fact, we don’t have any app so far. All the nuts and bolts go into start.py:

    from flask import Flask
    from module1.app import app1
    from module2.app import app2

    project = Flask(__name__)
    project.register_blueprint(app1, url_prefix='/path1')
    project.register_blueprint(app2, url_prefix='/path2')

    if __name__ == '__main__':
        project.run()

This is the beauty of blueprints (imho): import the blueprint, register it and put it on a dedicated path.

Done. Two modules in a Flask application.
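
A quick smoke test (assuming the example route from above):

    python start.py &
    curl http://localhost:5000/path1/
    # -> Hello from module1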

gitweb – shorty

The team demanded (or asked nicely for) a graphical overview of all the git repositories, so here is the quick way to do it:

  1. Install gitweb

       sudo apt-get install gitweb
    
  2. Make an empty directory that is the root of all the repositories, e.g. pub/. This is necessary since git has no concept of a root repository holding others.

     mkdir pub/
    
  3. Change owner to the user who owns the repositories

     sudo chown -R git:git pub
    
  4. Now we link the repositories into pub/ (move into pub/ and do):

     ln -s /path/repo1.git rep1
     ln -s /path/repo2.git rep2
    
  5. Now we open /etc/gitweb.conf and point the $projectroot variable to pub/:

     $projectroot = "/path/pub";
    

Now http://server/gitweb should show the list of repos. If not, you probably have to edit $projectroot in /usr/share/gitweb/gitweb.cgi too.

git hooks – reel in

In the last post I sketched out a simple jabber notification script for remote git repositories. There are some things that can be improved there.

First I added an additional argument to exclude the committer from the message queue. I know that I committed, so I don’t have to be informed about that later (I updated my github repo). So, I have another argument in the call, but what now?
In pushbot.py there is a dict holding the name of the committer (or the email) as key and the jabber id as value.

But that in itself is pretty useless, so we have to tweak the hook a little to pass the identity of the committer as the second parameter. This is best achieved by using

git log -1

which gives us the last commit entry. Better still, we can add a formatting instruction like this:

git log -1 --pretty=format:"%ce"

which gives us the email address of the committing party. I will use this as the key in the pushbot dict holding the jabber ids to which the push notification should be sent. I don’t use the committer’s name here because of formatting hubbub and because I am less likely to run into problems with duplicates.

So, in pushbot.py I will add an email address as key:

rcps_list = {'email@server': 'jabber@server'}
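
The exclusion itself is then just a check against this dict. A minimal sketch of the sending loop in pushbot.py (send_message is a stand-in for whatever actually talks to the jabber server):

import sys

message, committer = sys.argv[1], sys.argv[2]

for email, jabber_id in rcps_list.items():
    if email == committer:
        continue  # the committer already knows about this push
    send_message(jabber_id, message)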

Now the committer should not receive any messages concerning his own commits. But we could still improve the notification message by using the very same git log statement.

In hooks/post-receive we could generate a more detailed message using

git log -1 --pretty=format:"%cn, %s"

This gives us the name of the committer and the subject line. Insert this into the message and you have a nice push notification with sufficient detail to decide what to do, without too much overhead.
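
Putting the pieces together, the relevant part of hooks/post-receive might look something like this (the path to pushbot.py is made up; adjust it to your setup):

#!/bin/sh
# build the notification text and hand it to pushbot,
# together with the committer's email for the exclusion
message=$(git log -1 --pretty=format:"%cn, %s")
committer=$(git log -1 --pretty=format:"%ce")
python /path/to/pushbot.py "$message" "$committer"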