Elastic MapReduce streaming job with elasticity

Requires elasticity (https://github.com/rslifka/elasticity) and an Amazon AWS account, but works like a charm 🙂

The script does the following: it creates a new bucket for each day it runs, runs the MapReduce job, and fetches the result.
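Both the bucket name and the job-flow name are derived from the current date with `Time#strftime`. A minimal sketch that factors this into a helper (the name `daily_names` is hypothetical, not part of the original script) shows that every run on the same calendar day produces the same names:

```ruby
# Hypothetical helper: builds the per-day bucket and job-flow names
# the same way the script does ("run-YYYYMMDD", "job-YYYYMMDD").
def daily_names(time = Time.now)
  stamp = time.strftime("%Y%m%d")
  ["run-" + stamp, "job-" + stamp]
end

bucket, job = daily_names(Time.new(2012, 3, 5))
puts bucket  # => run-20120305
puts job     # => job-20120305
```

S3 bucket names are global across all AWS accounts, so a generic name like `run-20120305` may already be taken by someone else; a personal prefix makes collisions less likely.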

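require 'elasticity'
require 'fog'

# Credentials (assumed to come from the environment)
@key_id     = ENV['AWS_ACCESS_KEY_ID']
@secret_key = ENV['AWS_SECRET_ACCESS_KEY']

# S3 access via fog (assumed: the script uses fog's directories/files API)
storage = Fog::Storage.new(
  :provider              => 'AWS',
  :aws_access_key_id     => @key_id,
  :aws_secret_access_key => @secret_key
)

# One bucket and one job-flow name per day, e.g. "run-20120305"
@new_bucket = "run-" + Time.now.strftime("%Y%m%d")
@new_job = "job-" + Time.now.strftime("%Y%m%d")

# Create a new bucket for today's results
newdir = storage.directories.create(
  :key    => @new_bucket,
  :public => false
)
puts "Results are written to bucket " + newdir.key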

emr = Elasticity::EMR.new(@key_id,@secret_key)
jobflow_id = emr.run_job_flow({
    :name => @new_job,
    :instances => {
      :ec2_key_name => "test",
      :hadoop_version => "0.20",
      :instance_count => 2,
      :master_instance_type => "m1.small",
      :placement => {
        :availability_zone => "us-east-1a"
      },
      :slave_instance_type => "m1.small",
    },
    :steps => [
      {
        :action_on_failure => "TERMINATE_JOB_FLOW",
        :hadoop_jar_step => {
          :args => [
            "-input",   "s3n://input/",
            "-output",  "s3n://" + @new_bucket + "/",
            "-mapper",  "s3n://mapper/mapper1.rb",
            "-reducer", "s3n://reducer/reducer1.rb",
          ],
          :jar => "/home/hadoop/contrib/streaming/hadoop-streaming.jar"
        },
        :name => "mr1"
      }
    ]
  })

puts jobflow_id + " started"

jobflows = emr.describe_jobflows
state = jobflows[0].state
puts jobflows[0].name + " " + state + "\n\n"

if state == 'COMPLETED'
  result = storage.directories.get(@new_bucket).files.get("part-00000").body
  result.each_line do |line|
    puts line[0]
  end
end
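The post doesn't show `mapper1.rb` or `reducer1.rb`. With Hadoop Streaming, the mapper reads raw input lines on STDIN and writes `key<TAB>value` lines to STDOUT, and the reducer receives those lines sorted by key. A hypothetical word-count pair (an illustration of the streaming contract, not the author's actual scripts), written as plain functions so it can be tried locally:

```ruby
# Hypothetical word-count logic for a Hadoop Streaming job.

# Mapper: emit "word\t1" for every word on every input line.
def map_lines(lines)
  lines.flat_map do |line|
    line.downcase.scan(/\w+/).map { |word| "#{word}\t1" }
  end
end

# Reducer: input arrives sorted by key, so consecutive lines with the
# same key are summed and the total is emitted when the key changes.
def reduce_lines(lines)
  output = []
  current_key = nil
  total = 0
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    if key != current_key
      output << "#{current_key}\t#{total}" unless current_key.nil?
      current_key = key
      total = 0
    end
    total += value.to_i
  end
  output << "#{current_key}\t#{total}" unless current_key.nil?
  output
end

# Local simulation: sorting the mapped lines stands in for Hadoop's shuffle.
mapped = map_lines(["the cat", "The dog"])
p reduce_lines(mapped.sort)  # => ["cat\t1", "dog\t1", "the\t2"]
```

In a real `mapper1.rb` the entry point could be `puts map_lines(STDIN.each_line)`, and in `reducer1.rb` `puts reduce_lines(STDIN.each_line)`.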

Errata:

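The final if statement makes no sense on its own: right after the job flow is started it is still starting up, so the state is never COMPLETED at that point. We first need a routine that polls for changes in the job's state, and it belongs before the if statement: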

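# Poll every 5 minutes until the job flow reaches a final state;
# run this before the `if state == 'COMPLETED'` check.
while state != 'COMPLETED' && state != 'FAILED' && state != 'TERMINATED'
  sleep 300
  jobflows = emr.describe_jobflows
  state = jobflows[0].state
  puts jobflows[0].name + " " + state + " (" + Time.now.strftime("%H:%M:%S") + ")\n"
end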
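The polling loop above waits indefinitely if the job flow never leaves a non-terminal state. A minimal sketch with a timeout, assuming the current state is fetched through a block (in the script that block would be `emr.describe_jobflows[0].state`); the method name `wait_for_state`, the terminal-state list and the default timeout are illustrative assumptions:

```ruby
# Assumed set of final EMR job-flow states.
TERMINAL_STATES = %w[COMPLETED FAILED TERMINATED].freeze

# Hypothetical helper: calls the block until it returns a terminal state,
# raising once `timeout` seconds of waiting have passed. `sleeper` is
# injectable so the helper can be exercised without really sleeping.
def wait_for_state(interval: 300, timeout: 4 * 3600, sleeper: method(:sleep))
  waited = 0
  loop do
    state = yield
    return state if TERMINAL_STATES.include?(state)
    raise "gave up after #{waited}s, last state: #{state}" if waited >= timeout
    sleeper.call(interval)
    waited += interval
  end
end

# Usage with a fake sequence of states instead of a live EMR call:
states = %w[STARTING RUNNING RUNNING COMPLETED]
puts wait_for_state(interval: 1, sleeper: ->(_s) {}) { states.shift }  # => COMPLETED
```

In the script this would read `state = wait_for_state { emr.describe_jobflows[0].state }`, placed before the `if state == 'COMPLETED'` check.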