Writing Long-Running Rake Tasks

Rake tasks are used all the time in Ruby on Rails: to import data from other systems, clean up existing data, populate a development environment with test data, and more. Chances are some of these tasks will take a long time to run. In this post, we cover best practices for writing long-running rake tasks.

Implement a Progress Bar

A progress bar gives whoever runs the rake task an estimate of when it will finish. For a long-running task, this is far handier than displaying progress with a bunch of print statements, or not displaying progress at all.

To implement the progress bar, we found that the progress_bar gem serves all of our use cases. Documentation and code are available at: https://github.com/paul/progress_bar.

To install it, add the progress_bar gem to the Gemfile and run bundle install.

# Gemfile
gem 'progress_bar'  

To use progress_bar, instantiate it with the number of operations the code is expected to perform, followed by the format elements of the progress bar as subsequent arguments. We like to pass :percentage, :bar, :elapsed, :eta, and :rate, which give a nice view of the percentage, bar, elapsed time, ETA, and rate of completion.

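# demo.rake
namespace :demo do
  desc "Demo Progress Bar"
  task progress_bar: :environment do
    # 100 is the total number of operations the bar should expect
    bar = ProgressBar.new(100, :percentage, :bar, :elapsed, :eta, :rate)
    1.upto(100) do
      # Call increment! once per completed operation
      bar.increment!
    end
  end
end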

Running rake demo:progress_bar renders a progress bar in the console showing the percentage, the bar itself, elapsed time, ETA, and rate.
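
To make the :percentage, :elapsed, :eta, and :rate fields less of a black box, here is a rough sketch of how such figures can be derived from a counter and an elapsed time. The progress_stats helper and its return keys are hypothetical, written only for illustration, and it assumes a positive total. This is not how the gem itself is implemented.

```ruby
# Hypothetical helper, not part of the progress_bar gem: derives the
# figures a progress bar typically displays from three inputs.
# Assumes total is a positive number.
def progress_stats(done, total, elapsed_seconds)
  # Operations completed per second; 0.0 until some time has passed
  rate = elapsed_seconds.zero? ? 0.0 : done / elapsed_seconds.to_f
  remaining = total - done
  # Estimated seconds left; nil while the rate is still unknown
  eta = rate.zero? ? nil : remaining / rate
  {
    percentage: (done * 100.0 / total).round(1),
    elapsed: elapsed_seconds,
    eta: eta,
    rate: rate
  }
end

# 25 of 100 operations done after 5 seconds:
# percentage is 25.0, rate is 5.0 per second, eta is 15.0 seconds
stats = progress_stats(25, 100, 5)
```

An ETA computed this way is only as good as the assumption that the remaining operations take about as long as the completed ones.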

Batch Inserts / Updates

For long-running rake tasks, another thing to pay attention to is memory usage and performance. It can pay off to optimise the code before running the task. In our experience, batching updates and inserts at around 1000 records per batch provides a nice trade-off between memory usage and performance.

For batched inserts, you can collect the records in an Array first and use the activerecord-import gem (https://github.com/zdennis/activerecord-import) to insert them in bulk. Say you want to populate the database with Person records for performance testing; you can do the following:

# Batch inserts
NUM_RECORDS = 100_000
BATCH_SIZE = 1_000

# Build and insert BATCH_SIZE records at a time so that at most one
# batch of Person objects is held in memory.
NUM_RECORDS.times.each_slice(BATCH_SIZE) do |slice|
  records = slice.map { Person.new(name: Faker::Name.name) }
  # Perform batch insert
  Person.import(records)
end
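
The progress bar and batching combine naturally: advance the bar once per batch rather than once per record. The sketch below assumes a hypothetical helper named import_in_batches. It takes any object that responds to increment!(count), as the progress_bar gem's increment! does, and yields each batch to a block, which is where a call such as Person.import(batch) would go. CountingBar is a stand-in so the sketch runs without the gem.

```ruby
# Hypothetical helper: slices records into batches, hands each batch to the
# given block, then advances the bar by the number of records processed.
def import_in_batches(records, batch_size:, bar:)
  records.each_slice(batch_size) do |batch|
    yield batch                  # e.g. Person.import(batch) in a real task
    bar.increment!(batch.size)
  end
end

# Stand-in for ProgressBar so the sketch runs without the gem installed.
class CountingBar
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment!(by = 1)
    @count += by
  end
end

bar = CountingBar.new
batch_sizes = []
import_in_batches((1..2500).to_a, batch_size: 1000, bar: bar) do |batch|
  batch_sizes << batch.size
end
# batch_sizes is [1000, 1000, 500] and bar.count is 2500
```

In a real rake task, bar would be something like ProgressBar.new(records.size, :percentage, :bar, :elapsed, :eta, :rate), and the block would perform the actual import.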

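For batched updates, you can use the ActiveRecord class method find_in_batches in conjunction with the class method update. find_in_batches keeps memory usage bounded by loading records in groups (1000 at a time by default). Note that update still saves each record individually, running validations and callbacks. The example below assumes that you have just added a title attribute to Person and want to populate this column based on gender: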

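# Batch updates
# find_in_batches loads records in groups (1000 per group by default)
Person.find_in_batches do |group|
  # Map each id to the attributes to set on that record
  updates = group.to_h do |person|
    [person.id, { title: person.gender == 'MALE' ? 'Mr.' : 'Mrs.' }]
  end
  # update accepts parallel arrays of ids and attribute hashes
  Person.update(updates.keys, updates.values)
end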
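
Because Person.update takes two parallel arrays, it can help to build the id-to-attributes mapping in a small pure function that can be tested without a database. The sketch below uses a hypothetical title_updates helper and a Struct named FakePerson standing in for the Person model. Neither exists in the original task.

```ruby
# Hypothetical helper: maps each person's id to the attributes to update.
# Mirrors the rule above: 'Mr.' for 'MALE', 'Mrs.' otherwise.
def title_updates(people)
  people.to_h do |person|
    [person.id, { title: person.gender == 'MALE' ? 'Mr.' : 'Mrs.' }]
  end
end

# Stand-in for the Person model so the sketch runs without Rails.
FakePerson = Struct.new(:id, :gender)

updates = title_updates([FakePerson.new(1, 'MALE'), FakePerson.new(2, 'FEMALE')])
# updates.keys and updates.values are the two arrays Person.update expects:
#   Person.update(updates.keys, updates.values)
```

Keeping the mapping separate from the database call means the title rule can be checked in a plain unit test before the task is run against real data.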

We hope that you enjoy programming in Ruby on Rails as much as we do. If you need help with your project, whether it's maintaining an existing project, creating an MVP from scratch, writing unit and integration tests, or reviewing existing code to improve security, reliability, and performance, don’t hesitate to contact us.