
Batch jobs with Google Cloud Tasks and Rails

Arnaud Lachaume
June 8, 2021
4 min read

Implement sophisticated background task flows with batch jobs.


TL;DR: The Cloudtasker gem offers native job batching functionality. You can read more about it in the Cloudtasker batch documentation.

Job batching is the ability to group a series of small jobs into one big job. This is particularly useful when you must take action (e.g. update an ActiveRecord model status) after a number of related jobs have completed.

For example, let's say your platform offers users the ability to import data from a third-party application. You will want to enqueue multiple jobs to fetch data via API, but only flag the import as done once all the jobs have completed.

Cloudtasker offers native job batching functionality, so if you're a user of Google Cloud and need batch jobs, feel free to try the gem!

If you haven't set up Cloudtasker in your project yet, have a look at our previous blog post introducing Cloudtasker or the gem documentation.

Getting started

What we will do

We'll take the example above of importing data from a third-party application. We'll assume you have an ActiveRecord model called ProjectImport with two fields:

  • status: Initially set to pending. Moves to importing then to completed.
  • progress: An integer field capturing the % of progress for the import.

In order to import data, we'll assume you need to call four different endpoints on a fictitious API. Each endpoint may return thousands of results, so you want to import each endpoint in parallel.

Once each endpoint is fully imported, you want to flag the ProjectImport#status field as completed.

Let's get started.

Cloudtasker setup

In order to activate the batching functionality, you will need to modify your Cloudtasker initializer as follows:
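
(A minimal sketch based on the Cloudtasker batch documentation; the Redis URL is an example value to adjust for your environment.)

```ruby
# config/initializers/cloudtasker.rb
require 'cloudtasker'
require 'cloudtasker/batch' # enables the batch job extension

Cloudtasker.configure do |config|
  # Batch state is stored in Redis (example URL, adjust for your setup)
  config.redis = { url: 'redis://localhost:6379/5' }
end
```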

Implementing the batch import jobs

The data fetching jobs

The jobs doing the actual API fetching are regular Cloudtasker jobs. There is nothing batch-specific in these jobs.

Here is what the worker would look like:
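
(A sketch: ImportClient and ImportedRecord are hypothetical placeholders for your API client and persistence logic.)

```ruby
# app/workers/fetch_endpoint_worker.rb
class FetchEndpointWorker
  include Cloudtasker::Worker

  def perform(project_import_id, endpoint)
    # ImportClient is a hypothetical client for the third-party API
    ImportClient.fetch(endpoint).each do |record|
      # Persist each fetched record (application-specific logic)
      ImportedRecord.create!(project_import_id: project_import_id, data: record)
    end
  end
end
```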

The batch import job

This is the interesting part. Let's implement the main job that will stitch everything together and update the ProjectImport model when the actual data import is done.

Adding jobs to a batch job is like enqueuing them, except that we will do so through a specific batch method.

The batch job looks like this:
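
(A sketch: the endpoint list is hypothetical and FetchEndpointWorker is the worker defined above.)

```ruby
# app/workers/project_import_worker.rb
class ProjectImportWorker
  include Cloudtasker::Worker

  # Hypothetical list of the four endpoints to import
  ENDPOINTS = %w[users projects tasks comments].freeze

  def perform(project_import_id)
    ProjectImport.find(project_import_id).update(status: 'importing')

    # Enqueue one child job per endpoint through the batch
    ENDPOINTS.each do |endpoint|
      batch.add(FetchEndpointWorker, project_import_id, endpoint)
    end
  end

  # Invoked once all children have completed or died
  def on_batch_complete
    ProjectImport.find(job_args.first).update(status: 'completed')
  end
end
```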

Note the on_batch_complete hook. Batch jobs have various hooks allowing you to take action at each step of the lifecycle:

  • on_batch_node_complete: invoked whenever any child in the batch tree (including grandchildren and beyond) completes.
  • on_child_complete: invoked whenever a direct child completes.
  • on_child_error: invoked whenever a direct child fails.
  • on_child_dead: invoked whenever a direct child has exhausted all its retries.
  • on_batch_complete: invoked when all children have completed or died.

We still have one piece of logic missing: capturing the import progress. Let's see how to make use of these hooks to implement that functionality.

The batch import job with progress

In order to capture the ProjectImport progress, we will use the on_child_complete hook, which is invoked whenever a direct child completes.

Batch jobs also have a progress helper, which can be used to retrieve statistics about the batch.

The hook looks like the following:
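
(A sketch assuming the ProjectImport model from above; batch.progress and its percent attribute come from the Cloudtasker batch documentation.)

```ruby
# Invoked each time a direct child completes
def on_child_complete(_child_job)
  # batch.progress returns batch statistics, including a completion percentage
  ProjectImport.find(job_args.first).update(progress: batch.progress.percent.to_i)
end
```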

Note that we could also use on_batch_node_complete instead of on_child_complete. The result would be the same. The difference between the two is that on_batch_node_complete is called after any job completion in the batch tree while on_child_complete is called only when direct children complete. If you enqueue batch jobs inside a batch job and want to update the progress field in a more granular way, you may use on_batch_node_complete.

This is what the batch job looks like in the end:
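
(Same assumptions as above: a hypothetical endpoint list and the FetchEndpointWorker defined earlier.)

```ruby
# app/workers/project_import_worker.rb
class ProjectImportWorker
  include Cloudtasker::Worker

  # Hypothetical list of the four endpoints to import
  ENDPOINTS = %w[users projects tasks comments].freeze

  def perform(project_import_id)
    ProjectImport.find(project_import_id).update(status: 'importing')

    ENDPOINTS.each do |endpoint|
      batch.add(FetchEndpointWorker, project_import_id, endpoint)
    end
  end

  # Update the progress percentage as direct children complete
  def on_child_complete(_child_job)
    ProjectImport.find(job_args.first).update(progress: batch.progress.percent.to_i)
  end

  # Flag the import as completed once all children are done
  def on_batch_complete
    ProjectImport.find(job_args.first).update(status: 'completed', progress: 100)
  end
end
```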

Unleash the power of batch jobs

The above was a relatively simple example of a batch job with direct descendants only. But Cloudtasker also allows you to have batch jobs within batch jobs! To do so, simply add batch jobs inside your child workers.
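
For instance, a child worker could itself fan out into its own sub-batch. Here is a sketch, where FetchEndpointPageWorker and the paging logic are hypothetical:

```ruby
# A child worker that spawns its own sub-batch
class FetchEndpointWorker
  include Cloudtasker::Worker

  def perform(project_import_id, endpoint)
    # Fan out one sub-job per page of results (hypothetical paging API)
    ImportClient.page_count(endpoint).times do |page|
      batch.add(FetchEndpointPageWorker, project_import_id, endpoint, page)
    end
  end
end
```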

The end result is a tree of jobs. Each level is linked back to its parent, itself linked back to its parent, and so on, all the way up to the one main parent job. Hooks can be used at each level to perform specific tasks when that level completes.

Using Cloudtasker batch jobs, you can easily implement sophisticated batching workflows.

Sign up and accelerate your engineering organization today!

About us

Keypup's SaaS solution allows engineering teams and all software development stakeholders to gain a better understanding of their engineering efforts by combining real-time insights from their development and project management platforms. The solution integrates multiple data sources into a unified database along with a user-friendly dashboard and insights builder interface. Keypup users can customize tried-and-true templates or create their own reports, insights and dashboards to get a full picture of their development operations at a glance, tailored to their specific needs.

---

