Don't keep important data in your Celery queue

Dan Poirier

The Celery library (previous posts) makes it as easy to schedule a task to run later as calling a function. Just change:

send_welcome_email('dan@example.com')

to:

send_welcome_email.apply_async('dan@example.com')

and rely on Celery to run it later.

But this introduces a new point of failure for your application -- if Celery loses track of that task and never runs it, then a user will not get an email that we wanted to send them.

This could happen, for example, if the server hosting your Celery queue crashed. You could set up a hot backup server for your queue server, but it would be a lot of work.

It's simpler if you treat your Celery queue like a cache -- it's helpful to keep around, but if you lose it, your work will still get done.

[Ed: the code snippets in this post are just intended as little illustrations to go with the ideas in this post, and by no means should they be taken as examples of how to implement these ideas in production. ]

We can do that by changing our pattern for doing things in the background. The key idea is to keep the information about the work that needs to be done in the database, with the rest of our crucial data.

For example, instead of:

send_welcome_email.apply_async('dan@example.com')

we might add a needs_welcome_email Boolean field to our model, and write:

user.needs_welcome_email = True
user.save()

Now we know from our database that this user needs to get a welcome email, independently of Celery's queue.

Then we set up a periodic task to send any emails that need sending:

@task
def send_background_emails():
    for user in User.objects.filter(needs_welcome_email=True):
        send_welcome_email(user.email)
        user.needs_welcome_email = False
        user.save()

We can run that every 5 minutes, it'll be quite cheap to run if there's no work to do, and it'll take care of all the "queued" welcome emails that need sending.

If we want the user to get their email faster, we can just schedule another run of the background task immediately:

user.needs_welcome_email = True
user.save()
send_background_emails.apply_async()

And the user will get their email as fast as they would have before.

We will still want to run the background task periodically in case our queued task gets lost (the whole point of this), but it doesn't have to run as frequently since it will rarely have any work to do.

By the way, I learned this while doing a code review of some of my co-worker Karen's code. This ability to continue learning is one of my favorite benefits of working on a team.

Expiring tasks

Now that we've made this change, it opens up opportunities for more improvements.

Suppose we're scheduling our periodic task in our settings like this:

CELERYBEAT_SCHEDULE = {
    'process_new_work': {
        'task': 'tasks.send_background_emails',
        'schedule': timedelta(minutes=15),
    },
}

Every 15 minutes, celery will schedule another execution of our background task, and if all is well, it'll run almost immediately.

But suppose that our worker is unavailable for a while. (Maybe it lost connectivity temporarily.) Celery will keep on queuing our task every 15 minutes. If our worker is down for a day, then when it comes back, it'll see 24*4 = 96 scheduled executions of our task, and will have to run the task 96 times.

In our case, we're not scheduling our task all that frequently, and the task is pretty lightweight. But I've seen times when we had thousands of tasks queued up, and when the workers were able to resume, the server was brought to its knees as the workers tried to run them all.

We know that we only need to run our task once to catch up. We could manually flush the queue and let the next scheduled task handle it. But wouldn't it be simpler if Celery knew the tasks could be thrown away if not executed before the next one was scheduled?

In fact, we can do just that. We can add the expires option to our task when we schedule it:

CELERYBEAT_SCHEDULE = {
    'process_new_work': {
        'task': 'tasks.send_background_emails',
        'schedule': timedelta(minutes=15),
        'options': {
            'expires': 10*60,  # 10 minutes
        }
    },
}

That tells Celery that if the task hasn't been run within 10 minutes of when we schedule it, it's not worth running at all and to just throw it away. That's fine because we'll schedule another one in 5 more minutes.

So now what happens if our workers stop running? We continue adding tasks to the queue - but when we restart the worker, most of those tasks will have expired. Every time the worker comes to a task on the queue that is expired, it will just throw it away. This allows it to catch up on the backlog very quickly without wasting work.

Conclusion

As always, none of these techniques are going to be appropriate in all cases. But they might be handy to keep in your toolbox for the times when they might be helpful.

blog comments powered by Disqus

Back to blog

Technical

Don't keep important data in your Celery queue

Dan Poirier

Expiring tasks

Conclusion

You might also like:

Don't keep important data in your Celery queue

Dan Poirier

October 18, 2016

Expiring tasks

Conclusion

You might also like:

Success!

You're already subscribed