How to Daemonize Rake Tasks on Ubuntu Through SSH

Daemons and Rake

At some point in your web development journey you may find yourself wanting to execute something far away from you and your terminal (think remote server). More than likely, the first thing that will enter your mind is to execute the following:

$ my_rake_task

Easy right?

The problem is that once you quit your ssh connection, in other words once your remote terminal dies (developers have to sleep too), so does the process; they are "attached."

Next, you may think, "Well, if I can't see it then it's obviously running without me needing to be there" and you'll execute something like this:

$ my_rake_task &

The process you create is still a child of your terminal's shell, so when the connection drops and the shell receives a hangup, the background job usually goes down with it.
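To see the relationship for yourself, you can ask ps who the parent of a background job is. This is only a minimal sketch: sleep stands in for the rake task, and job_pid and parent_of_job are names made up for the illustration.

```shell
# sleep stands in for the long-running rake task
sleep 30 &
job_pid=$!

# Ask ps for the parent PID of the job that was just backgrounded
parent_of_job=$(ps -o ppid= -p "$job_pid" | tr -d ' ')
echo "background job $job_pid has parent $parent_of_job"

# Clean up the stand-in job
kill "$job_pid"
```

Typed into an interactive shell, the parent reported here should match the PID of that shell (echo $$), which is the same shell that goes away when the ssh session ends.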

What you could use is a daemon.

start-stop-daemon

There is a RubyGem called Daemons which can daemonize processes within Ruby. While this may be an excellent option if a piece of Ruby code needs to be momentarily daemonized, I found the syntax to be a little confusing and did not want to add another layer of abstraction on top of an already relatively complex subject. Also, executing something as simple as a rake task in a Ruby script requires jumping through a few hoops (again, simplicity is the goal here). For these reasons I went with start-stop-daemon.

start-stop-daemon is an excellent tool provided by many Linux distributions to make it "easier" to control daemons. start-stop-daemon requires quite a few arguments and options which are necessary to get your daemon up and running. Here is an example:

$ start-stop-daemon --start -m --pidfile /path/to/pidfile.pid -u $USER -d /path/to/working/dir -b --startas /path/to/script

Rather long, isn't it?

This might appear to be just-another-Linux-command-with-10-*nixillion-options-I'll-never-use-except-that-one-time-when-I-really-need-it, but this is, in fact, quite terse for what it does.

Let's go through some of the options and arguments:

  • --start: The easiest and clearest of them all. It kicks off the daemon.

  • --pidfile PATH: The location of the pidfile, where the PID is stored once the daemonized process begins. Passing the same path when you later stop the daemon tells start-stop-daemon which PID needs to be killed.

  • -m: Used in tandem with --pidfile. If your program does not create a pidfile itself, this option creates one at the specified path and stores the PID there.

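  • -u USER: The user that owns the process. Note that the man page describes -u (--user) as a matching option, meaning it checks for processes owned by that user; if you need the daemon to be started as a different user, the option to look at is -c (--chuid).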

  • -d PATH: start-stop-daemon will change to this directory before it executes the process. This is especially useful when the process has to be run from a particular directory. If this option is not provided, start-stop-daemon changes to the root directory.

  • -b: Or --background. If your program does not detach by itself (most Ruby programs do not), then use this option.

  • --startas: The executable that will run as the daemonized process. Remember to make your file executable (chmod +x) before running start-stop-daemon.
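The options above can be collected in a small wrapper so the long command does not have to be retyped. The following is only a sketch under assumptions: the default paths, the DRY_RUN switch, and the function names are made up for illustration, and the long option names (--make-pidfile, --chdir, --background) are the spelled-out forms of -m, -d and -b.

```shell
#!/bin/bash
# Hypothetical locations; adjust to your own setup
PIDFILE="${PIDFILE:-/tmp/awesome_task.pid}"
WORKDIR="${WORKDIR:-$HOME}"
SCRIPT="${SCRIPT:-$HOME/awesome_task}"

# Print the command instead of running it when DRY_RUN is set
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Start the script in the background and record its PID
start_task() {
    run start-stop-daemon --start --make-pidfile --pidfile "$PIDFILE" \
        --chdir "$WORKDIR" --background --startas "$SCRIPT"
}

# Signal the PID recorded in the pidfile
stop_task() {
    run start-stop-daemon --stop --pidfile "$PIDFILE"
}
```

With the file sourced, DRY_RUN=1 followed by start_task prints the command that would be run, which is a cheap way to check the paths before starting anything for real.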

Note: start-stop-daemon does nothing to monitor the process.
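Since nothing watches the process for you, a quick manual check can be built from the pidfile. This is a minimal sketch; is_running is a hypothetical helper name, and it relies on kill -0, which delivers no signal and only reports whether the PID exists and may be signalled by you.

```shell
# Succeeds if the PID stored in the given pidfile belongs to a live process
is_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    # kill -0 sends nothing; it only tests whether the PID can be signalled
    kill -0 "$(cat "$pidfile")" 2>/dev/null
}
```

For example, is_running /path/to/pidfile.pid && echo "still alive" prints the message only while the recorded process exists. A stale pidfile whose PID has since been reused by another process would fool this check, so treat it as a rough indicator.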

If you need more options, check out the man page (man start-stop-daemon), but this is enough to get going.

The Process Hydra

Here is an example of the kind of executable script you may want to run:

#!/bin/bash
# Location: ~/awesome_task
# Executes a task that is definitely awesome

rake run_my:definitely_awesome_task  

This script is also the process whose process ID is stored at the path given to --pidfile.

Catch that?

Let's look a little closer.

Suppose we are quite done with this process (as it's been running for 8 hours) and decide to kill it.

First we need the process ID:

$ cat /path/to/pidfile.pid
=> 9384

Then we try to kill it off:

$ kill 9384

All done right? Not so fast! Let's make a quick check:

$ ps aux | grep awesome_task
...
$USER 9385 10.0 38.3 304958 128399 ?      Sl     08:00   8:01 ruby /path/to/bin/rake run_my:definitely_awesome_task
...

How can our rake task still exist?

There are actually two processes at play here:

  1. The bash script
  2. AND the rake task.

The rake task is executed and run as another process whose parent process is the bash script. This means that the rake task can continue running even if the bash script is killed off.

However, this also means that if the rake task is killed, the bash script will execute the rest of its script (of which, in this example, there is nothing else) and exit. In other words, killing off the child process may very well end the parent process too.
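The parent and child relationship can be reproduced without rake. In this minimal sketch, an inline sh -c script plays the part of the bash wrapper and sleep plays the part of the rake task; the variable names are made up for the illustration, and pgrep -P is used to list the children of a PID.

```shell
# The trailing ':' keeps the shell from simply replacing itself with sleep
sh -c 'sleep 30; :' &
wrapper=$!
sleep 1   # give the wrapper a moment to start its child

# Find the child (the stand-in for the rake task)
child=$(pgrep -P "$wrapper")

# Kill only the wrapper, as kill on the pidfile's PID would
kill "$wrapper"
wait "$wrapper" 2>/dev/null || true

# The child is untouched by that kill
if kill -0 "$child" 2>/dev/null; then
    survived=yes
    echo "task $child outlived wrapper $wrapper"
else
    survived=no
fi

# Clean up the stand-in task
kill "$child" 2>/dev/null || true
```

If the wrapper does nothing but launch a single task, one option (not what the script above does) is to launch it with exec, as in exec rake run_my:definitely_awesome_task. Bash then replaces itself with rake instead of forking, so the PID in the pidfile belongs to the task itself.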

Giving Daemons Life

There are many services out there that monitor daemons and make sure they are alive. However, if all that is required is that a task be run over and over (which was my use-case) then the following may be all you need:

#!/bin/bash

while true  
do  
    rake another:definitely_awesome_task
done  

Unless the server crashes or the process is killed off for some reason, this should last for quite some time. Indefinitely, in fact.
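One thing a bare loop does not handle is a task that fails immediately, in which case the loop spins as fast as it can. A possible variation, sketched here under assumptions, records each exit status and pauses between runs; run_forever and PAUSE are hypothetical names and the five-second default is arbitrary.

```shell
#!/bin/bash
PAUSE="${PAUSE:-5}"   # seconds to wait between runs (arbitrary default)

# Run the given command over and over, reporting each exit status
run_forever() {
    while true
    do
        status=0
        "$@" || status=$?
        echo "$(date): '$*' exited with status $status"
        sleep "$PAUSE"
    done
}

# Example (not run here):
#   run_forever rake another:definitely_awesome_task >> /path/to/task.log 2>&1
```

The status is captured before the echo line because the date substitution would otherwise overwrite it.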

Order matters in death. Remember: parents die first.

If you want to kill off both processes, make sure to identify both processes and kill the parent first. Killing the parent first gives you the guarantee that no other process will be spawned from the rest of the execution of the script. Kill the child first, and an infinite loop always carries the chance of starting the next run faster than you can ensure its death.
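That ordering can be sketched as a small helper. This is an illustration rather than a definitive implementation: kill_parent_then_children is a hypothetical name, it assumes the pidfile holds the PID of the wrapper script, and it only looks one level down with pgrep -P.

```shell
# Kill the process named in the pidfile, then its direct children
kill_parent_then_children() {
    pidfile=$1
    parent=$(cat "$pidfile")

    # Look the children up first: once the parent is gone they are
    # re-parented and can no longer be found through it
    children=$(pgrep -P "$parent")

    # Parent first, so the loop cannot spawn another run
    kill "$parent"

    for child in $children; do
        kill "$child" 2>/dev/null
    done
    return 0
}
```

It would be called as kill_parent_then_children /path/to/pidfile.pid. A rake task that starts processes of its own would need a deeper walk of the process tree than this one level.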

Now that the daemons are set up to be run, let's hop over the wire.

Possible SSH Problems

There are several problems you may encounter if your code executes commands over ssh. Even if all you do is execute something as simple as:

$ ssh numbluk@ip ~/bash_script_with_crazy_simple_task

I still recommend that you read the rest of the way through.

Strange Parsing

SSH does not run your command exactly as you typed it: your local shell expands the command line first, and ssh then joins whatever arguments are left into a single string for the shell on the remote machine. This means that sometimes things get parsed in unexpected ways. To get around this, there are two steps you can follow to keep your results consistent with what you would expect from a local shell:

  • If your rake task runs a command over ssh, prefer constructing the command as a string first
  • Always quote your commands

Bad:

$ ssh numbluk@ip ls -alh ~/

Good:

$ ssh numbluk@ip "ls -alh ~/"
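The difference between the two forms shows up before ssh is even involved, because the local shell expands unquoted arguments first. This small sketch prints the arguments a command would receive; show_args is a hypothetical stand-in for ssh and its host argument.

```shell
# Print each argument on its own line, wrapped in brackets
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Unquoted: three arguments, and ~/ has already become the LOCAL home directory
show_args ls -alh ~/

# Quoted: one argument, with ~/ left for the remote shell to expand
show_args "ls -alh ~/"
```

Building the command as a string first follows the same idea: cmd='ls -alh ~/' and then ssh numbluk@ip "$cmd" hands the remote shell exactly the text you wrote.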

Redirection to Nowhere

One-time ssh commands are executed remotely; however, the output is sent back to your local shell. This means that if you rely on the output of the remotely executed command, it must be stored in a variable if it is to be used again.

Always expect unexpected redirection when dealing with ssh. My recommendation is to test a command locally first and only then run the same command over ssh.
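Capturing the remote output in a local variable can be sketched as follows. The helper name remote_run is made up for the illustration, and the host is the same placeholder used elsewhere in this post.

```shell
# Run one command on the remote host; its output arrives on local stdout
remote_run() {
    host=$1
    command=$2
    ssh "$host" "$command"
}

# Examples (not run here):
#   listing=$(remote_run numbluk@ip 'ls -alh ~/')    # output kept in a local variable
#   remote_run numbluk@ip 'ls ~/ > listing.txt'      # file is written on the REMOTE machine
#   remote_run numbluk@ip 'ls ~/' > listing.txt      # file is written on the LOCAL machine
```

The last two examples differ only in where the redirection sits relative to the quotes, which is exactly the kind of thing worth trying locally before trusting it over ssh.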

Not Everyone's Environment

If your rake task relies on environment variables at all during its execution, then you may run into a problem with ssh.

This problem is almost a phantom, as a rake task may very well execute, but it will not run with the correct variables.

To get around this, do the following:

$ ssh numbluk@ip

Open /etc/ssh/sshd_config and edit or add

...
PermitUserEnvironment yes  
...

Restart the SSH daemon so that the change takes effect (on Ubuntu, sudo service ssh restart). Next, you must specify what the ssh'd user's environment will be. To permit everything in the current ssh environment:

$ env >> ~/.ssh/environment

And that should do it!
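Dumping the whole environment is the quick route. If you would rather copy over only the variables the rake task needs, a sketch along these lines appends them as NAME=value lines, which is the format ~/.ssh/environment expects. The helper name write_ssh_env and the variable names in the example are assumptions made for illustration.

```shell
# Append the named variables to the given file as NAME=value lines
write_ssh_env() {
    target=$1
    shift
    for name in "$@"; do
        # printenv fails for unset variables; skip those
        value=$(printenv "$name") || continue
        printf '%s=%s\n' "$name" "$value"
    done >> "$target"
}

# Example (not run here), with hypothetical variable names:
#   write_ssh_env ~/.ssh/environment RAILS_ENV DATABASE_URL
```

Only exported variables are visible to printenv, and values containing newlines would not survive this format.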

In Memoriam

I want to give a big thanks to Pete Hanson's Linux know-how; he was an immeasurable help throughout this drawn-out process.

Also, thanks to Michael Mentele for being a superb rubber duck.