If you’re familiar with crontab in Linux, there’s a good chance you’re equally familiar with the infamous silent failures of cron jobs. Many of us sysadmins and developers have had jobs fail without finding out until it was too late. Automated backups and monthly emails aren’t always as automated (or as on time) as we tend to think. Herein lies the problem.
My cron jobs send me an email when they run…isn’t that enough?
That may be, but is there REALLY any value in knowing that they ran? Isn’t that why you created the cron job in the first place, so that it does its job? Sure, receiving an email of cron output after a job runs is great. However, the real value lies in knowing when your cron jobs FAIL to run (or are delayed). Then you can investigate and fix the problem before it’s too late. Not convinced? Consider this example from a longtime Dead Man's Snitch user, Kareem Mayan, co-founder of SocialWOD.com.
A little background:
“At SocialWOD.com we do workout tracking for gyms. When a new workout is emailed to us from a customer (in the form of a photo of a whiteboard, which has the workout and results), we put the data online. Once it’s online, we email that gym’s clients telling them new workout results have been posted.”
Great. Where’s the problem?
"When Delayed Job failed silently, we wouldn’t know until me or my co-founder was prompted to look based on seeing something funny, e.g. seeing a Stripe email about a new customer signup but NOT seeing the automated welcome email to the new customer (sent by our system… which was waiting in the database, ready for Delayed Job to pick it up, which would never happen because that process had died).”
"The result would often be several days of emails (THOUSANDS of emails) queued up until one of us manually restarted Delayed Job. This sucked because customers would either get a ton of emails in once, and some would be days old, or we would delete those emails before they got sent. This also sucked because customers would never get notifications about their posted results.”
How did Dead Man’s Snitch help?
"Using Dead Man’s Snitch made that problem go away. Now, if Delayed Job dies, Dead Man's Snitch never gets pinged, and we get an email as soon as that happens. At most we’ll go five minutes - not days - before knowing that we need to kick Delayed Job into action again.”
If you can relate and want the peace of mind of knowing right away when your cron jobs fail, give Dead Man’s Snitch a try and sign up for free. After all, your first snitch is on us. And if you can’t relate but still made it to the end, I applaud you. Even if this topic doesn’t apply to you, there’s a good chance it applies to your computer friends, IT department, or website managers. Do them a favor and pass this on.