Dead Man's Snitch

Blog

PagerDuty Integration Now Available!

We're extremely excited to announce that we now support sending alerts to PagerDuty! PagerDuty, an Incident Management System for IT Teams, provides alerting, on-call scheduling, escalation policies and incident tracking to increase uptime of your apps, servers, websites and databases. Dead Man's Snitch alerts will show up as incidents and your on-call team members will know immediately if your scheduled jobs go missing.


Integrating with PagerDuty is fast and simple. Head to your account integrations page to get started. Documentation is available here.

If there is a service you would love to see us add support for, please let us know!

Posted on 2015-09-03 in integrations, updates


Slack Integration Now Available!

We're extremely excited to announce that we now support sending alerts to Slack! Slack is a collaborative, real-time messaging app that brings all of your team’s communications together in one place. Dead Man's Snitch alerts will show up in your team's Slack channel making it easier to collaborate with your team as soon as your scheduled jobs go missing.


Setting up your account to send alerts to Slack is fast and easy! Head to your account integrations page to get started. Documentation is available here.

If there is a service you would love to see us add support for, please let us know!

Posted on 2015-09-02 in integrations, updates


Android App Now Available!

We're excited to announce that the first ever Dead Man's Snitch Android app is available for download! As an extension of your account you'll be able to: 

  • View your snitch dashboard and snitch status
  • Filter your snitches by tags
  • Receive push notifications when the status of your snitches changes (Private Eye plan or higher)


Let us know what you think! If you're an iPhone user, our app just got a refresh, including swipe to pause and landscape mode.

Happy Snitching!


Posted on 2015-08-19 in android, updates


Extra Billing Information on Receipts

Some customers are required to include additional billing-related information on their receipts for accounting purposes.

Now you can easily add this information under your Account settings and it will be displayed on all of your receipts.



Happy Snitching!

Posted on 2015-07-29 in accounts, billing, updates


First Check-in Notification

You may have noticed that new snitches now send an email when your job checks in for the first time.


Don't worry, you won't receive an email for every check-in. We just want you to have confidence that your job and snitch are set up correctly.

Happy Snitching!

Posted on 2015-07-29 in email, updates


Nicer Snitch Setup With Autocomplete

We've recently rolled out a couple of small features to make setting up your snitches easier.

Dead Man's Snitch lets you set up each of your snitches to send alerts to a unique email address. You could even send alerts to more than one email address if you separated the addresses with commas, as shown: "devops@example.com, alerts@example.com".

It's easy to mistype an address, however. To help out, DMS will now autocomplete your email addresses. Any email used in your account will show up as an option in the autocomplete.


A similar autocomplete is now available on your snitch tags as well.
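To make the behavior concrete, here is a minimal Ruby sketch of splitting a comma-separated alert field and suggesting addresses by prefix. The helper names `parse_alert_emails` and `suggest_emails` are hypothetical and exist only for illustration; this is not the code that runs in Dead Man's Snitch, and real autocomplete may match differently.

```ruby
# Hypothetical helpers for illustration only; not Dead Man's Snitch's code.

# Split a comma-separated alert field into a clean list of addresses.
def parse_alert_emails(field)
  field.split(",").map(&:strip).reject(&:empty?).uniq
end

# Suggest known addresses that start with what has been typed so far.
# Assumes simple case-insensitive prefix matching.
def suggest_emails(known_emails, typed)
  prefix = typed.strip.downcase
  return [] if prefix.empty?

  known_emails.select { |email| email.downcase.start_with?(prefix) }.sort
end

parse_alert_emails("devops@example.com, alerts@example.com")
# => ["devops@example.com", "alerts@example.com"]

suggest_emails(["devops@example.com", "alerts@example.com"], "dev")
# => ["devops@example.com"]
```

The same prefix matching would apply to tags: swap the list of known addresses for the list of tags already used in the account.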


We've also expanded the snitch check-in setup page.

New Snitch Install Methods

Since we now allow you to check in via email, we added directions explaining what email address to send your check-ins to.


We also added an example of checking in from Ruby using the Snitcher gem.
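If you would rather not add a gem, a check-in can also be sketched with Ruby's standard library alone. The sketch below assumes check-ins are plain HTTPS requests to https://nosnch.in/ followed by your snitch token; the token `c2354d53d2` and the helper names are placeholders, so copy the real URL from your snitch's setup page.

```ruby
require "net/http"
require "uri"

# Assumption: the check-in endpoint is https://nosnch.in/<token>.
CHECK_IN_HOST = "nosnch.in"

# Build the check-in URL for a snitch token.
def check_in_uri(token)
  URI::HTTPS.build(host: CHECK_IN_HOST, path: "/#{token}")
end

# Send the check-in. Returns true on a 2xx response and false on any
# failure, so a monitoring hiccup never crashes the job being monitored.
def check_in(token)
  response = Net::HTTP.get_response(check_in_uri(token))
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# At the end of a scheduled job (placeholder token):
#   check_in("c2354d53d2")
```

Calling `check_in` as the last step of a job means the snitch only hears from you when the job actually finished.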


We plan to add examples in more languages soon. If there are any languages you'd like to see on the setup page, let us know at hi@deadmanssnitch.com.

We hope these improvements make setting up your snitches easier.

Happy Snitching!

Posted on 2015-06-29 in autocomplete, email, tags, updates


Dead Simple Blog Setup With ButterCMS

We recently decided to move our blog off of Tumblr and into our main site. We wanted our blog to feel more like part of the main site and an integrated blog is also more SEO friendly.

We thought about publishing our blog as static files with Jekyll. However, it quickly became clear that sharing our main HTML layout file between Jekyll and our main Rails app was not going to be elegant. The layout file needed to be duplicated, making it tedious to keep up to date.

We only need a simple blog, so we ditched Jekyll and started to write vanilla Markdown files we could just render in Rails within our main HTML layout. It seemed to be working well and we were about to migrate all of our posts when someone pointed out a new kid on the block: ButterCMS.


ButterCMS is a blogging platform that sounded like just what we were looking for. They store the blog data. It's SEO friendly. Posts render server-side with no Javascript or iframes. The blog runs inside our app so it can use our existing layout. Even so, we can publish new posts without redeploying our application.

To top it off, ButterCMS setup is dead simple.

1. Sign up.

2. Add their code library to your Rails or Django project.

3. Copy over your API token.

Done!

ButterCMS has been super supportive during our migration. Right now, they are advertising that they'll import your existing blog for free. They imported our Tumblr blog, and it came over fine.

We had to tweak ButterCMS's default views a bit, but we're happy that we don't have to manage our blog data. ButterCMS provides a nice little editor.

We are glad to now have a cleanly implemented blog on our main site. We plan to be posting more DMS tips and tricks, as well as showcasing some of the creative applications people are using DMS for.

In the meantime, if you're looking to integrate an SEO-friendly blog into your application, check out ButterCMS.

Posted on 2015-06-19 in blogging, blog platforms, buttercms, seo, the dms blog


Snitch Check-in via Email

We’re excited to announce the ability to check in your snitches via email! Email check-ins have many use cases and are great for things like getting an email when your server goes down.

Get your snitch's email address from the "Email" tab in the left column of the snitch install page. Alerts and check-ins work the same as they do with curl or the Snitcher gem.



Please Note: Checking in via email makes it easier to use Dead Man’s Snitch in situations where HTTPS isn’t feasible, though there are some caveats. While email is reliable, messages can easily be delayed. Between the time an email is sent and the time it’s received, it usually goes through several intermediaries which may spool, delay, retry, or redirect it before it finally arrives at Dead Man’s Snitch. Because check-ins are time sensitive, be aware that false alerts could occur if a snitch checks in towards the end of its period. For this reason we suggest using HTTPS if you can and using email only when it’s the only option.
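The timing risk is easier to see with numbers. This Ruby sketch uses a deliberately simplified model in which a check-in only counts if it is received before its period ends; the function name, the times, and the delays are all made up for illustration and do not describe how Dead Man's Snitch evaluates periods internally.

```ruby
# Simplified model: a check-in counts only if it is *received* before
# the period ends. delivery_delay is in seconds.
def counted_in_period?(sent_at, period_end, delivery_delay)
  sent_at + delivery_delay <= period_end
end

period_end = Time.utc(2015, 5, 7, 12, 0, 0)  # hypothetical period ending 12:00 UTC
sent_at    = Time.utc(2015, 5, 7, 11, 55, 0) # job checks in five minutes earlier

counted_in_period?(sent_at, period_end, 2)       # => true  (HTTPS, seconds later)
counted_in_period?(sent_at, period_end, 10 * 60) # => false (email held ten minutes)
```

In this model, the same job sending the same check-in at the same moment is healthy over HTTPS and late over email, which is why checking in early in the period matters more when email is your only option.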

Feel free to tweet or email us your questions at hi@deadmanssnitch.com.

Happy Snitching!

Posted on 2015-05-07 in email, HTTPS, monitoring, updates


New Tooltips Convert UTC to Local Time Zone For You

We made a small update to our tooltips! Originally we only displayed check-in times in UTC. Some of our users shared with us that this was inconvenient and asked if we could convert the timestamps to their local time.

Now, all tooltips convert snitch check-in timestamps to your local time zone and display the result on hover.

These tooltips are available on your dashboard…

…and the individual snitch activity page.
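For the curious, the conversion itself is small. The Ruby sketch below turns a UTC timestamp into a fixed UTC offset for display; `local_tooltip` is a hypothetical helper, the offset is passed in explicitly here, and this is an illustration rather than the code behind our tooltips, which would normally take the offset from the viewer's browser.

```ruby
require "time"

# Convert a UTC ISO 8601 timestamp to the given UTC offset for display.
# utc_offset uses the "+HH:MM" / "-HH:MM" form accepted by Time#getlocal.
def local_tooltip(utc_iso8601, utc_offset)
  Time.iso8601(utc_iso8601).getlocal(utc_offset).strftime("%Y-%m-%d %H:%M %:z")
end

local_tooltip("2015-03-24T14:30:00Z", "-04:00")
# => "2015-03-24 10:30 -04:00"
```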

We appreciate your feedback. If there’s something you would like to see added, reach out to us anytime!

Posted on 2015-03-24 in timezones, tooltips, updates


Postmortem: March 6th, 2015

On Friday, March 6th we had a major outage caused by a loss of historical data. During the outage we failed to alert on missed snitch check-ins and sent a large number of erroneous failure alerts for healthy snitches. It took 8 hours to restore or reconstruct all missing data and get our systems stabilized. I am incredibly sorry for the chaos and confusion this caused.

So what happened?

On March 6th at 9:30 EST we deployed a change that decoupled two of the models in our system (historical periods and check-ins). At 9:45 EST a user triggered an unscoped deletion of all historical period records when they changed the interval of their snitch.

We were alerted at 9:50 EST and immediately disabled our alerter process to avoid further confusion. We began diagnosing the cause and at 10:50 EST deployed a fix for the unscoped delete. Our next step was to restore the missing data from our backups. We decided to keep the system live and to use a slower but more accurate process to restore the data due to possible conflicts created by keeping the system running.

At 17:30 EST we finished the restoration of most of the historical data and ran a set of data integrity checks to ensure everything was in a clean state. We sent out one final set of “reporting” alerts for any snitches that were healthy but thought to be failed.

How did this happen?

We use a pull request based development process. Whenever a change is made it is reviewed by another developer and then merged by the reviewer. It’s common to make several revisions to a change before it is merged.

In this case, the unscoped deletion was introduced as part of implementing a suggestion to reduce the number of queries made during an interval change. While making the change, the scoping that limited the deletion to a single snitch's periods was accidentally removed. The code was reviewed but the scoping issue was missed on final review.

Additionally, we have an extensive test suite in place that gives us confidence when we make large changes to the system. Our tests did not uncover this issue since the unscoped delete satisfied our testing conditions.
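To show how a test can be satisfied by an unscoped delete, here is a minimal Ruby sketch using an in-memory stand-in for the historical periods table. `Period`, `PeriodStore`, and the method names are hypothetical and much simpler than our real models; the point is only the difference between the two assertions at the end.

```ruby
# In-memory stand-in for the periods table (hypothetical, for illustration).
Period = Struct.new(:id, :snitch_id)

class PeriodStore
  def initialize(periods)
    @periods = periods.dup
  end

  def count_for(snitch_id)
    @periods.count { |period| period.snitch_id == snitch_id }
  end

  # The bug: no scope, so every snitch's periods are removed.
  def delete_all_unscoped
    @periods.clear
  end

  # The intent: remove only the periods belonging to one snitch.
  def delete_for(snitch_id)
    @periods.reject! { |period| period.snitch_id == snitch_id }
  end
end

def sample_store
  PeriodStore.new([Period.new(1, "a"), Period.new(2, "a"), Period.new(3, "b")])
end

buggy = sample_store
buggy.delete_all_unscoped
buggy.count_for("a") # => 0, so "snitch a has no periods left" passes
buggy.count_for("b") # => 0, snitch b lost its history too

fixed = sample_store
fixed.delete_for("a")
fixed.count_for("a") # => 0
fixed.count_for("b") # => 1, the assertion that catches the bug
```

A test that only checks the snitch being changed passes in both cases. Asserting that an unrelated snitch's records survive is what tells the two apart.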

Our next steps

1. We have reviewed our use of destructive operations that could be prone to scoping issues (e.g. Model.where(…).delete_all) and have found that this was the only instance of it left in our codebase.


2. We have reviewed our tests around destructive behavior and have added cases to ensure they only affect the records they should.


3. Our restore and recovery process took much longer than we would like. We developed a set of tools for checking data integrity while we waited for the restore to finish and we will be fleshing these out further and making them a part of our normal maintenance routine. Lastly we will be planning and running operations fire drills to improve our readiness for cases like this.

Summary

Monitoring failures can mean lost sleep, lost time, and added stress to an already stressful job. As an operations person I am well aware of the trouble a malfunctioning system can cause. I am very sorry for the chaos and confusion caused by our failings. We very much see Friday’s issues as a failure of our development process and are taking steps to improve that process.

Should we have future issues, the best way to stay informed is to subscribe to notifications at status.deadmanssnitch.com or to follow us on Twitter.

- Chris Gaffney
Collective Idea

Posted on 2015-03-09 in postmortems