Issues accessing app.close.io
Incident Report for Close
Postmortem

For approximately 30 minutes, users were unable to load Close.io in browsers or desktop apps because a JavaScript error prevented Close.io from getting past the loading screen. This issue did not affect anyone who already had Close.io open: windows/tabs where Close.io was already loaded continued to operate normally. However, if you tried to load Close.io in a new window/tab during this time, it would not load. Our API remained fully operational.

We are extremely sorry for this downtime; we know that you count on Close.io to be reliable for your sales needs.

Below are technical details on the timeline of events and on how we're working to make sure a similar issue can't happen again.

  • We deployed a new feature that required a new JavaScript library.
  • This feature was well-tested in development mode.
  • This feature wasn't first tested manually in a staging environment using the production-like build (minified JavaScript). However, we do run our JavaScript unit tests against the production/minified JavaScript file before deploying. In the past, that has almost always been enough to catch differences between JavaScript in development and in production. In this case, however, the CI tests passed even though the production build failed to load.
  • When the feature was deployed to production, a RequireJS error raised while loading the new JavaScript library prevented the rest of Close.io from loading and functioning.
  • Once the change hit production, we quickly recognized the problem and knew we needed to roll back. The status page event was opened within a couple of minutes.
  • Without a quick way to roll back, we had to complete our standard deployment process to revert the change, which takes approximately 30 minutes.
  • About 20 minutes in, some requests to load Close.io were served the reverted, working JavaScript. After about 30 minutes, all requests were using it.
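The postmortem does not say which RequireJS error occurred. As a hedged illustration of one class of production-only loading failure, the sketch below scans an optimized bundle for named AMD `define()` calls and reports dependencies that no module in the bundle defines. It assumes the optimizer (for example r.js) emitted named modules with literal string dependency arrays; it does not handle anonymous modules, relative module IDs, or loader plugins such as `text!`. The function name `missing_modules` and the `external` allowlist are hypothetical. A check like this could run in CI against the minified file.

```python
import re

# Assumption: named define("id", ["dep", ...], ...) calls with literal
# dependency arrays, as r.js typically emits; anonymous defines are ignored.
DEFINE_RE = re.compile(r"""define\(\s*["']([^"']+)["']\s*,\s*\[([^\]]*)\]""")
DEP_RE = re.compile(r"""["']([^"']+)["']""")

# Special dependency names that RequireJS provides itself.
BUILTINS = {"require", "exports", "module"}


def missing_modules(bundle_source, external=frozenset()):
    """Return dependencies referenced in the bundle that no define() provides.

    `external` lists modules intentionally loaded from outside the bundle.
    """
    defined = set()
    required = set()
    for match in DEFINE_RE.finditer(bundle_source):
        defined.add(match.group(1))
        required.update(DEP_RE.findall(match.group(2)))
    return sorted(required - defined - BUILTINS - set(external))


bundle = ('define("app/main",["jquery","app/newfeature"],function(a,b){});'
          'define("jquery",[],function(){});')
print(missing_modules(bundle))  # ['app/newfeature']
```

Here `app/newfeature` is referenced but never defined, so a CI step could fail the build whenever the returned list is non-empty. Modules deliberately loaded from outside the bundle would go in `external`.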

What we're doing to prevent similar issues from occurring:

  • Making our development environment more closely match our production environment, particularly in how we load and run JavaScript.
  • Introducing a new test in our CI test suite that catches this type of JavaScript module loading issue.
  • Introducing a better staging deployment system that makes it easier for developers to quickly test on staging, as well as better processes defining which types of changes must be tested on staging.
  • (Re-)Introducing a "rollback" feature into our deployment process so that we can roll back a deployment very quickly if we detect issues, rather than having to wait for a new complete deployment. We previously had this capability but removed it because of a poor implementation.
  • Speeding up our regular deployment process so that, even without rollbacks, we can ship updated code more quickly in case of issues.
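The postmortem does not describe how the reintroduced rollback feature works. One common pattern, shown here only as a sketch and not as Close's implementation, keeps every built release in its own directory and serves whichever one a `current` symlink points to; rolling back then means repointing the link rather than running a full deployment. The directory layout, the timestamp-style release names (assumed to sort chronologically), and the functions `activate` and `rollback` are all hypothetical, and the atomic swap relies on POSIX `rename()` semantics via `os.replace`.

```python
import os


def activate(releases_dir, release, link_name="current"):
    """Atomically point releases_dir/link_name at an existing release directory."""
    if not os.path.isdir(os.path.join(releases_dir, release)):
        raise FileNotFoundError(release)
    link = os.path.join(releases_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted switch
    os.symlink(release, tmp)  # relative target, resolved inside releases_dir
    # rename() swaps the link in one step on POSIX, so readers never see
    # a missing or half-written "current" entry.
    os.replace(tmp, link)


def rollback(releases_dir, link_name="current"):
    """Re-activate the release that sorts just before the active one."""
    link = os.path.join(releases_dir, link_name)
    # Assumption: release directory names sort in deployment order.
    releases = sorted(
        name for name in os.listdir(releases_dir)
        if name not in (link_name, link_name + ".tmp")
        and os.path.isdir(os.path.join(releases_dir, name))
    )
    position = releases.index(os.readlink(link))
    if position == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[position - 1]
    activate(releases_dir, previous, link_name)
    return previous
```

With releases `20160906-1200` and `20160907-1500` on disk and `current` pointing at the newer one, `rollback(base)` repoints `current` at `20160906-1200` and returns that name. Keeping old releases intact is what makes this fast: nothing is rebuilt or re-uploaded.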
Posted Sep 09, 2016 - 10:05 PDT

Resolved
We have resolved the issues with accessing app.close.io. Anyone who encountered issues with the desktop client should close and reopen the application, and users of the web UI will need to reload their browser.
Posted Sep 07, 2016 - 15:32 PDT
Update
Users of the native application are also affected by this issue.
Posted Sep 07, 2016 - 15:11 PDT
Update
All users of the web UI are affected by this issue. We are in the process of reverting this change.
Posted Sep 07, 2016 - 15:09 PDT
Identified
We have identified an issue with a recent change to app.close.io affecting some customers.
Posted Sep 07, 2016 - 15:06 PDT