If I haven't answered your question here, email me!
You must have a Ruby web application running on Heroku. It doesn't have to be Rails, but it does need to use Rack.
Run heroku addons:create rails-autoscale from a terminal. Check out the Getting Started guide for step-by-step instructions.
Every Rails Autoscale plan supports the same features and the same service. The only difference is the maximum number of dynos supported.
For example, if you might need to autoscale to four dynos or more, you'll at least need the Silver plan. See the autoscale range docs for more on this.
The "trial" plan is the default when installing via the Heroku CLI (heroku addons:create rails-autoscale). With the trial plan, you have a week of unlimited autoscaling. At the end of seven days, the add-on will remain installed, but autoscaling will be disabled until you upgrade to a paid plan.
Note that if you select a different plan when installing the add-on, Heroku will begin charging you immediately. There is no trial period built into the paid plans. (The add-on marketplace does not support it.)
Rails Autoscale does not support attaching to multiple apps. You must install the add-on separately for each app.
You can use different autoscalers for different processes. For example, you could use Heroku's native autoscaling for web dynos and Rails Autoscale for worker dynos.
Do not use multiple autoscalers on the same process. This results in very unpredictable scaling behavior.
Heroku offers a native autoscaling solution that's worth a try if you run performance dynos and you only need to autoscale web dynos. Here's what makes Rails Autoscale different:
You must be running a Rack-based Ruby app on Heroku. If your Rack-based app is not running Rails, see these instructions on setting up
The agent has no noticeable impact on response time. It collects the queue time for each request in memory—a very simple operation—and an async reporter thread periodically posts those queue times to the Rails Autoscale service. Check out the middleware code on GitHub if you're interested.
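To make the pattern concrete, here is a minimal sketch of "collect in memory, report asynchronously". It is not the agent's actual code: the class names (QueueTimeBuffer, QueueTimeMiddleware, Reporter) and the report callback are hypothetical, and it assumes the Heroku router's X-Request-Start header carries a Unix timestamp in milliseconds.

```ruby
# Hypothetical sketch; NOT the actual Rails Autoscale agent.

# Thread-safe in-memory store for queue-time samples (milliseconds).
class QueueTimeBuffer
  def initialize
    @mutex = Mutex.new
    @samples = []
  end

  def push(ms)
    @mutex.synchronize { @samples << ms }
  end

  # Returns all collected samples and empties the buffer.
  def flush
    @mutex.synchronize do
      drained = @samples
      @samples = []
      drained
    end
  end
end

# Rack middleware: records the queue time, then passes the request along.
class QueueTimeMiddleware
  def initialize(app, buffer, clock: -> { (Time.now.to_f * 1000).round })
    @app = app
    @buffer = buffer
    @clock = clock # injectable so the timing can be controlled in tests
  end

  def call(env)
    started = env["HTTP_X_REQUEST_START"] # assumed: Unix time in ms
    if started
      queue_ms = @clock.call - started.to_i
      @buffer.push(queue_ms) if queue_ms >= 0
    end
    @app.call(env)
  end
end

# Background reporter: periodically hands buffered samples to a callback.
class Reporter
  def initialize(buffer, interval: 10, &report)
    @buffer = buffer
    @interval = interval
    @report = report # hypothetical stand-in for posting to the service
  end

  def start
    @thread = Thread.new do
      loop do
        sleep @interval
        report_once
      end
    end
  end

  def report_once
    samples = @buffer.flush
    @report.call(samples) unless samples.empty?
  end
end
```

The per-request work is one subtraction and one array append under a mutex, which is why the request path stays fast; the slower network call happens only in the reporter thread.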
If you change the RAILS_AUTOSCALE_URL config var, Bad Things will happen. The Rails Autoscale add-on manages this config var, so it's best to leave it alone.
Also note that if you fork a Heroku app, it will copy the config vars, including RAILS_AUTOSCALE_URL. This also results in Bad Things, because Rails Autoscale doesn't know about the forked app. If you do fork a Heroku app with Rails Autoscale installed, be sure to remove the RAILS_AUTOSCALE_URL config var.
Rails Autoscale only triggers autoscale events in response to breaches of your queue time thresholds. For example, if your app is running a single dyno, changing your minimum dynos setting to "2" will not cause an immediate upscale event. It will remain at a single dyno until your upscale threshold is breached.
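The behavior described above can be sketched as a small decision function. This is an illustration under assumptions, not Rails Autoscale's real logic: the function name, its parameters, the one-dyno step, and the existence of a separate downscale threshold are all hypothetical.

```ruby
# Hypothetical decision function; NOT Rails Autoscale's actual algorithm.
# Returns the dyno count to request, given the latest queue time.
def target_dynos(current:, queue_ms:, upscale_ms:, downscale_ms:, min:, max:)
  if queue_ms > upscale_ms && current < max
    # Upscale threshold breached: add a dyno, landing inside the range.
    (current + 1).clamp(min, max)
  elsif queue_ms < downscale_ms && current > min
    # Downscale threshold breached: remove a dyno, landing inside the range.
    (current - 1).clamp(min, max)
  else
    # No breach (or already at the limit): nothing changes, even if
    # `current` is below a newly raised minimum.
    current
  end
end

settings = { upscale_ms: 100, downscale_ms: 10, min: 2, max: 5 }

# Minimum was just raised to 2, but queue time is healthy: stay at 1 dyno.
puts target_dynos(current: 1, queue_ms: 20, **settings)
# Upscale threshold breached: now the app moves into the new range.
puts target_dynos(current: 1, queue_ms: 150, **settings)
```

The point of the sketch is the final branch: the range only constrains where a scaling event may land; it does not trigger one on its own.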
Rails Autoscale does not support this natively. This request is usually a desire to have a minimum number of dynos running during busy times and scale down further during quiet times. My recommendation here is to allow your app to scale down, even during busy times. If you scale down too far—or if traffic picks up—you'll immediately scale back up. Trust the autoscaler and give it a try!
Put simply, request queue time is the time between Heroku's router receiving a request and your app beginning to process the request. It includes network time between the router and application dyno, and it includes time waiting within the dyno for an available application process. The latter is what we care about—if requests are waiting for more than a few milliseconds, there's a capacity issue.
This is why Rails Autoscale only scales based on queue time. Web requests can be slow for lots of reasons, but queue time always reflects capacity.
When your request queue time breaches your upscale threshold, Rails Autoscale will send an upscale request to Heroku within 20 seconds. The agent reports metrics every 10 seconds, and it can take up to 10 more seconds for this data to be processed.
After sending the request to Heroku, it'll take between 20 and 60 seconds (depending on the startup time for your app) for your new dyno to begin receiving requests.
Apps that receive steep spikes in traffic should consider scaling up by multiple dynos at a time. This option is available in your advanced settings.
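As a rough worked example, the timings quoted above add up as follows. The constant and method names are made up for illustration, and the best case assumes the breach is detected almost immediately.

```ruby
REPORT_INTERVAL_S = 10 # the agent reports metrics every 10 seconds
PROCESSING_MAX_S  = 10 # up to 10 more seconds for the data to be processed

# Range of seconds between a threshold breach and a new dyno serving
# requests. `boot` is the app's startup time range (20 to 60 s above).
def upscale_delay_seconds(boot: 20..60)
  detection_max = REPORT_INTERVAL_S + PROCESSING_MAX_S
  best = boot.min                  # assumes near-instant detection
  worst = detection_max + boot.max # slowest detection plus slowest boot
  best..worst
end

range = upscale_delay_seconds
puts "New capacity arrives #{range.min} to #{range.max} seconds after the breach"
```

With the default figures the worst case is 80 seconds, which is why an app with steep traffic spikes may prefer to add several dynos per upscale rather than one at a time.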
Most APM tools, like New Relic and Scout, show you the average for a given metric. Averages might provide smoother charts for overall trends, but they aren't useful for detecting a capacity issue. Rails Autoscale uses the 95th percentile, so its numbers will usually be higher than the averages you see in your APM tool.
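A small worked example shows how far apart the two numbers can be. The sample data is invented, and the nearest-rank percentile method used here is an assumption; the exact calculation Rails Autoscale uses may differ.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile: the smallest sample such that at least
# `pct` percent of all samples are less than or equal to it.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Invented queue times (ms): mostly fast, with two requests that waited.
queue_times_ms = [2, 3, 2, 4, 3, 2, 5, 3, 2, 4,
                  3, 2, 3, 4, 2, 3, 250, 300, 2, 3]

puts "average: #{mean(queue_times_ms)} ms"       # 30.1
puts "p95:     #{percentile(queue_times_ms, 95)} ms" # 250
```

The average hides the two slow requests, while the 95th percentile surfaces them, which is the signal that matters when deciding whether the app needs more capacity.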
Unless you're using Heroku's Preboot feature, your app will be temporarily unavailable while it boots, such as during deploys and daily restarts. During this time, requests are routed to your web dynos, where they wait. All this waiting is reflected in your request queue time, which will likely cause an autoscale for your app.
This is not a bad thing! Your app autoscaling during a deploy means it'll quickly recover from the temporary downtime during boot, and of course, it'll autoscale back down once it catches up.
The Rails Autoscale agent only runs in a web process, so you must be running at least one web dyno. Even worker metrics are collected from the agent running in your web process.
Also note that a web request is what initially starts the agent process. If your app receives little or no web traffic, this could result in the agent never starting and never reporting metrics to Rails Autoscale. To work around this limitation, use an uptime monitor (FreshPing is a free option) to continually ping your site.
Anytime you restart or shut down a worker dyno (such as downscaling, deploying, or restarting), you risk killing long-running jobs. Autoscaling often magnifies this issue because you're shutting down worker dynos much more frequently.
Your worker backend will typically re-enqueue these jobs after being terminated, so you must ensure that your jobs are reentrant—that they can successfully re-run after a previous, interrupted run. If possible, also try breaking long-running jobs into a batch of smaller jobs.
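One way to make a job reentrant is to record progress as it goes and skip completed work on a re-run. The sketch below is framework-agnostic plain Ruby; the SendInvoicesJob class, the deliver callback, and the in-memory done set (standing in for durable state such as a database column) are all hypothetical.

```ruby
require "set"

# Hypothetical job. `done` stands in for durable state that survives a
# dyno restart; `deliver` stands in for the real side effect.
class SendInvoicesJob
  def initialize(done: Set.new, deliver: ->(id) {})
    @done = done
    @deliver = deliver
  end

  def perform(invoice_ids)
    invoice_ids.each do |id|
      next if @done.include?(id) # already handled by an earlier run
      @deliver.call(id)
      @done << id                # record progress only after success
    end
  end
end
```

If the job is killed partway through and re-enqueued, the second run picks up at the first unfinished invoice instead of starting over. An item interrupted between delivery and recording could still be processed twice, so the side effect itself should tolerate a repeat where that matters.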
The agent takes a snapshot of job latency (queue time) every 10 seconds. If your job latency frequently hovers at 0 milliseconds, this might look like missing data in Rails Autoscale.
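To see why a healthy queue can look like missing data, consider a simplified latency snapshot. This sketch assumes latency means the age of the oldest job still waiting in the queue, which is a common definition but not confirmed here as the agent's; the method name and its arguments are hypothetical.

```ruby
# Latency in milliseconds of the oldest waiting job at time `now`.
# `enqueued_at_times` holds the enqueue Time of each job still waiting.
def queue_latency_ms(enqueued_at_times, now: Time.now)
  oldest = enqueued_at_times.min
  return 0 if oldest.nil? # empty queue: nothing is waiting
  ((now - oldest) * 1000).round
end

now = Time.at(1_000)
puts queue_latency_ms([], now: now)                                # empty queue
puts queue_latency_ms([Time.at(998.5), Time.at(999)], now: now)    # 1.5 s wait
```

If workers pick up jobs as soon as they arrive, the queue is empty at nearly every snapshot, so the recorded latency is 0 even though plenty of jobs are running.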
If you do expect to see some worker queue time in Rails Autoscale, it's possible the agent is not running. Do you see queue time for your web dynos? If not, you're probably running a worker-only app or an app that receives very little web traffic.
If you do see web queue times but no data for your worker dynos, email email@example.com.