If I haven't answered your question here, email me!

You must have a Ruby web application running on Heroku. It doesn't have to be Rails, but it does need to use Rack.

Just run heroku addons:create rails-autoscale from a terminal. Check out the Getting Started guide for step-by-step instructions.

Every Rails Autoscale plan supports the same features and the same service. The only difference is the maximum number of dynos supported.

For example, if you might need to autoscale to four or more dynos, you'll need at least the Silver plan. See the autoscale range docs for more on this.

The "trial" plan is the default when installing via the Heroku CLI (heroku addons:create rails-autoscale). With the trial plan, you have a week of unlimited autoscaling. At the end of seven days, the add-on will remain installed, but autoscaling will be disabled until you upgrade to a paid plan. For more details, see the plans & pricing docs.

Rails Autoscale does not support attaching to multiple apps. You must install the add-on separately for each app.

You can use different autoscalers for different processes. For example, you could use Heroku's native autoscaling for web dynos and Rails Autoscale for worker dynos.

Do not use multiple autoscalers on the same process. This results in very unpredictable scaling behavior.

Heroku offers a native autoscaling solution that's worth a try if you run performance dynos and you only need to autoscale web dynos. Here's what makes Rails Autoscale different:

  • Web autoscaling based on request queue time instead of total response time. This means more reliable autoscaling.
  • Worker autoscaling for Sidekiq, Delayed Job, and Que.
  • Works great on standard and performance dynos.
  • Personalized customer support from the developer who built it.

You must be running a Rack-based Ruby app on Heroku. If your Rack-based app is not running Rails, see these instructions on setting up rails_autoscale_agent.

The agent has no noticeable impact on response time. It collects the queue time for each request in memory—a very simple operation—and an async reporter thread periodically posts those queue times to the Rails Autoscale service. Check out the middleware code on GitHub if you're interested.

Perhaps you're running a worker-only app, or an app with very little web traffic? If not, check out the troubleshooting guide.

Bad Things will happen if you change or remove the RAILS_AUTOSCALE_URL config var. The Rails Autoscale add-on manages this config var, so it's best to leave it alone.

Also note that if you fork a Heroku app, it will copy the config vars, including RAILS_AUTOSCALE_URL. This also results in Bad Things, because Rails Autoscale doesn't know about the forked app. If you do fork a Heroku app with Rails Autoscale installed, be sure to remove the RAILS_AUTOSCALE_URL config var.

Most likely, the agent is not running. The troubleshooting guide will help you resolve this.

Rails Autoscale does not support this natively. This request usually stems from a desire to keep a minimum number of dynos running during busy times and to scale down further during quiet times. My recommendation here is to allow your app to scale down, even during busy times. If you scale down too far, or if traffic picks up, you'll immediately scale back up. Trust the autoscaler and give it a try!

If you really need different settings on some kind of schedule, you can create something yourself using the Rails Autoscale API alongside Heroku Scheduler or some other job scheduler.

Put simply, request queue time is the time between Heroku's router receiving a request and your app beginning to process the request. It includes network time between the router and application dyno, and it includes time waiting within the dyno for an available application process. The latter is what we care about—if requests are waiting for more than a few milliseconds, there's a capacity issue.

This is why Rails Autoscale only scales based on queue time. Web requests can be slow for lots of reasons, but queue time always reflects capacity.
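As a rough illustration of how queue time can be measured, here is a minimal Rack-style middleware sketch. It relies on the X-Request-Start header, which Heroku's router sets to the time it received the request (milliseconds since the Unix epoch), and subtracts that from the current time. The class name QueueTimeProbe and the env key are hypothetical names made up for this example, and this is not the rails_autoscale_agent implementation.

```ruby
# Hypothetical middleware for illustration only; not the rails_autoscale_agent code.
class QueueTimeProbe
  ENV_KEY = "queue_time_probe.ms" # made-up env key for this sketch

  # `clock` returns the current time as integer milliseconds since the epoch.
  # It is injectable so the class can be exercised without real requests.
  def initialize(app, clock: -> { (Time.now.to_f * 1000).to_i })
    @app = app
    @clock = clock
  end

  def call(env)
    # Rack exposes the X-Request-Start request header as HTTP_X_REQUEST_START.
    started_ms = env["HTTP_X_REQUEST_START"].to_s[/\d+/]
    if started_ms
      # Clamp at zero to tolerate small clock differences between hosts.
      env[ENV_KEY] = [@clock.call - started_ms.to_i, 0].max
    end
    @app.call(env)
  end
end
```

Note that a value measured this way includes both the network time and the in-dyno wait described above; the two cannot be separated from the header alone.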

When your request queue time breaches your upscale threshold, Rails Autoscale will send an upscale request to Heroku within 20 seconds. The agent reports metrics every 10 seconds, and it can take up to 10 more seconds for this data to be processed.

After sending the request to Heroku, it'll take between 20 and 60 seconds (depending on the startup time for your app) for your new dyno to begin receiving requests.

Apps that receive steep spikes in traffic should consider scaling up by multiple dynos at a time. This option is available in your advanced settings.

Most APM tools, like New Relic and Scout, show you the average for a given metric. Averages might provide smoother charts for overall trends, but they aren't useful for detecting a capacity issue. Rails Autoscale uses the 95th percentile, so its numbers will almost always be higher.
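To see why the two numbers diverge, here is a small sketch that computes both over the same set of queue times. The sample values are made up, and the nearest-rank percentile method used here is an assumption chosen for simplicity; Rails Autoscale may calculate its percentile differently.

```ruby
# Nearest-rank percentile: the smallest sample with at least pct% of all
# samples at or below it. `pct` is an integer from 1 to 100.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling of pct% of the count
  sorted[[rank, 1].max - 1]
end

def average(values)
  values.sum.to_f / values.size
end

# Made-up queue times in milliseconds: most requests start right away,
# but a handful wait for a free application process.
queue_times_ms = [2] * 16 + [150, 200, 250, 300]

puts "average: #{average(queue_times_ms)} ms"
puts "p95:     #{percentile(queue_times_ms, 95)} ms"
```

With these samples the average is 46.6 ms while the 95th percentile is 250 ms. The average hides the requests that actually waited, and those are the ones that signal a capacity issue.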

Unless you're using Heroku's Preboot feature, your app will be temporarily unavailable while it boots, such as during deploys and daily restarts. During this time, requests are routed to your web dynos, where they wait. All this waiting is reflected in your request queue time, which will likely cause an autoscale for your app.

This is not a bad thing! Your app autoscaling during a deploy means it'll quickly recover from the temporary downtime during boot, and of course, it'll autoscale back down once it catches up.

Two possible reasons:

  • The autoscale agent starts up when your app receives its first request. If you've recently deployed or restarted—and your app is not receiving traffic—then the agent is probably not running.
  • Even if the agent is running, Rails Autoscale has a safeguard in place that prevents downscaling an app that hasn't reported any data for five minutes. This prevents downscaling an app that might be having a problem with the reporting agent.

Both of these issues are most common on staging/demo apps. As long as your app receives some regular traffic, as most production apps do, you shouldn't run into this. If you need a workaround, the best approach is to use an uptime monitor to regularly ping your app.
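The safeguard in the second point can be pictured as a simple staleness check. This is only a sketch of the idea: the method name, the constant, and the choice to treat exactly five minutes as stale are all assumptions, not the service's actual logic.

```ruby
STALE_AFTER_SECONDS = 5 * 60 # five minutes, per the safeguard described above

# Hypothetical check: allow a downscale only if the app has reported recently.
# `last_report_at` is a Time, or nil if the agent has never reported.
def downscale_allowed?(last_report_at, now: Time.now)
  return false if last_report_at.nil?

  (now - last_report_at) < STALE_AFTER_SECONDS
end
```

An app that has gone quiet looks the same as an app with a broken agent, which is why regular traffic (or an uptime monitor) keeps downscaling working.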

The Rails Autoscale agent only runs in a web process, so you must be running at least one web dyno. Even worker metrics are collected from the agent running in your web process.

Also note that a web request is what initially starts the agent process. If your app receives little or no web traffic, this could result in the agent never starting and never reporting metrics to Rails Autoscale. To work around this limitation, use an uptime monitor (FreshPing is a free option) to continually ping your site.

Sidekiq, Delayed Job, Que, and Resque (beta) are currently supported.

Anytime you restart or shut down a worker dyno (such as downscaling, deploying, or restarting), you risk killing long-running jobs. Autoscaling often magnifies this issue because you're shutting down worker dynos much more frequently.

Your worker backend will typically re-enqueue these jobs after being terminated, so you must ensure that your jobs are reentrant—that they can successfully re-run after a previous, interrupted run. If possible, also try breaking long-running jobs into a batch of smaller jobs.
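One way to make a job reentrant is to record progress as you go, so a re-run skips whatever an earlier, interrupted run already finished. The sketch below is a plain Ruby illustration of that idea. The SendInvoicesJob class, its collaborators, and the in-memory Set are all hypothetical; a real job would use your worker backend's job class and a durable store such as a database column.

```ruby
require "set"

# Hypothetical job for illustration. It records each completed invoice so a
# re-run after an interruption skips work that was already done.
class SendInvoicesJob
  def initialize(sent_store:, mailer:)
    @sent_store = sent_store # a durable store in a real app; a Set in this sketch
    @mailer = mailer         # any callable that delivers one invoice
  end

  def perform(invoice_ids)
    invoice_ids.each do |id|
      next if @sent_store.include?(id) # already handled by an earlier run

      @mailer.call(id)
      @sent_store.add(id) # record progress only after the work succeeds
    end
  end
end
```

If the dyno shuts down between delivering an invoice and recording it, that one invoice is delivered again on the re-run, so this pattern narrows the window for duplicates rather than eliminating it. Breaking the work into one small job per invoice shrinks that window further.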

The agent takes a snapshot of job latency (queue time) every 10 seconds. If your job latency frequently hovers at 0 milliseconds, this might look like missing data in Rails Autoscale.

If you do expect to see some worker queue time in Rails Autoscale, it's possible the agent is not running. Do you see queue time for your web dynos? If not, you're probably running a worker-only app or an app that receives very little web traffic.

If you do see web queue times but no data for your worker dynos, email help@railsautoscale.com.