Precompiling Rails Templates
Rails 6 comes with some improvements to ActionView template performance:
- Compiled templates are de-duplicated in more situations
- Dev-mode now caches templates between requests, which should feel much more production-like
- Template resolution (matching render "users/user" to users/_user.html.erb) is much faster when uncached
A further improvement would be to load templates eagerly. Template compilation can still be fairly expensive, so, if possible, we’d like to load them eagerly during app boot rather than lazily during the first request needing that template.
Loading the templates earlier should save memory when using a forking web server like Unicorn or Puma, and it moves some cost upfront rather than making a user wait.
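As a rough illustration of why work done before forking pays off, here is a toy sketch in plain Ruby. Nothing in it is Rails or Action View API: ToyTemplateCache and compiles_in_worker are hypothetical names, and "compilation" is just a string placeholder. It only shows that a worker forked after the cache is filled has nothing left to compile.

```ruby
# Toy stand-in for a template cache (hypothetical, not Action View).
class ToyTemplateCache
  attr_reader :compile_count

  def initialize
    @compiled = {}
    @compile_count = 0
  end

  # Return the compiled template, compiling it on first use.
  def fetch(name)
    @compiled[name] ||= begin
      @compile_count += 1
      "compiled:#{name}" # placeholder for real, expensive compilation
    end
  end
end

# Fork a worker, "render" the given templates there, and report how many
# compilations the worker had to perform itself.
def compiles_in_worker(cache, names)
  before = cache.compile_count
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    names.each { |name| cache.fetch(name) }
    writer.puts(cache.compile_count - before)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  count = reader.read.to_i
  reader.close
  Process.wait(pid)
  count
end

names = %w[users/_user layouts/application]

cold = ToyTemplateCache.new
puts compiles_in_worker(cold, names) # worker compiles both templates itself

warm = ToyTemplateCache.new
names.each { |name| warm.fetch(name) } # "precompile" in the parent
puts compiles_in_worker(warm, names)   # worker inherits the compiled templates
```

With a real forking server the same effect depends on the app being loaded in the parent before workers are forked, so that copy-on-write lets workers share the memory holding the compiled templates.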
I’ve written an experimental library to do this, actionview_precompiler.
Actionview Precompiler works by translating templates into Ruby using their Rails template handler (ERB, HAML, etc.) and parsing that output using Ruby 2.6’s RubyVM::AbstractSyntaxTree to find the render calls it contains.
We need to know the actual render calls because templates are compiled differently depending on which locals they are passed.
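To make the scanning step more concrete, here is a minimal sketch of the idea. It is not the library's actual implementation. It assumes the template has already been translated to Ruby source, uses hypothetical helper names (each_node, render_targets), and only recognises render calls whose first argument is a string literal. Node type names in RubyVM::AbstractSyntaxTree are an implementation detail and can differ between Ruby versions.

```ruby
# Yield every AST node at or below +node+, depth first.
def each_node(node, &block)
  return unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  yield node
  node.children.each { |child| each_node(child, &block) }
end

# Return the literal template names passed to receiver-less `render` calls.
def render_targets(ruby_source)
  root = RubyVM::AbstractSyntaxTree.parse(ruby_source)
  targets = []

  each_node(root) do |node|
    # FCALL is a method call with arguments and no explicit receiver.
    next unless node.type == :FCALL && node.children.first == :render

    args = node.children[1]
    first_arg = args&.children&.first
    # Only literal strings can be resolved to a template ahead of time.
    if first_arg.is_a?(RubyVM::AbstractSyntaxTree::Node) && first_arg.type == :STR
      targets << first_arg.children.first
    end
  end

  targets
end

# Stand-in for what a template handler might emit (an assumption).
source = <<~RUBY
  @output_buffer << render("users/user", user: user)
  @output_buffer << render(partial_name)
RUBY

p render_targets(source) # => ["users/user"]
```

The second call is skipped because its template name is only known at runtime. A fuller version would also read the names of the locals from the call's hash argument, since those decide how the template gets compiled.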
As a benchmark I tried this approach against exercism.io’s home page. exercism.io makes a good benchmark as a well written, non-trivial, fairly vanilla Rails app which is open source.
It also uses HAML which, being a bit slower than ERB, makes precompilation more significant.
Without any precompilation the first request’s flamegraph looks like this (x-axis is time, y-axis is stack frames):
[Figure: flamegraph of the first request without precompilation]
I’ve highlighted where we’re spending time compiling templates. It’s a fairly significant part of the request time.
We render 8 templates here, and each needs to be compiled while we’re rendering the page and a user is awaiting a response.
If we call ActionviewPrecompiler.precompile in the parent process…
[Figure: flamegraph of the first request with precompilation]
Looks great! We’ve eliminated template compilation for all but one of those render calls (which came from a helper).
To measure the performance improvement I ran 100 iterations of the first request in each of three configurations: cold (the default configuration), with precompilation, and “warmed” by making the same request in the parent process beforehand (this is basically the best we could hope to do).
Precompilation takes the average time of our first request from ~102ms to ~60ms.
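For readers who want to try a similar measurement, the sketch below shows one way to time a "first request" repeatedly. It is not the author's benchmark script: time_in_fresh_child and average_first_call_ms are hypothetical helpers, and the block stands in for whatever performs the request. Each sample runs in a freshly forked child, so anything the block caches is thrown away before the next iteration.

```ruby
# Run the block in a freshly forked child and return its elapsed time in ms.
def time_in_fresh_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    writer.puts(elapsed * 1000.0)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  ms = reader.read.to_f
  reader.close
  Process.wait(pid)
  ms
end

# Average the first-call time over +iterations+ fresh children.
def average_first_call_ms(iterations, &first_request)
  samples = Array.new(iterations) { time_in_fresh_child(&first_request) }
  samples.sum / samples.size
end

# Placeholder workload; a real run would issue a request to the app here.
avg = average_first_call_ms(5) { 10_000.times.sum }
puts format("average first call: %.3f ms", avg)
```

Whatever the parent has loaded before forking (nothing, precompiled templates, or a full warm-up request) then decides which of the three configurations is being measured.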
This should be reproducible from benchmark_first_request.rb (which should work in other apps) and the changes I made to exercism to get a production-like environment.
Thanks for reading!