
Conversation

@evilstreak
Contributor

Background

We made some changes to the search results page for GOV.UK and when we deployed the new code we saw a massive spike in CPU usage on our servers. We rolled back the change and investigated, and tracked the issue down to the Mustache gem.

The problem turned out to be the way we had used partials, and the way Mustache loads and renders them. We found a work-around using a different partial structure, but since the original structure is easier to work with and likely to be repeated by another developer in the future, we wanted to address the root cause in Mustache.

What's the problem?

The mustache manual shows partials being used like:

```
base.mustache: <h2>Names</h2>{{#names}}{{> user}}{{/names}}
user.mustache: <strong>{{name}}</strong>
```

When using partials in this way, the user.mustache partial is loaded and compiled once for each iteration, even though the content is static.

Compiling a template is computationally expensive, especially for large and complex partials.

What's changed?

This change caches a template object for each unique partial string after indentation is applied. Using the same template object each time the partial is used means that compilation only has to happen once.
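
In outline, the caching works something like the sketch below. This is a simplified illustration rather than the actual diff: the cache lives in an instance variable (the variable name here is made up), keyed by the partial's source string, and holds one compiled `Mustache::Template` per unique string.

```ruby
# Simplified sketch, not the merged diff: memoise compiled Template objects
# on the Mustache instance, keyed by the partial's source string after
# indentation has been applied.
require "mustache"

class Mustache
  def template_for_partial(partial_source)
    @_partial_template_cache ||= {}
    @_partial_template_cache[partial_source] ||= Template.new(partial_source)
  end
end
```

Because the cache is held in an instance variable rather than class-level state, it is garbage collected along with the Mustache object, which is what the memory-leak point under "Possible objections" below relies on.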

How much faster is it?

We've added a benchmark script for rendering partials. Doing a before and after comparison shows a ~16–20x speed up, depending on where it's run and how many list items need to be iterated through.
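
Roughly, the benchmark takes the shape below. This is a sketch rather than the exact script added in this PR: the view class, template strings and list sizes are placeholders, and the partial is inlined by overriding `#partial` so the script needs no template files on disk.

```ruby
# Sketch of a benchmark-ips harness that renders a partial inside a loop.
require "benchmark/ips"
require "mustache"

class PartialBenchmarkView < Mustache
  self.template = "<h2>Names</h2>{{#names}}{{> user}}{{/names}}"

  # Return the partial's source directly instead of reading it from disk.
  def partial(name)
    "<strong>{{name}}</strong>"
  end
end

Benchmark.ips do |x|
  [10, 100, 1000].each do |n|
    view = PartialBenchmarkView.new
    view[:names] = Array.new(n) { |i| { name: "Name #{i}" } }

    x.report("render list of #{n}") { view.render }
  end
end
```

Each render of a list of N names hits the `{{> user}}` partial N times, so the longer lists show the cost of recompiling the partial most clearly.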

For complex partials the speed up can be better than ~50x. Here's a benchmark for the actual partial we were using for search results on GOV.UK: https://gist.github.com/evilstreak/9d3cc767fd0734bbb1f0

Here's a sample run of the benchmark included in this pull request:

Before

```
Calculating -------------------------------------
   render list of 10    27.000  i/100ms
  render list of 100     2.000  i/100ms
 render list of 1000     1.000  i/100ms
-------------------------------------------------
   render list of 10    422.178  (±20.1%) i/s -      2.025k
  render list of 100     42.587  (±21.1%) i/s -    204.000
 render list of 1000      3.464  (±28.9%) i/s -     17.000
```

After

```
Calculating -------------------------------------
   render list of 10   735.000  i/100ms
  render list of 100    65.000  i/100ms
 render list of 1000     6.000  i/100ms
-------------------------------------------------
   render list of 10      7.736k (±11.8%) i/s -     38.220k
  render list of 100    788.855  (±13.6%) i/s -      3.900k
 render list of 1000     73.325  (±17.7%) i/s -    360.000
```

Possible objections

  • Risk of memory leaks. A previous pull request for caching templates was rejected because it would leak memory where templates were being dynamically generated or changed at runtime. That pull request redefined the Template.new method and used a class variable to store the cached versions of templates. Since we're using instance variables Ruby should be able to garbage collect them when they're finished with. That should mean we aren't at risk of introducing memory leaks.
  • String comparisons. We're using the entire partial string to key the hash of cached templates. For large templates, that could be a very long string which will need to be used for comparisons. Ruby's Hash implementation is pretty efficient, so it's unlikely we could do better with a naive approach like calculating an MD5 hash of the partial to key the template with. Even if we could find a more efficient approach like that, we'd have to worry about the risk of collisions (see the sketch after this list).
  • Lack of tests. Since the behaviour of the system ought to be identical from the outside there isn't anything obvious to test. The extra benchmark demonstrates the difference in performance, and the existing tests for partials cover both cache hits and cache misses.
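
To make the trade-off in the string-comparisons point concrete, here's a rough illustration; the `compile` lambda is just a stand-in for building a template object, not the gem's code.

```ruby
# Rough illustration of the cache-keying trade-off.
require "digest/md5"

compile = ->(source) { source.upcase }                # stand-in for compilation
partial_source = "<strong>{{name}}</strong>\n" * 500  # a large partial

# Keying by the raw string: Ruby hashes the string internally and only falls
# back to a full comparison when two hashes match, so lookups stay cheap.
by_string = {}
by_string[partial_source] ||= compile.call(partial_source)

# Keying by a digest: smaller keys, but every lookup pays for an MD5 pass over
# the source, and a genuine collision would silently reuse the wrong template.
by_digest = {}
by_digest[Digest::MD5.hexdigest(partial_source)] ||= compile.call(partial_source)
```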

Dominic Baggott and Mike Patrick and others added 2 commits on April 20, 2015 15:14
When mustache encounters `{{> partial_name}}` it loads the `partial_name` partial and renders it using the current context. This is a fairly expensive code path because it involves loading the partial template file, compiling the template, and rendering it. In order to make improvements to the performance of this we want to have a benchmark to assess code changes against.
The Mustache manual shows partials being used like this:

```
base.mustache: <h2>Names</h2>{{#names}}{{> user}}{{/names}}
user.mustache: <strong>{{name}}</strong>
```

When using partials in this way, the `user.mustache` partial is loaded and compiled once for each iteration, even though the content is static. Compiling a template is computationally expensive, especially for large and complex partials. This change caches a template object keyed by the partial string, after indentation is applied. Using the same template object each time the partial is used means that compilation only has to happen once. For simple partials (such as the example from the manual, shown above) there's a ~16x speed up; the gain does not increase much as the list length grows. For complex partials the speed up can be better than ~50x, getting better with longer lists.
@hugopeixoto

Compiling a partial seems to depend on two things: its name and the indentation level. Maybe using that pair as the cache key would make some sense (to avoid the memory overhead of keeping the whole string around).

@evilstreak
Contributor (Author)

@hugopeixoto The problem with caching based on the name of a partial is that the content of the partial is not guaranteed to be the same from one use to the next. Here's a contrived example:

```
base.mustache: {{#people}}{{> hello}}{{/people}}
hello.mustache: Hello {{name}}
```
```ruby
class MyMustache < Mustache
  def partial(name)
    @counts ||= {}
    @counts[name] ||= 0
    count_string = "-- this partial has been included #{@counts[name]} times already\n"
    @counts[name] += 1
    super + count_string
  end
end
```

That would output something like:

```
Hello Dom -- this partial has been included 0 times already
Hello Hugo -- this partial has been included 1 times already
Hello Ricardo -- this partial has been included 2 times already
```

I have no way of knowing how many people are using the gem like that, but it would suck to break it for them. There was definitely concern about people using dynamically generated templates in #173, though I'm not sure if it was with a use case like this in mind.

The only way to be sure that the template we want is equivalent to one we've used before is to use the full string.

@hugopeixoto

@evilstreak you're right, I forgot that we allowed overriding #partial.
I guess that in that dynamic example you'd also be using a large amount of memory, as each new template would produce a new entry in the cache table, but at least it would be wiped when the full render had finished. Eventually we could implement an LRU cache to prevent that.

The only thing that comes to mind that could be a problem (and I'm playing devil's advocate here) is if someone overrode #render in their mustache view, but that doesn't seem very plausible.

@evilstreak
Contributor (Author)

I considered overriding #render to do caching in our subclass but it's such an involved method it felt like it would be a really brittle option, likely to break with future versions of the gem.

@evilstreak
Contributor (Author)

Is this PR something you're broadly open to? It would be good to get a "yes, this feature is fine in principle but still needs a review to consider the approach taken and the code style" or a "no, this isn't a feature that should be in this library".

@hugopeixoto

Personally I don't see anything wrong with the feature being included, but I'm not a maintainer. I'll try to ping @locks to review this.

What do you think about adding an LRU? Does it make sense? Sam Saffron has a gem called lru_redux (https://github.com/SamSaffron/lru_redux) which might be easy to add.

@evilstreak
Contributor (Author)

Given that the cache is only going to be hanging around for the lifetime of the Mustache object, I think going for a more advanced caching approach is overkill.

The only case where this naive caching could cause problems is where someone has a large number of different, large templates, and a mid-render garbage collection is necessary to stop the process running out of memory.

I don't think guarding against that case is worth the cost of including a dependency, especially since this library doesn't depend on any other gems at the moment.

locks added a commit that referenced this pull request May 16, 2015
Cache compiled templates for partials
locks merged commit 4c84569 into mustache:master on May 16, 2015
@locks
Member

🎆

@judofyr
Contributor

We could also make template_for_partial public API. Anyone who needs a special caching technique can override this method.
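
For example, a view that wanted a bounded cache could override it along these lines. This is a sketch only: it assumes `template_for_partial` receives the partial's source string, and it uses the lru_redux gem mentioned earlier in the thread.

```ruby
# Sketch: swap the unbounded per-instance cache for a bounded LRU.
# Assumes template_for_partial takes the partial's source string.
require "lru_redux"
require "mustache"

class MyView < Mustache
  def template_for_partial(partial_source)
    @partial_lru ||= LruRedux::Cache.new(100)    # keep at most 100 templates
    @partial_lru.getset(partial_source) { super }
  end
end
```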

evilstreak added a commit to alphagov/shared_mustache that referenced this pull request Jun 24, 2015
The change we want to include is the performance improvement when using partials in loops: mustache/mustache#205. The breaking change from 0.99 to 1.0 is that mustache now requires Ruby >= 2.0.