fs: cache glob matcher functions in path.matchesGlob#63915
Conversation
|
Review requested:
|
| nocaseMagicOnly: true, | ||
| }); | ||
|
|
||
| patternFnCache ??= new SafeMap(); |
There was a problem hiding this comment.
i'm not sure if it's worth it to delay the map instantiation like this, but did it just in case.
| }); | ||
| patternFnCache.set(pattern, matcher); | ||
|
|
||
| if (patternFnCache.size >= 250) { |
There was a problem hiding this comment.
randomly picked number, i'm not sure how to find a "good" value though.
There was a problem hiding this comment.
-1 on hardcoding a 'limit', especially one that's an arbitrary number
| if (patternFnCache.has(pattern)) { | ||
| matcher = patternFnCache.get(pattern); | ||
| } else { | ||
| matcher = createMatcher(pattern, { |
There was a problem hiding this comment.
i re-used createMatcher from above as i saw that it uses the same default options as the previous lines did.
i thought about putting the caching in that function instead, but then we would need to base the caching on all the options it accepts, so i decided to keep it here.
Signed-off-by: Adam Haglund <adam@haglund.dev>
Signed-off-by: Adam Haglund <adam@haglund.dev>
83d6da0 to
d158f29
Compare
|
@avivkeller @MoLow if you have time, could you take a look at this PR? 🥰 |
this PR adds benchmarks for
path.matchesGlob()as well as a simple cache for pattern matching functions.rationale
most glob matching libraries in the ecosystem (including
minimatch) provide an API to let users re-use matchers instead of re-creating them every call.node doesn't provide such an API, and does not re-use matchers when possible, leading to it being much slower (10-15x) than just using
new Minimatch()directly when matching a pattern more than once.implementation details
i decided to "hide" away the cache inside the function rather than add a new API as most users will expect the function to not be a potential footgun, and the pattern has been used before in zeptomatch
since
matchGlobPattern()doesn't accept any options other than pattern and path we can safely cache it by the pattern string. if this changes in the future the cache would need to accomodate the new options in the cache keys.i went with the simplest
Mapcache implementation i could come up with as i couldn't find any LRU implementations anywhere in the codebase, if you have any suggestions for improvements please let me know!testing
i'm not sure how to test this properly, e.g. if i should export the cache instance and check its size, etc
some guidance here is very appreciated :)
benchmark results
ran on my windows 11 machine with an AMD Ryzen 9800X3D
note: these benchmarks do not test the case where we evict keys from the cache
i also have a personal benchmarking repo where i found this issue, where the results now look like this:
screenshot