Mozilla Rhino doesn’t use the facilities of java.util.regex to implement regular expression support in JavaScript but provides its own regular expression implementation. Sadly this implementation is both buggy (see bug 369860) and slow.
Bug with non-capturing groups
Here is a variation of bug 369860 occurring in String.replace.
The regular expression comes from “real life” code: it is what the Prototype library uses to filter scripts out of the text of a received XMLHttpRequest response.
public void testBuggyReplace()
{
    final Context ctx = Context.enter();
    try
    {
        final ScriptableObject topScope = ctx.initStandardObjects();
        final String text = "<b>bla</b><script>alert(123);</script>bla";
        final String regex = "(?:<script.*?>)((\\n|\\r|.)*?)(?:<\\/script>)";
        final String expected = "<b>bla</b>bla";
        // Sanity check: java.util.regex performs the replacement correctly.
        assertEquals(expected, text.replaceAll(regex, ""));
        topScope.put("str", topScope, regex);
        topScope.put("text", topScope, text);
        topScope.put("expected", topScope, expected);
        // The same replacement through Rhino's own RegExp implementation.
        final String script = "var re = new RegExp(str, 'img');\n"
            + "var s = text.replace(re, '');\n"
            + "if (s != expected)\n"
            + "  throw 'Expected >' + expected + '< but got >' + s + '<';";
        ctx.evaluateString(topScope, script, "test", 0, null);
    }
    finally
    {
        // Always leave the context entered above, even when the script throws.
        Context.exit();
    }
}
Too slow
The same regular expression applied to larger texts shows how slow Rhino’s RegExp support is.
Performing the replacement on the text from the previous example repeated 100 times, I get on my desktop:
Pure Rhino: 25 ms
String.replace using java.util.regex: 7 ms
and if I repeat it 1000 times the gap becomes even wider:
Pure Rhino: 440 ms
String.replace using java.util.regex: 15 ms
Quite an impressive difference: a factor of nearly 30!
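For readers who want to try this on their own machine, here is a minimal sketch of the java.util.regex side of such a measurement. The class and method names are hypothetical, it uses plain java.util.regex rather than going through Rhino, and it takes a single naive wall-clock reading (no warm-up), so the numbers it prints are only indicative. The Rhino side would evaluate text.replace(re, '') through Context.evaluateString on the same input.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Rhino or HtmlUnit.
final class ScriptStripTiming {
    // Same expression as in the test above.
    static final Pattern SCRIPT =
        Pattern.compile("(?:<script.*?>)((\\n|\\r|.)*?)(?:<\\/script>)");

    static final String SAMPLE = "<b>bla</b><script>alert(123);</script>bla";

    /** Concatenates s the given number of times. */
    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    /** Removes every script element matched by SCRIPT. */
    static String stripScripts(String text) {
        return SCRIPT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000}) {
            String text = repeat(SAMPLE, n);
            long start = System.nanoTime();
            String result = stripScripts(text);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " repetitions: " + elapsedMs
                + " ms, result length " + result.length());
        }
    }
}
```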
HtmlUnit’s first step towards a java.util.regex based JS RegExp
Ideally the Rhino RegExp support should be rewritten to use java.util.regex. This will surely come in the future but it is not yet the case.
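To give an idea of what delegating a JavaScript replace to java.util.regex involves, here is a rough sketch. It is not HtmlUnit’s or Rhino’s code: the class and method names are hypothetical, the pattern is passed to java.util.regex unchanged (the two syntaxes differ in places), and the replacement is treated as literal text, so JavaScript replacement patterns such as $1 are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual HtmlUnit or Rhino implementation.
final class JsRegExpSketch {
    /** Maps the JavaScript flags 'i' and 'm' to Pattern flags; 'g' is handled in replace. */
    static int toPatternFlags(String jsFlags) {
        int flags = 0;
        if (jsFlags.indexOf('i') >= 0) {
            flags |= Pattern.CASE_INSENSITIVE;
        }
        if (jsFlags.indexOf('m') >= 0) {
            flags |= Pattern.MULTILINE;
        }
        return flags;
    }

    /** Replaces the first match, or all matches when the 'g' flag is present. */
    static String replace(String text, String jsPattern, String jsFlags, String replacement) {
        Matcher matcher = Pattern.compile(jsPattern, toPatternFlags(jsFlags)).matcher(text);
        // Assumption: the replacement is literal text, so '$' and '\' are quoted.
        String literal = Matcher.quoteReplacement(replacement);
        return jsFlags.indexOf('g') >= 0
            ? matcher.replaceAll(literal)
            : matcher.replaceFirst(literal);
    }
}
```

With the expression from the test above and the flags 'img', JsRegExpSketch.replace strips the script element from the sample text, which is the result the Rhino implementation fails to produce.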
Luckily the String functions that need regular expressions don’t use the RegExp functionality directly but go through a proxy that can be configured through ScriptRuntime.setRegExpProxy. This is what