Improve Rhino’s RegExp using Java’s java.util.regex

Mozilla Rhino doesn’t use java.util.regex to implement JavaScript regular expressions; it ships its own regular expression implementation instead. Sadly, this implementation is both buggy (see bug 369860) and slow.

Bug with non-capturing groups
Here is a variation of bug 369860 that occurs in String.replace.

The regular expression is genuine “real life” material: it is the one the Prototype library uses to filter scripts out of the text received through an XMLHttpRequest.


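// Requires org.mozilla.javascript.Context, org.mozilla.javascript.ScriptableObject
// and JUnit's assertEquals.
public void testBuggyReplace()
{
  final Context ctx = Context.enter();
  try {
    final ScriptableObject topScope = ctx.initStandardObjects();
    final String text = "<b>bla</b><script>alert(123);</script>bla";
    final String regex = "(?:<script.*?>)((\\n|\\r|.)*?)(?:<\\/script>)";
    final String expected = "<b>bla</b>bla";
    // Sanity check: java.util.regex produces the expected replacement
    assertEquals(expected, text.replaceAll(regex, ""));
    // Expose the Java strings to the script as global variables
    topScope.put("str", topScope, regex);
    topScope.put("text", topScope, text);
    topScope.put("expected", topScope, expected);
    // Same replacement in JavaScript; throws if Rhino's result differs
    final String script = "var re = new RegExp(str, 'img');\n"
      + "var s = text.replace(re, '');\n"
      + "if (s != expected)"
      + " throw 'Expected >' + expected + '< but got >' + s + '<';";
    ctx.evaluateString(topScope, script, "test", 0, null);
  }
  finally {
    Context.exit(); // always release the Context associated with this thread
  }
}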
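
For comparison, here is a minimal sketch of how the JavaScript call text.replace(new RegExp(str, 'img'), '') could be expressed with java.util.regex alone. The class JsFlagsSketch and its helpers toJavaFlags and jsReplace are hypothetical names invented for this illustration, not part of Rhino or HtmlUnit, and the sketch assumes a literal replacement string (JavaScript’s $1-style references are not translated).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsFlagsSketch {

    // Translate JavaScript RegExp flags into java.util.regex flags.
    // 'g' has no Pattern equivalent: it only selects replaceAll vs. replaceFirst.
    static int toJavaFlags(String jsFlags) {
        int flags = 0;
        if (jsFlags.indexOf('i') >= 0) flags |= Pattern.CASE_INSENSITIVE;
        if (jsFlags.indexOf('m') >= 0) flags |= Pattern.MULTILINE;
        return flags;
    }

    // Hypothetical helper mimicking text.replace(new RegExp(source, jsFlags), replacement)
    // for a literal replacement string.
    static String jsReplace(String text, String source, String jsFlags, String replacement) {
        Matcher m = Pattern.compile(source, toJavaFlags(jsFlags)).matcher(text);
        // Quote the replacement so '$' and '\' are taken literally
        String literal = Matcher.quoteReplacement(replacement);
        return jsFlags.indexOf('g') >= 0 ? m.replaceAll(literal) : m.replaceFirst(literal);
    }

    public static void main(String[] args) {
        String text = "<b>bla</b><script>alert(123);</script>bla";
        String regex = "(?:<script.*?>)((\\n|\\r|.)*?)(?:<\\/script>)";
        System.out.println(jsReplace(text, regex, "img", "")); // prints <b>bla</b>bla
    }
}
```

The 'g' flag is handled outside the pattern because java.util.regex decides between a single and a global replacement at the Matcher level rather than at compile time.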

Too slow
Running the same regular expression against larger texts shows how slow Rhino’s RegExp support is.

Performing the replacement on the text from the previous example repeated 100 times, I get the following on my desktop:

Pure Rhino: 25 ms
String.replace using java.util.regex: 7 ms

If I repeat the text 1000 times, the gap becomes even wider:

Pure Rhino: 440 ms
String.replace using java.util.regex: 15 ms

Quite an impressive difference!
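
The java.util.regex side of such a measurement can be sketched as follows. This is only a rough, single-run timing harness written for illustration: the class name ReplaceTimingSketch and its helpers are hypothetical, no JIT warm-up is performed, and the numbers it prints will differ from the ones above. The Rhino side would be timed the same way around a ctx.evaluateString call.

```java
import java.util.regex.Pattern;

public class ReplaceTimingSketch {

    static final String TEXT = "<b>bla</b><script>alert(123);</script>bla";
    static final String REGEX = "(?:<script.*?>)((\\n|\\r|.)*?)(?:<\\/script>)";

    // Build the sample text repeated n times
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder(s.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    // Remove all script elements, using the equivalent of the JS flags 'img'
    static String stripScripts(String input) {
        return Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)
                .matcher(input)
                .replaceAll("");
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000}) {
            String input = repeat(TEXT, n);
            long start = System.nanoTime();
            String result = stripScripts(input);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " repetitions: " + elapsedMs
                    + " ms, result length " + result.length());
        }
    }
}
```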

HtmlUnit’s first step towards a java.util.regex-based JS RegExp
Ideally, Rhino’s RegExp support should be rewritten to use java.util.regex. This will surely come in the future, but it is not yet the case.
Luckily, the String functions that need regular expressions don’t use the RegExp functionality directly but go through a proxy that can be configured through ScriptRuntime.setRegExpProxy. This is what
