This article was published on 2013-02-19

Pluggable RegExp

The last couple of weeks were quite exciting from a RegExp point of view. The first big event was mattn’s pull request adding the IIJ RegExp engine, which is based on Oniguruma. This patch would have made the same RegExp engine that MRI uses available in mruby as well. After a month of discussion and work on the patch, the final decision was to close the ticket. That sounds sad, but it is actually good news: instead, we decided to build a pluggable RegExp engine into mruby.

Abstract

Regular expressions are a powerful way to match text in a string. In Ruby you can do things like:

"I've got 3 Rubies from the IIJ, 2 from mura and 5 from mattn.".scan(/\d/)
=> ["3", "2", "5"]

Here \d is a regular expression that matches any decimal digit, so scan returns every digit in the string above. If you want to manipulate strings, RegExp is a very convenient tool.

Until now, mruby didn’t have a fully working, integrated RegExp engine. There were only some leftovers copied from MRI. Based on these leftovers and the work of the IIJ team, mattn created a fully working RegExp engine implementation for the latest mruby head. Around the same time, Masamitsu Murase built an mruby GEM for the Henry Spencer RegExp engine.

Because we like diversity, I suggested making mruby’s RegExp support independent of any particular library. Right away, mattn jumped on the idea and opened a new pull request that removed more or less all Oniguruma leftovers from the mruby head. At the same time, he modified the code so that a RegExp engine could easily be integrated as a GEM.

As a result, anyone can now build their own RegExp engine as a GEM and use it as they like. At the moment, two such RegExp GEMs are available.
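To illustrate what “pluggable” means, here is a minimal plain-Ruby sketch (runnable in CRuby; it is not mruby source code). As a simplifying assumption, the “core” looks up its engine class by constant name at run time and relies only on a tiny interface, `new(pattern)` plus `#scan(str)`. The names `ENGINE_CONST`, `ToyRegexp`, `core_scan`, `LiteralEngine` and `HostEngine` are hypothetical and do not come from mruby, mruby-onig-regexp or mruby-hs-regexp.

```ruby
# Hypothetical sketch of a pluggable engine: the core never names a
# concrete engine, it only resolves a class by an agreed constant name.
ENGINE_CONST = :ToyRegexp

# "Core" code: depends only on the constant name and the interface
# new(pattern) / #scan(str), not on which engine supplies them.
def core_scan(str, pattern)
  engine = Object.const_get(ENGINE_CONST)
  engine.new(pattern).scan(str)
end

# Engine 1 (hypothetical): matches literal text only, no metacharacters.
class LiteralEngine
  def initialize(pattern)
    @pattern = pattern
  end

  # Collect every non-overlapping occurrence of the literal pattern.
  def scan(str)
    found = []
    pos = 0
    while (i = str.index(@pattern, pos))
      found << @pattern
      pos = i + [@pattern.size, 1].max # advance at least 1 for empty patterns
    end
    found
  end
end

# Engine 2 (hypothetical): delegates to the host Ruby's built-in Regexp.
class HostEngine
  def initialize(pattern)
    @regexp = Regexp.new(pattern)
  end

  def scan(str)
    str.scan(@regexp)
  end
end

# "Installing" an engine means defining the constant the core looks up.
Object.const_set(ENGINE_CONST, LiteralEngine)
text = "I've got 3 Rubies from the IIJ, 2 from mura and 5 from mattn."
p core_scan(text, "from")   # => ["from", "from", "from"]

# Swapping engines requires no change to core_scan.
Object.send(:remove_const, ENGINE_CONST)
Object.const_set(ENGINE_CONST, HostEngine)
p core_scan(text, "\\d")    # => ["3", "2", "5"]
```

The point of the design is that replacing the engine is purely a question of which class is defined under the agreed name; the calling code stays untouched, even though the two engines support very different feature sets.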

OnigRegexp

Of course, mattn didn’t throw away the IIJ code. Instead, he gemified his effort and created the mruby-onig-regexp GEM, which is intended to bring the full-featured Oniguruma engine to mruby.

HsRegexp

As already mentioned, the second candidate is mruby-hs-regexp from Masamitsu Murase: a RegExp engine based on Henry Spencer’s code, which is intended to provide a reduced set of RegExp features in exchange for a smaller footprint in mruby.

State

At the moment, both RegExp engines more or less work. There is still work to be done to comply with the ISO Ruby standard and to integrate them more tightly with mruby’s syntax. We should also expect bugs, so the best thing you can do is grab the code and experiment with it.

Once the engines are stable, I expect we will choose a default RegExp engine to ship with mruby, simply because regular expressions are so essential to Ruby development. But thanks to the pluggable system, it will be easy for you to remove that engine or exchange it for another one (with more features or less overhead).
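Choosing or swapping an engine then comes down to listing exactly one RegExp GEM in the build configuration. The fragment below is a sketch in the style of later mruby `build_config.rb` files; the `conf.gem github:` form and both repository paths are assumptions and may not match the build system at the time of this article. The Oniguruma-based GEM may additionally need the Oniguruma library available at build time.

```ruby
# build_config.rb (sketch): enable exactly one RegExp GEM.
MRuby::Build.new do |conf|
  toolchain :gcc
  conf.gembox 'default'

  # Full-featured, Oniguruma-based engine (repository path assumed):
  conf.gem github: 'mattn/mruby-onig-regexp'

  # ...or, for a smaller footprint, the Henry Spencer based engine instead
  # (repository path assumed):
  # conf.gem github: 'masamitsu-murase/mruby-hs-regexp'
end
```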

Have fun!