Revision history for Perl extension Regexp::Assemble. 0.35 2011-04-07 13:18:48 UTC - Update test suite to take into account the regexp engine changes for 5.14. No functional differences. 0.34 2008-06-17 20:20:14 UTC - Rewrite the usage of _re_sort() in order to deal with blead change #33874. Bug smoked out by Andreas König. 0.33 2008-06-07 14:40:57 UTC - Tweaked _fastlex() to fix bug #36399 spotted by Yves Blusseau ('a|[bc]' becomes 'a\|[bc]'). - Recognise POSIX character classes (e.g. [[:alpha:]]. Bug also spotted by Yves Blusseau (bug #36465). 0.32 2007-07-30 17:47:39 UTC - Backed out the change introduced in 0.25 (that created slimmer regexps when custom flags are used). As things stood, it meant that '/' could not appear in a pattern with flags (and could possibly dump core). Bug #28554 noted by David Morel. - Allow a+b to be unrolled into aa*b, as that may allow further reductions (bug #20847 noted by Philippe Bruhat). Not completely implemented, but bug #28554 is sufficient to push out a new release. - eg/assemble understands -U to enable plus unrollings. - Extended campaign of coverage improvements made to the test suite caught a minor flaw in source(). 0.31 2007-06-04 20:40:33 UTC - Add a fold_meta_pairs flag to control the behaviour of [\S\s] (and [\D\d], [\W\w]) being folded to '.' (bug #24171 spotted by Philippe Bruhat). 0.30 2007-05-18 15:39:37 UTC - Fixup _fastlex() bug in 5.6 (unable to discriminate \cX). This allows bug #27138 to be closed. 0.29 2007-05-17 10:48:42 UTC - Tracked patterns enhanced to take advantage of 5.10 (and works again with blead). - The mutable() functionality has been marked as deprecated. - mailing list web page was incorrect (noted by Kai Carver) 0.28 2006-11-26 21:49:26 UTC - Fixed a.+ bug (interpreted as a\.+) (bug #23623) - Handle /[a]/ => /a/ 0.27 2006-11-01 23:43:35 UTC - rewrote the lexing of patterns in _fastlex(). Unfortunately this doesn't speed things up as much as I had hoped. - eg/assemble now recognises -T to dump timing statistics. - file parameter in add_file() may accept a single scalar (or a list, as before). - rs parameter in new() was not recognised as an alias for input_record_separator, - anchor_string_absolute as a parameter to new() would not have worked correctly. - a couple of anchor_() methods would not have worked correctly. - Added MANIFEST.SKIP, now that the module is under version control. - Broke out the debug tests into a separate file (t/09_debug.t). - cmp_ok() tests that tested equality were replaced by is(). - tests in t/03_str.t transformed to a data-driven approach, in order to slim down the size of the distribution tarball. - Typo spotted in the documentation by Stephan (bug #20425). 0.26 2006-07-12 09:27:51 UTC - Incorporated a patch to the test suite from barbie, to work around a problem encountered on Win32 (bug #17507). - The "match nothing" pattern was incorrect (but so obscure as to be reasonably safe). - Removed the unguarded tests in t/06_general.t that the Test::More workaround in 0.24 skips. - Newer versions of Sub::Uplevel no longer need to be guarded against in t/07_warning.t. 0.25 2006-04-20 18:04:49 - Added a debug switch to elapsed pattern insertion and pattern reduction times. Upgraded eg/assemble to make use of it. - Tweaked the resulting pattern when it uses 'imsx' flags, giving (?i-xsm:(?:^a[bc]|de)) instead of (?-xism:(?i:(?:^a[bc]|de))) . - Changed the "match nothing" pattern to something slightly less unsurprising to those who peek behind the curtain. Reported by Philippe Bruhat (bug #18266). - Tweaked the dump() output for chars \x00 .. \x1f 0.24 2006-03-21 08:50:42 - Added an add_file() method that allows a file of patterns to be slurped into an object. Makes for less make-work code in the client code (and thus one less thing to go wrong there). - Added anchor methods that tack on \b, ^, $ and the like to an assembled pattern. - Rewrote new() and clone(). The latter is now no longer needs to know the attribute names. - _lex_stateful() subsumed into _lex() - \d and \w assemble to \w instead of [\w\d] (and similarly for \D and \W). - The Test::More workaround stated in the 0.23 changes didn't actually make it into t/06_general.t - Rewrote tests in 06_general.t to use like()/unlike() instead of ok(), and some more ok()'s replaced by cmp_ok() elsewhere. - Diagnostics for t/00_basic.t:xcmp was incorrect (displayed first param instead of second). - Guard against broken Sub::Uplevel in t/07_warning.t for perl 5.8.8. - Pretty-print characters [\x00-\x1f] in _dump() routines. - Spell-checked the POD! 0.23 2006-01-03 17:03:35 - More bugs in the reduction code shaken out by examining powersets. Exhaustive testing (iterating through the powerset of a, b, c, d, e) makes me think that the pathological cases are taken care of. The code is horrible, though, a rewrite is next on the agenda. - Guard against earlier buggy versions of Test::More (0.47) in t/06_general.t - Carp::croak rewritten as Carp::croak() to fix failures noted on blead. - Rewrote _re_path() for speed. - added lexstr() routine. - added eg/stress-test program. 0.22 2005-12-02 11:31:42 UTC - Amended the test suite to ensure that it runsh0orrectly under 5.005_04. (The documentation was updated to reflect the limitations). Sbastien Aperghis-Tramoni provided the impetus for this fix. No other changes in functionality. - The SKIP counts in t/06_general.t were out of whack for 5.6 and 5.005 testing. 0.21 2005-11-26 16:16:06 UTC - Fixed a nasty bug after generating a series of lists of patterns using Data::PowerSet: ^abc$ ^abcd$ ^ac$ ^acd$ ^b$ ^bc$ ^bcd$ ^bd$ would produce the incorrect ^b(?:(?:ab?)?c)?d?$ pattern. It should if fact produce the ^(?:ab?c|bc?)d?$ pattern. - Improve the reduction of, for example, 'sing', 'singing', 'sting'. In prior versions this would produce s(?:ing(?:ing)?|ting), now it produces s(?:(?:ing)?|t)ing. The code is a bit horrendous (especially the part at the end of _reduce_path). And it's still not perfect. See the TODO. - Duplicate pattern detection wasn't quite right. The code was lacking an else clause, which meant 'abcdef' followed by 'abc' would have the latter treated as a duplicate. - Now that there's a statistic that keeps track of when a duplicate input pattern was encountered, it becomes possible to let the user know about it. Two possibilities are available: a simple carp(), or a callback for complete control. The first time I tried this out on a real file of 3558 patterns, it found 9 dups (or rather, 8 dups and a bug in the module). - The above improvement means the test suite now requires Test::Warn. As a result, t/07_pod.t was subsumed into t/00_basic.t and t/07_warning.t was born. - Added an eg/ircwatcher script that demonstrates how to set up a dispatch table on a tracked regular expression. Credit to David Rigaudière for the idea. - Made sure all routines use an explicit return when it makes sense to do so. (I have a tendency to use implicit returns, which is evil). - the Carp module is require'ed on an on-demand basis. - eg/naive updated to bring its idea of $Single_Char in line with Assemble.pm. - Cleaned up typos and PODos in the documentation. Fixed minor typo noted by David Rigaudière. - Reworked as_string() and re() to play nicely with Devel::Cover, but alas, the module no longer runs under D::C at all. Something to do with the overloading of "" for re()? 0.20 2005-11-07 18:03:32 UTC - Fixed long-standing indent bug: $ra->add( 'a\.b' )->add( 'a-b' )->as_string(indent=>2) ... would produce a(?:\.|-b) instead of a[-.]b. - Fixed bug ($ and ^ not treated correctly). See RT ticket #15522. Basically, '^a' and 'ma' produced [m^]a instead of (?:^|m)a - Statistics! See the stats_* methods. - eg/assemble now has an -s switch to display these statistics - Minor tweak to t/02_reduce.t to get it to play nicely with Devel::Cover. - t/02_reduce.t had an unnecessary use Data::Dumper. 0.19 2005-11-02 15:16:16 UTC - Change croaking diagnostic concerning Default_Lexer. Bug spotted by barbie in ticket #15044. - Pointer to C in the documentation. - Excised Test::Deep probe in 00_basic.t, since the module is no longer used. - Detabbed eg/* 0.18 2005-10-08 20:37:53 UTC - Fixed '\Q[' to be as treated as '\[' instead of '['. What's more, the tests had this as the Right Thing. What was I thinking? Wound up rewriting _lex_stateful in a much less hairier way, even though it now uses gotos. - Introduced a context hash for dragging around the bits and pieces required by the descent into _reduce_path. It doesn't really help much right now, but is vital for solving the qw(be by my me) => /[bm][ey]/ problem. See TODO for more notes. - Fixed the debug output to play nicely with the test harness (by prefixing everything with a #). It had never been a problem, but you never know. - Added a script named 'debugging' to help people figure out why assembled patterns go wonky (which is invariably due to nested parentheses). - Added a script 'tld', that produces a regexp for matching internet Top Level Domain names. This happens to be an ideal example of showing how the alternations are sorted. - Added a script 'roman', that produces a regexp for matching Roman numerals. Just for fun. - Removed the 'assemble-check' script, whose functionality is adequately dealt with via 'assemble -t'. - Tightened up the explanation of why tracked patterns are bulkier - ISOfied the dates in this file. 0.17 2005-09-10 16:41:22 UTC - Add capture() method. - Restructure _insert_path(). - Factor out duplicated code introduced in 0.16 into _build_re(). - Ensure that the test suite exercises the fallback code path for when Storable is missing, even if Storable is available. - Added test_pod_coverage, merely to earn a free Kwalitee point. 0.16 2005-08-22 23:04:02 UTC - Tracked patterns silently ignored imsx flags. Spotted by Bart Lateur. 0.15 2005-04-27 06:50:31 UTC - Oops. Detabbed all the files and did not rerun the tests. t/03_str.t explicitly performs a test on a literal TAB character, and so it failed. Always, always, *ALWAYS* run the test suite as the last task before uploading. Grrr. 0.14 2005-04-27 00:32:43 UTC - Performance tuning release. Played around significantly with _insertr and lex but major improvement will only come about by writing the lexing routine in C. - Reordered $Default_Lexer to bring the most common cases to the front of the pattern. - Inline the effects of \U, \L, \c, \x. This is handled by _lex_stateful (which offloads some of the worst case lexing costs into a separate routine and thus makes the more usual cases run faster). Handling of \Q in the previous release was incorrect. (Sigh). - Backslash slashes. - Passed arrays around by reference between _lex and a newly introduced _insertr routine. - Silenced warning in _slide_tail (ran/reran) - Fixed bug in _slide_tail (didn't handle '0' as a token). One section of the code used to do its own sliding, now it uses _slide_tail. - Fixed bug in _node_eq revealed by 5.6.1 (implicit ordering of hash keys). - Optimized node_offset() - replace ok() in tests by better things (is, like, ...) - removed use of Test::Differences, since it doesn't work on complex structures. 0.13 2005-04-11 21:59:26 UTC - Deal with \Q...\E patterns. - $Default_Lexer pattern fails on 5.6.x: it would lex '\-' as '\', '-'. - Tests to prove that the global $_ is not clobbered by the module. - Used cmp_ok rather than ok where it makes sense. - Added a (belated) DEBUG_LEX debugging mode 0.12 2005-04-11 23:49:16 UTC - Forgot to guard against the possibility of Test::Differences not being available. This would cause erroneous failures in the test suite if it was not installed. - Quotemeta was still giving troubles. Exhaustive testing also turned up the fact that a bare add('0') would be ignored (and thus the null-match pattern would be returned. - More tweaks to the documentation. 0.11 Sat Apr 9 19:44:19 2005 UTC - Performed coverage testing with Devel::Cover Numerous tests added as a result. Borderline bugs fixed (bizarre copy of ARRAY in leave under D::C - fixed in 0.10). - Finalised the interface to using zero-width lookahead assertions. Depending on the match/failure ratio of the pattern to targets, the pattern execution may be slower with ZWLAs than without. Benchmark it. - Made _dump call _dump_node if passed a reference to a hash. This simplifies the code a bit, since one no longer has to worry about whether the thing we are looking at is a node or a path. All in all a minor patch, just to tidy up some loose ends before moving to heftier optimisations. - The fix in 0.10 for quotemeta didn't go far enough. Hopefully this version gets it right. - A number of minor tweaks based on information discovered during coverage testing. - Added documentation about the mailing list. Sundry documentation tweaks. 0.10 2005-03-29 09:01:49 UTC - Correct Default_Lexer$ pattern to deal with the excessively backslashed tokens that C likes to produce. Bug spotted by Walter Roberson. - Added a fix to an obscure bug that Devel::Cover uncovered. The next release will fold in similar improvements found by using Devel::Cover. 0.09 2005-01-22 9:28:21 UTC - Added lookahead assertions at nodes. (This concept is shamelessly pinched from Dan Kogai's Regexp::Optimizer). The code is currently commented out, because in all my benchmarks the resulting regexps are slower with them. Look for calls to _combine if you want to play around with this. - $Default_Lexer and $Single_Char regexps updated to fix a bug where backslashed characters were broken apart between the backslash and the character, resulting in uncompilable regexps. - Character classes are now sorted to the left of a list of alternations. - Corrected license info in META.yml - Started to switch from ok() to cmp_ok() in the test suite to produce human-readable test failures. 0.08 2005-01-03 11:23:50 UTC - Bug in insert_node fixed: did not deal with the following correctly: qw/bcktx bckx bdix bdktx bdkx/ (The assymetry introduced by 'bdix' threw things off, or something like that). - Bug in reduced regexp generation (reinstated code that had been excised from _re_path() et al). - Rewrote the tests to eliminate the need for Test::Deep. Test::More::is_deeply is sufficient. 0.07 2004-12-17 19:31:18 UTC - It would have been nice to have remembered to update the release date in the POD, and the version in the README. 0.06 2004-12-17 17:38:41 UTC - Can now track regular expressions. Given a match, it is possible to determine which original pattern gave rise to the match. - Improved character class generation: . (anychar) was not special-cased, which would have lead to a.b axb giving a[.x]b Also takes into account single-char width metachars like \t \e et al. Filters out digits if \d appears, and for similar metachars (\D, \s, \W...) - Added a pre_filter method, to perform input filtering prior to the pattern being lexed. - Added a flags method, to allow for (?imsx) pattern modifiers. - enhanced the assemble script: added -b, -c, -d, -v; documented -r - Additions to the README - Added Test::Simple and Test::More as prerequisites. 0.05 2004-12-10 11:52:13 UTC - Bug fix in tests. The skip test in version 0.04 did not deal correctly with non-5.6.0 perls that do not have Test::Deep installed. 0.04 2004-12-09 22:29:56 UTC - In 5.6.0, the backlashes in a quoted word list, qw[ \\d ], will have their backslashes doubled up. In this case, don't run the tests. (Reading from a file or getting input from some other source other than qw[] operators works just fine). 0.03 2004-12-08 21:55:27 UTC - Bug fix: Leading 0s could be omitted from paths because of the difference between while($p) versus while(defined($p)). - An assembled pattern can be generated with whitespace. This can be used in conjunction with the /x modifier, and also for debugging. - Code profiled: dead code paths removed, hotspots rewritten to run more quickly. - Documentation typos and wordos. - assemble script now accepts a number of command line switches to control its behaviour. - More tests. Now with Test::Pod. 0.02 2004-11-19 11:16:33 UTC - An R::A object that has had nothing added to it now produces a pattern that explicitly matches nothing (the original behaviour would match anything). - An object can now chomp its own input. Useful for slurping files. It can also filter the input tokens and discard patterns that don't adhere to what's expected (sanity checking e.g.: don't want spaces). - Documented and added functions to allow for the lexer pattern to be manipulated. - The reset() method was commented out (and the test suite didn't catch the fact). - Detabbed the Assemble.pm, eg/* and t/* files (I like interpreting tabs as four spaces, but this produces horrible indentation on www.cpan.org). - t/00_basic.t test counts were wrong. This showed up if Test::Deep was not installed. - t/02_reduce.t does not need to 'use Data::Dumper'. - Tweaked eg/hostmatch/hostmatch; added eg/assemble, eg/assemble-check - Typos, corrections and addtions to the documentation. 0.01 2004-07-09 21:05:18 UTC - original version; created by h2xs 1.19 (seriously!)