1. Tree equivalencies. Currently, /contend/ /content/ /resend/ /resent/
produces (?:conten[dt]|resen[dt]), but it is possible to produce
(?:cont|res)en[dt] if one can spot the common tail nodes (and walk back
along the equivalent paths). Or be, by, me, my => /[bm][ey]/ in the
simplest case. Doing this requires a certain amount of restructuring of
the code. Currently the algorithm uses a two-phase approach: in the
first phase, the trie is traversed and reductions are performed; in the
second phase, the reduced trie is traversed and the pattern is emitted.
What has to happen is that reduction and emission occur together. As a
node is completed, it is replaced by its string representation, which
then allows child nodes to be compared for equality with a simple 'eq'.
Since there is only a single traversal, the overall generation time
might drop, even though the context baggage required to delve through
the tree will be more expensive to carry along (a hash rather than a
couple of scalars). Actually, a simpler approach is to tack on a secret
sentinel atom at the end of every pattern, which gives the reduction
algorithm sufficient traction to create a perfect trie. I'm rewriting
the reduction code using this technique. (A small illustration of the
current versus desired output appears after this list.)

2. Investigate how (?>foo) works. Can it be applied? (A short
demonstration appears after this list.)

5. How can a tracked pattern be serialised? Add freeze and thaw
methods. (A sketch appears after this list.)

6. Store callbacks per tracked pattern.

12. utf-8... hmmmm...

14. Adding qr//'ed patterns. For example, consider
$r->add( qr/^abc/i )->add( qr/^abd/ )->add( qr/^ab e/x ); this should
admit abc abC aBc aBC abd abe as matches. (See the sketch after this
list for one way to recover a qr//'s pattern and modifiers.)

16. Allow a fast, unsafe tracking mode that can be used if a(?:bc)?
can't happen. (Possibly carp if it does appear during traversal?)

17. Given a-\d+-\d+-\d+-\d+-b, produce a(?:-\d+){4}-b. Something along
the lines of (.{4})(\1+) would let the regexp engine itself be brought
to bear on the matter, which is a rather appealing idea. Consider
while(/(?!\+)(\S{2,}?)(\1+)/g) { ... $1, $2 ... } as a starting point.
(A rough sketch appears after this list.)

19. The reduction code has become unbelievably baroque. Adding code to
handle (sting, singing, sing) => s(?:(?:ing)?|t)ing was far too
difficult. Adding more stuff just breaks existing behaviour, and fixing
the ^abcd$ ... bug broke stuff all over again. Now that the corner
cases are more clearly identified, a full rewrite of the reduction code
is needed, which would also admit the possibility of implementing items
1 and 17.

20. Handle debug unrev with a separate bit.

23. Japhy's http://www.perlmonks.org/index.pl?node_id=90876 list2range
regexp. (A simple sketch of the idea appears after this list.)

24. Lookahead assertions contain serious bugs (as shown by assembling
powersets). Need to save more context during reduction, which in turn
will simplify the preparation of the lookahead classes. See also 19.

26. _lex() swamps the overall run-time. It stems from the decision to
use a single regexp to pull apart any pattern. A suite of simpler
regexps to pick off parens, char classes, quantifiers and bare tokens
may be faster. (This has been implemented as _fastlex(), but it's only
marginally faster. Perhaps split-by-char and lex a la C?)

27. We don't, as yet, unroll_plus a paren, e.g. (abc)+?

28. We don't re-roll an unrolled a a* back to a+ in indented or tracked
output.

29. Use (*MARK n) in blead for tracked patterns, and use (*FAIL) for
the unmatchable pattern. (A short demonstration appears after this
list.)
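
For item 1, a minimal illustration of the gap using the module's
existing public interface (new, add, as_string). The "current" line is
the output quoted above; the "desired" line is the tail-shared form the
item aims for, not something the module produces yet.

    use Regexp::Assemble;

    my $ra = Regexp::Assemble->new;
    $ra->add(qw(contend content resend resent));
    print $ra->as_string, "\n";
    # current output:  (?:conten[dt]|resen[dt])
    # desired output:  (?:cont|res)en[dt]   (common tail shared)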
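
For item 2, a self-contained demonstration of the property (?>...)
provides: the group commits to what it matched and refuses to surrender
characters on backtracking. Whether that helps or hurts an assembled
alternation is exactly the open question.

    my $s = 'aaaa';
    # The plain pattern backtracks: a+ gives one 'a' back so the final 'a' can match.
    print "plain:  ", ($s =~ /^a+a$/     ? "match" : "no match"), "\n";  # match
    # The atomic group keeps all four 'a's, so the trailing 'a' has nothing left.
    print "atomic: ", ($s =~ /^(?>a+)a$/ ? "match" : "no match"), "\n";  # no match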
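
For item 5, a minimal sketch of freeze/thaw built on Storable. The
field names (str, mlist, flags) are hypothetical stand-ins for whatever
internal state a tracked pattern really needs; the only real point is
that the compiled qr// is rebuilt on thaw rather than serialised.

    use Storable qw(freeze thaw);

    sub freeze_tracked {                     # hypothetical method
        my $self = shift;
        return freeze({
            str   => $self->{str},           # assumed: assembled pattern string
            mlist => $self->{mlist},         # assumed: slot-to-source-pattern map
            flags => $self->{flags},         # assumed: modifiers in effect
        });
    }

    sub thaw_tracked {                       # hypothetical constructor
        my ($class, $frozen) = @_;
        my $self = bless thaw($frozen), $class;
        $self->{re} = qr/$self->{str}/;      # recompile; modifier handling omitted
        return $self;
    }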
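
For item 14, re::regexp_pattern (core since Perl 5.9.5) returns a qr//
object's pattern and modifiers separately, which is the raw material
add() would need; how the modifiers are then folded into the assembly
is left open.

    use re qw(regexp_pattern);

    for my $qr (qr/^abc/i, qr/^abd/, qr/^ab e/x) {
        my ($pat, $mods) = regexp_pattern($qr);
        print "pattern=$pat modifiers=$mods\n";
        # e.g. pattern=^abc modifiers=i
        # One possible approach: wrap as "(?$mods:$pat)" before adding,
        # so per-pattern modifiers survive inside the assembled regexp.
    }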
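
For item 17, a rough sketch that lets the regexp engine find the
repeats with a backreference, as suggested above. It treats an escape
sequence as a single unit and is not robust against character classes,
groups or nested quantifiers; the point is only that the length of the
backreferenced tail gives the repeat count directly.

    # Collapse literal runs such as -\d+-\d+-\d+-\d+ into (?:-\d+){4}.
    sub _collapse_runs {                     # hypothetical helper
        my $pattern = shift;
        $pattern =~ s{((?:\\.|[^\\]){2,}?)(\1+)}{
            my ($unit, $rest) = ($1, $2);
            my $count = 1 + length($rest) / length($unit);
            "(?:$unit){$count}"
        }ge;
        return $pattern;
    }

    print _collapse_runs('a-\d+-\d+-\d+-\d+-b'), "\n";   # a(?:-\d+){4}-b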
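
For item 23, the list2range idea in its plainest form: collapse a
sorted list of integers into runs, ready to be rendered as ranges in a
character class or alternation. This is a sketch of the idea only, not
Japhy's regexp-based version from the node linked above.

    sub list2range {                         # hypothetical helper
        my @in = sort { $a <=> $b } @_;
        my @run;
        for my $n (@in) {
            if (@run and $n <= $run[-1][1] + 1) {
                $run[-1][1] = $n if $n > $run[-1][1];   # extend current run
            }
            else {
                push @run, [$n, $n];                    # start a new run
            }
        }
        return map { $_->[0] == $_->[1] ? $_->[0] : "$_->[0]-$_->[1]" } @run;
    }

    print join(',', list2range(1, 2, 3, 4, 7, 8, 10)), "\n";   # 1-4,7-8,10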
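
For item 29, what (*MARK:NAME) and (*FAIL) look like in practice (both
available from Perl 5.10): $REGMARK reports which marked branch
matched, which is what tracked patterns currently simulate with
capturing parens, and (*FAIL) yields a pattern that can never match.

    use strict;
    use warnings;
    use 5.010;

    our $REGMARK;    # set by the regexp engine in the calling package

    if ('some bar here' =~ /\b(?:foo(*MARK:foo)|bar(*MARK:bar))\b/) {
        print "matched branch: $REGMARK\n";   # prints "bar"
    }

    # (*FAIL) never matches: a natural choice for the unmatchable pattern.
    print "unmatchable\n" unless 'anything' =~ /(*FAIL)/;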