========== Change Log ========== Version 2.4.0 - April, 2019 --------------------------- - Well, it looks like the API change that was introduced in 2.3.1 was more drastic than expected, so for a friendlier forward upgrade path, this release: . Bumps the current version number to 2.4.0, to reflect this incompatible change. . Adds a pyparsing.__compat__ object for specifying compatibility with future breaking changes. . Conditionalizes the API-breaking behavior, based on the value pyparsing.__compat__.collect_all_And_tokens. By default, this value will be set to True, reflecting the new bugfixed behavior. To set this value to False, add to your code: import pyparsing pyparsing.__compat__.collect_all_And_tokens = False . User code that is dependent on the pre-bugfix behavior can restore it by setting this value to False. In 2.5 and later versions, the conditional code will be removed and setting the flag to True or False in these later versions will have no effect. - Updated unitTests.py and simple_unit_tests.py to be compatible with "python setup.py test". To run tests using setup, do: python setup.py test python setup.py test -s unitTests.suite python setup.py test -s simple_unit_tests.suite Prompted by issue #83 and PR submitted by bdragon28, thanks. - Fixed bug in runTests handling '\n' literals in quoted strings. - Added tag_body attribute to the start tag expressions generated by makeHTMLTags, so that you can avoid using SkipTo to roll your own tag body expression: a, aEnd = pp.makeHTMLTags('a') link = a + a.tag_body("displayed_text") + aEnd for t in s.searchString(html_page): print(t.displayed_text, '->', t.startA.href) - indentedBlock failure handling was improved; PR submitted by TMiguelT, thanks! - Address Py2 incompatibility in simpleUnitTests, plus explain() and Forward str() cleanup; PRs graciously provided by eswald. - Fixed docstring with embedded '\w', which creates SyntaxWarnings in Py3.8, issue #80. - Examples: - Added example parser for rosettacode.org tutorial compiler. - Added example to show how an HTML table can be parsed into a collection of Python lists or dicts, one per row. - Updated SimpleSQL.py example to handle nested selects, reworked 'where' expression to use infixNotation. - Added include_preprocessor.py, similar to macroExpander.py. - Examples using makeHTMLTags use new tag_body expression when retrieving a tag's body text. - Updated examples that are runnable as unit tests: python setup.py test -s examples.antlr_grammar_tests python setup.py test -s examples.test_bibparse Version 2.3.1 - January, 2019 ----------------------------- - POSSIBLE API CHANGE: this release fixes a bug when results names were attached to a MatchFirst or Or object containing an And object. Previously, a results name on an And object within an enclosing MatchFirst or Or could return just the first token in the And. Now, all the tokens matched by the And are correctly returned. This may result in subtle changes in the tokens returned if you have this condition in your pyparsing scripts. - New staticmethod ParseException.explain() to help diagnose parse exceptions by showing the failing input line and the trace of ParserElements in the parser leading up to the exception. explain() returns a multiline string listing each element by name. (This is still an experimental method, and the method signature and format of the returned string may evolve over the next few releases.) Example: # define a parser to parse an integer followed by an # alphabetic word expr = pp.Word(pp.nums).setName("int") + pp.Word(pp.alphas).setName("word") try: # parse a string with a numeric second value instead of alpha expr.parseString("123 355") except pp.ParseException as pe: print(pp.ParseException.explain(pe)) Prints: 123 355 ^ ParseException: Expected word (at char 4), (line:1, col:5) __main__.ExplainExceptionTest pyparsing.And - {int word} pyparsing.Word - word explain() will accept any exception type and will list the function names and parse expressions in the stack trace. This is especially useful when an exception is raised in a parse action. Note: explain() is only supported under Python 3. - Fix bug in dictOf which could match an empty sequence, making it infinitely loop if wrapped in a OneOrMore. - Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges. - Added ability to define custom unicode sets as combinations of other sets using multiple inheritance. class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA): pass turkish_word = pp.Word(Turkish_set.alphas) - Updated state machine import examples, with state machine demos for: . traffic light . library book checkin/checkout . document review/approval In the traffic light example, you can use the custom 'statemachine' keyword to define the states for a traffic light, and have the state classes auto-generated for you: statemachine TrafficLightState: Red -> Green Green -> Yellow Yellow -> Red Similar for state machines with named transitions, like the library book state example: statemachine LibraryBookState: New -(shelve)-> Available Available -(reserve)-> OnHold OnHold -(release)-> Available Available -(checkout)-> CheckedOut CheckedOut -(checkin)-> Available Once the classes are defined, then additional Python code can reference those classes to add class attributes, instance methods, etc. See the examples in examples/statemachine - Added an example parser for the decaf language. This language is used in CS compiler classes in many colleges and universities. - Fixup of docstrings to Sphinx format, inclusion of test files in the source package, and convert markdown to rst throughout the distribution, great job by Matěj Cepl! - Expanded the whitespace characters recognized by the White class to include all unicode defined spaces. Suggested in Issue #51 by rtkjbillo. - Added optional postParse argument to ParserElement.runTests() to add a custom callback to be called for test strings that parse successfully. Useful for running tests that do additional validation or processing on the parsed results. See updated chemicalFormulas.py example. - Removed distutils fallback in setup.py. If installing the package fails, please update to the latest version of setuptools. Plus overall project code cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne! - Fix bug in CaselessKeyword, to make its behavior consistent with Keyword(caseless=True). Fixes Issue #65 reported by telesphore. Version 2.3.0 - October, 2018 ----------------------------- - NEW SUPPORT FOR UNICODE CHARACTER RANGES This release introduces the pyparsing_unicode namespace class, defining a series of language character sets to simplify the definition of alphas, nums, alphanums, and printables in the following language sets: . Arabic . Chinese . Cyrillic . Devanagari . Greek . Hebrew . Japanese (including Kanji, Katakana, and Hirigana subsets) . Korean . Latin1 (includes 7 and 8-bit Latin characters) . Thai . CJK (combination of Chinese, Japanese, and Korean sets) For example, your code can define words using: korean_word = Word(pyparsing_unicode.Korean.alphas) See their use in the updated examples greetingInGreek.py and greetingInKorean.py. This namespace class also offers access to these sets using their unicode identifiers. - POSSIBLE API CHANGE: Fixed bug where a parse action that explicitly returned the input ParseResults could add another nesting level in the results if the current expression had a results name. vals = pp.OneOrMore(pp.pyparsing_common.integer)("int_values") def add_total(tokens): tokens['total'] = sum(tokens) return tokens # this line can be removed vals.addParseAction(add_total) print(vals.parseString("244 23 13 2343").dump()) Before the fix, this code would print (note the extra nesting level): [244, 23, 13, 2343] - int_values: [244, 23, 13, 2343] - int_values: [244, 23, 13, 2343] - total: 2623 - total: 2623 With the fix, this code now prints: [244, 23, 13, 2343] - int_values: [244, 23, 13, 2343] - total: 2623 This fix will change the structure of ParseResults returned if a program defines a parse action that returns the tokens that were sent in. This is not necessary, and statements like "return tokens" in the example above can be safely deleted prior to upgrading to this release, in order to avoid the bug and get the new behavior. Reported by seron in Issue #22, nice catch! - POSSIBLE API CHANGE: Fixed a related bug where a results name erroneously created a second level of hierarchy in the returned ParseResults. The intent for accumulating results names into ParseResults is that, in the absence of Group'ing, all names get merged into a common namespace. This allows us to write: key_value_expr = (Word(alphas)("key") + '=' + Word(nums)("value")) result = key_value_expr.parseString("a = 100") and have result structured as {"key": "a", "value": "100"} instead of [{"key": "a"}, {"value": "100"}]. However, if a named expression is used in a higher-level non-Group expression that *also* has a name, a false sub-level would be created in the namespace: num = pp.Word(pp.nums) num_pair = ("[" + (num("A") + num("B"))("values") + "]") U = num_pair.parseString("[ 10 20 ]") print(U.dump()) Since there is no grouping, "A", "B", and "values" should all appear at the same level in the results, as: ['[', '10', '20', ']'] - A: '10' - B: '20' - values: ['10', '20'] Instead, an extra level of "A" and "B" show up under "values": ['[', '10', '20', ']'] - A: '10' - B: '20' - values: ['10', '20'] - A: '10' - B: '20' This bug has been fixed. Now, if this hierarchy is desired, then a Group should be added: num_pair = ("[" + pp.Group(num("A") + num("B"))("values") + "]") Giving: ['[', ['10', '20'], ']'] - values: ['10', '20'] - A: '10' - B: '20' But in no case should "A" and "B" appear in multiple levels. This bug-fix fixes that. If you have current code which relies on this behavior, then add or remove Groups as necessary to get your intended results structure. Reported by Athanasios Anastasiou. - IndexError's raised in parse actions will get explicitly reraised as ParseExceptions that wrap the original IndexError. Since IndexError sometimes occurs as part of pyparsing's normal parsing logic, IndexErrors that are raised during a parse action may have gotten silently reinterpreted as parsing errors. To retain the information from the IndexError, these exceptions will now be raised as ParseExceptions that reference the original IndexError. This wrapping will only be visible when run under Python3, since it emulates "raise ... from ..." syntax. Addresses Issue #4, reported by guswns0528. - Added Char class to simplify defining expressions of a single character. (Char("abc") is equivalent to Word("abc", exact=1)) - Added class PrecededBy to perform lookbehind tests. PrecededBy is used in the same way as FollowedBy, passing in an expression that must occur just prior to the current parse location. For fixed-length expressions like a Literal, Keyword, Char, or a Word with an `exact` or `maxLen` length given, `PrecededBy(expr)` is sufficient. For varying length expressions like a Word with no given maximum length, `PrecededBy` must be constructed with an integer `retreat` argument, as in `PrecededBy(Word(alphas, nums), retreat=10)`, to specify the maximum number of characters pyparsing must look backward to make a match. pyparsing will check all the values from 1 up to retreat characters back from the current parse location. When stepping backwards through the input string, PrecededBy does *not* skip over whitespace. PrecededBy can be created with a results name so that, even though it always returns an empty parse result, the result *can* include named results. Idea first suggested in Issue #30 by Freakwill. - Updated FollowedBy to accept expressions that contain named results, so that results names defined in the lookahead expression will be returned, even though FollowedBy always returns an empty list. Inspired by the same feature implemented in PrecededBy. Version 2.2.2 - September, 2018 ------------------------------- - Fixed bug in SkipTo, if a SkipTo expression that was skipping to an expression that returned a list (such as an And), and the SkipTo was saved as a named result, the named result could be saved as a ParseResults - should always be saved as a string. Issue #28, reported by seron. - Added simple_unit_tests.py, as a collection of easy-to-follow unit tests for various classes and features of the pyparsing library. Primary intent is more to be instructional than actually rigorous testing. Complex tests can still be added in the unitTests.py file. - New features added to the Regex class: - optional asGroupList parameter, returns all the capture groups as a list - optional asMatch parameter, returns the raw re.match result - new sub(repl) method, which adds a parse action calling re.sub(pattern, repl, parsed_result). Simplifies creating Regex expressions to be used with transformString. Like re.sub, repl may be an ordinary string (similar to using pyparsing's replaceWith), or may contain references to capture groups by group number, or may be a callable that takes an re match group and returns a string. For instance: expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2") expr.transformString("h1: This is the title") will return

This is the title

- Fixed omission of LICENSE file in source tarball, also added CODE_OF_CONDUCT.md per GitHub community standards. Version 2.2.1 - September, 2018 ------------------------------- - Applied changes necessary to migrate hosting of pyparsing source over to GitHub. Many thanks for help and contributions from hugovk, jdufresne, and cngkaygusuz among others through this transition, sorry it took me so long! - Fixed import of collections.abc to address DeprecationWarnings in Python 3.7. - Updated oc.py example to support function calls in arithmetic expressions; fixed regex for '==' operator; and added packrat parsing. Raised on the pyparsing wiki by Boris Marin, thanks! - Fixed bug in select_parser.py example, group_by_terms was not reported. Reported on SF bugs by Adam Groszer, thanks Adam! - Added "Getting Started" section to the module docstring, to guide new users to the most common starting points in pyparsing's API. - Fixed bug in Literal and Keyword classes, which erroneously raised IndexError instead of ParseException. Version 2.2.0 - March, 2017 --------------------------- - Bumped minor version number to reflect compatibility issues with OneOrMore and ZeroOrMore bugfixes in 2.1.10. (2.1.10 fixed a bug that was introduced in 2.1.4, but the fix could break code written against 2.1.4 - 2.1.9.) - Updated setup.py to address recursive import problems now that pyparsing is part of 'packaging' (used by setuptools). Patch submitted by Joshua Root, much thanks! - Fixed KeyError issue reported by Yann Bizeul when using packrat parsing in the Graphite time series database, thanks Yann! - Fixed incorrect usages of '\' in literals, as described in https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior Patch submitted by Ville Skyttä - thanks! - Minor internal change when using '-' operator, to be compatible with ParserElement.streamline() method. - Expanded infixNotation to accept a list or tuple of parse actions to attach to an operation. - New unit test added for dill support for storing pyparsing parsers. Ordinary Python pickle can be used to pickle pyparsing parsers as long as they do not use any parse actions. The 'dill' module is an extension to pickle which *does* support pickling of attached parse actions. Version 2.1.10 - October, 2016 ------------------------------- - Fixed bug in reporting named parse results for ZeroOrMore expressions, thanks Ethan Nash for reporting this! - Fixed behavior of LineStart to be much more predictable. LineStart can now be used to detect if the next parse position is col 1, factoring in potential leading whitespace (which would cause LineStart to fail). Also fixed a bug in col, which is used in LineStart, where '\n's were erroneously considered to be column 1. - Added support for multiline test strings in runTests. - Fixed bug in ParseResults.dump when keys were not strings. Also changed display of string values to show them in quotes, to help distinguish parsed numeric strings from parsed integers that have been converted to Python ints. Version 2.1.9 - September, 2016 ------------------------------- - Added class CloseMatch, a variation on Literal which matches "close" matches, that is, strings with at most 'n' mismatching characters. - Fixed bug in Keyword.setDefaultKeywordChars(), reported by Kobayashi Shinji - nice catch, thanks! - Minor API change in pyparsing_common. Renamed some of the common expressions to PEP8 format (to be consistent with the other pyparsing_common expressions): . signedInteger -> signed_integer . sciReal -> sci_real Also, in trying to stem the API bloat of pyparsing, I've copied some of the global expressions and helper parse actions into pyparsing_common, with the originals to be deprecated and removed in a future release: . commaSeparatedList -> pyparsing_common.comma_separated_list . upcaseTokens -> pyparsing_common.upcaseTokens . downcaseTokens -> pyparsing_common.downcaseTokens (I don't expect any other expressions, like the comment expressions, quotedString, or the Word-helping strings like alphas, nums, etc. to migrate to pyparsing_common - they are just too pervasive. As for the PEP8 vs camelCase naming, all the expressions are PEP8, while the parse actions in pyparsing_common are still camelCase. It's a small step - when pyparsing 3.0 comes around, everything will change to PEP8 snake case.) - Fixed Python3 compatibility bug when using dict keys() and values() in ParseResults.getName(). - After some prodding, I've reworked the unitTests.py file for pyparsing over the past few releases. It uses some variations on unittest to handle my testing style. The test now: . auto-discovers its test classes (while maintining their order of definition) . suppresses voluminous 'print' output for tests that pass Version 2.1.8 - August, 2016 ---------------------------- - Fixed issue in the optimization to _trim_arity, when the full stacktrace is retrieved to determine if a TypeError is raised in pyparsing or in the caller's parse action. Code was traversing the full stacktrace, and potentially encountering UnicodeDecodeError. - Fixed bug in ParserElement.inlineLiteralsUsing, causing infinite loop with Suppress. - Fixed bug in Each, when merging named results from multiple expressions in a ZeroOrMore or OneOrMore. Also fixed bug when ZeroOrMore expressions were erroneously treated as required expressions in an Each expression. - Added a few more inline doc examples. - Improved use of runTests in several example scripts. Version 2.1.7 - August, 2016 ---------------------------- - Fixed regression reported by Andrea Censi (surfaced in PyContracts tests) when using ParseSyntaxExceptions (raised when using operator '-') with packrat parsing. - Minor fix to oneOf, to accept all iterables, not just space-delimited strings and lists. (If you have a list or set of strings, it is not necessary to concat them using ' '.join to pass them to oneOf, oneOf will accept the list or set or generator directly.) Version 2.1.6 - August, 2016 ---------------------------- - *Major packrat upgrade*, inspired by patch provided by Tal Einat - many, many, thanks to Tal for working on this! Tal's tests show faster parsing performance (2X in some tests), *and* memory reduction from 3GB down to ~100MB! Requires no changes to existing code using packratting. (Uses OrderedDict, available in Python 2.7 and later. For Python 2.6 users, will attempt to import from ordereddict backport. If not present, will implement pure-Python Fifo dict.) - Minor API change - to better distinguish between the flexible numeric types defined in pyparsing_common, I've changed "numeric" (which parsed numbers of different types and returned int for ints, float for floats, etc.) and "number" (which parsed numbers of int or float type, and returned all floats) to "number" and "fnumber" respectively. I hope the "f" prefix of "fnumber" will be a better indicator of its internal conversion of parsed values to floats, while the generic "number" is similar to the flexible number syntax in other languages. Also fixed a bug in pyparsing_common.numeric (now renamed to pyparsing_common.number), integers were parsed and returned as floats instead of being retained as ints. - Fixed bug in upcaseTokens and downcaseTokens introduced in 2.1.5, when the parse action was used in conjunction with results names. Reported by Steven Arcangeli from the dql project, thanks for your patience, Steven! - Major change to docs! After seeing some comments on reddit about general issue with docs of Python modules, and thinking that I'm a little overdue in doing some doc tuneup on pyparsing, I decided to following the suggestions of the redditor and add more inline examples to the pyparsing reference documentation. I hope this addition will clarify some of the more common questions people have, especially when first starting with pyparsing/Python. - Deprecated ParseResults.asXML. I've never been too happy with this method, and it usually forces some unnatural code in the parsers in order to get decent tag names. The amount of guesswork that asXML has to do to try to match names with values should have been a red flag from day one. If you are using asXML, you will need to implement your own ParseResults->XML serialization. Or consider migrating to a more current format such as JSON (which is very easy to do: results_as_json = json.dumps(parse_result.asDict()) Hopefully, when I remove this code in a future version, I'll also be able to simplify some of the craziness in ParseResults, which IIRC was only there to try to make asXML work. - Updated traceParseAction parse action decorator to show the repr of the input and output tokens, instead of the str format, since str has been simplified to just show the token list content. (The change to ParseResults.__str__ occurred in pyparsing 2.0.4, but it seems that didn't make it into the release notes - sorry! Too many users, especially beginners, were confused by the "([token_list], {names_dict})" str format for ParseResults, thinking they were getting a tuple containing a list and a dict. The full form can be seen if using repr().) For tracing tokens in and out of parse actions, the more complete repr form provides important information when debugging parse actions. Verison 2.1.5 - June, 2016 ------------------------------ - Added ParserElement.split() generator method, similar to re.split(). Includes optional arguments maxsplit (to limit the number of splits), and includeSeparators (to include the separating matched text in the returned output, default=False). - Added a new parse action construction helper tokenMap, which will apply a function and optional arguments to each element in a ParseResults. So this parse action: def lowercase_all(tokens): return [str(t).lower() for t in tokens] OneOrMore(Word(alphas)).setParseAction(lowercase_all) can now be written: OneOrMore(Word(alphas)).setParseAction(tokenMap(str.lower)) Also simplifies writing conversion parse actions like: integer = Word(nums).setParseAction(lambda t: int(t[0])) to just: integer = Word(nums).setParseAction(tokenMap(int)) If additional arguments are necessary, they can be included in the call to tokenMap, as in: hex_integer = Word(hexnums).setParseAction(tokenMap(int, 16)) - Added more expressions to pyparsing_common: . IPv4 and IPv6 addresses (including long, short, and mixed forms of IPv6) . MAC address . ISO8601 date and date time strings (with named fields for year, month, etc.) . UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) . hex integer (returned as int) . fraction (integer '/' integer, returned as float) . mixed integer (integer '-' fraction, or just fraction, returned as float) . stripHTMLTags (parse action to remove tags from HTML source) . parse action helpers convertToDate and convertToDatetime to do custom parse time conversions of parsed ISO8601 strings - runTests now returns a two-tuple: success if all tests succeed, and an output list of each test and its output lines. - Added failureTests argument (default=False) to runTests, so that tests can be run that are expected failures, and runTests' success value will return True only if all tests *fail* as expected. Also, parseAll now defaults to True. - New example numerics.py, shows samples of parsing integer and real numbers using locale-dependent formats: 4.294.967.295,000 4 294 967 295,000 4,294,967,295.000 Version 2.1.4 - May, 2016 ------------------------------ - Split out the '==' behavior in ParserElement, now implemented as the ParserElement.matches() method. Using '==' for string test purposes will be removed in a future release. - Expanded capabilities of runTests(). Will now accept embedded comments (default is Python style, leading '#' character, but customizable). Comments will be emitted along with the tests and test output. Useful during test development, to create a test string consisting only of test case description comments separated by blank lines, and then fill in the test cases. Will also highlight ParseFatalExceptions with "(FATAL)". - Added a 'pyparsing_common' class containing common/helpful little expressions such as integer, float, identifier, etc. I used this class as a sort of embedded namespace, to contain these helpers without further adding to pyparsing's namespace bloat. - Minor enhancement to traceParseAction decorator, to retain the parse action's name for the trace output. - Added optional 'fatal' keyword arg to addCondition, to indicate that a condition failure should halt parsing immediately. Version 2.1.3 - May, 2016 ------------------------------ - _trim_arity fix in 2.1.2 was very version-dependent on Py 3.5.0. Now works for Python 2.x, 3.3, 3.4, 3.5.0, and 3.5.1 (and hopefully beyond). Version 2.1.2 - May, 2016 ------------------------------ - Fixed bug in _trim_arity when pyparsing code is included in a PyInstaller, reported by maluwa. - Fixed catastrophic regex backtracking in implementation of the quoted string expressions (dblQuotedString, sglQuotedString, and quotedString). Reported on the pyparsing wiki by webpentest, good catch! (Also tuned up some other expressions susceptible to the same backtracking problem, such as cStyleComment, cppStyleComment, etc.) Version 2.1.1 - March, 2016 --------------------------- - Added support for assigning to ParseResults using slices. - Fixed bug in ParseResults.toDict(), in which dict values were always converted to dicts, even if they were just unkeyed lists of tokens. Reported on SO by Gerald Thibault, thanks Gerald! - Fixed bug in SkipTo when using failOn, reported by robyschek, thanks! - Fixed bug in Each introduced in 2.1.0, reported by AND patch and unit test submitted by robyschek, well done! - Removed use of functools.partial in replaceWith, as this creates an ambiguous signature for the generated parse action, which fails in PyPy. Reported by Evan Hubinger, thanks Evan! - Added default behavior to QuotedString to convert embedded '\t', '\n', etc. characters to their whitespace counterparts. Found during Q&A exchange on SO with Maxim. Version 2.1.0 - February, 2016 ------------------------------ - Modified the internal _trim_arity method to distinguish between TypeError's raised while trying to determine parse action arity and those raised within the parse action itself. This will clear up those confusing "() takes exactly 1 argument (0 given)" error messages when there is an actual TypeError in the body of the parse action. Thanks to all who have raised this issue in the past, and most recently to Michael Cohen, who sent in a proposed patch, and got me to finally tackle this problem. - Added compatibility for pickle protocols 2-4 when pickling ParseResults. In Python 2.x, protocol 0 was the default, and protocol 2 did not work. In Python 3.x, protocol 3 is the default, so explicitly naming protocol 0 or 1 was required to pickle ParseResults. With this release, all protocols 0-4 are supported. Thanks for reporting this on StackOverflow, Arne Wolframm, and for providing a nice simple test case! - Added optional 'stopOn' argument to ZeroOrMore and OneOrMore, to simplify breaking on stop tokens that would match the repetition expression. It is a common problem to fail to look ahead when matching repetitive tokens if the sentinel at the end also matches the repetition expression, as when parsing "BEGIN aaa bbb ccc END" with: "BEGIN" + OneOrMore(Word(alphas)) + "END" Since "END" matches the repetition expression "Word(alphas)", it will never get parsed as the terminating sentinel. Up until now, this has to be resolved by the user inserting their own negative lookahead: "BEGIN" + OneOrMore(~Literal("END") + Word(alphas)) + "END" Using stopOn, they can more easily write: "BEGIN" + OneOrMore(Word(alphas), stopOn="END") + "END" The stopOn argument can be a literal string or a pyparsing expression. Inspired by a question by Lamakaha on StackOverflow (and many previous questions with the same negative-lookahead resolution). - Added expression names for many internal and builtin expressions, to reduce name and error message overhead during parsing. - Converted helper lambdas to functions to refactor and add docstring support. - Fixed ParseResults.asDict() to correctly convert nested ParseResults values to dicts. - Cleaned up some examples, fixed typo in fourFn.py identified by aristotle2600 on reddit. - Removed keepOriginalText helper method, which was deprecated ages ago. Superceded by originalTextFor. - Same for the Upcase class, which was long ago deprecated and replaced with the upcaseTokens method. Version 2.0.7 - December, 2015 ------------------------------ - Simplified string representation of Forward class, to avoid memory and performance errors while building ParseException messages. Thanks, Will McGugan, Andrea Censi, and Martijn Vermaat for the bug reports and test code. - Cleaned up additional issues from enhancing the error messages for Or and MatchFirst, handling Unicode values in expressions. Fixes Unicode encoding issues in Python 2, thanks to Evan Hubinger for the bug report. - Fixed implementation of dir() for ParseResults - was leaving out all the defined methods and just adding the custom results names. - Fixed bug in ignore() that was introduced in pyparsing 1.5.3, that would not accept a string literal as the ignore expression. - Added new example parseTabularData.py to illustrate parsing of data formatted in columns, with detection of empty cells. - Updated a number of examples to more current Python and pyparsing forms. Version 2.0.6 - November, 2015 ------------------------------ - Fixed a bug in Each when multiple Optional elements are present. Thanks for reporting this, whereswalden on SO. - Fixed another bug in Each, when Optional elements have results names or parse actions, reported by Max Rothman - thank you, Max! - Added optional parseAll argument to runTests, whether tests should require the entire input string to be parsed or not (similar to parseAll argument to parseString). Plus a little neaten-up of the output on Python 2 (no stray ()'s). - Modified exception messages from MatchFirst and Or expressions. These were formerly misleading as they would only give the first or longest exception mismatch error message. Now the error message includes all the alternatives that were possible matches. Originally proposed by a pyparsing user, but I've lost the email thread - finally figured out a fairly clean way to do this. - Fixed a bug in Or, when a parse action on an alternative raises an exception, other potentially matching alternatives were not always tried. Reported by TheVeryOmni on the pyparsing wiki, thanks! - Fixed a bug to dump() introduced in 2.0.4, where list values were shown in duplicate. Version 2.0.5 - October, 2015 ----------------------------- - (&$(@#&$(@!!!! Some "print" statements snuck into pyparsing v2.0.4, breaking Python 3 compatibility! Fixed. Reported by jenshn, thanks! Version 2.0.4 - October, 2015 ----------------------------- - Added ParserElement.addCondition, to simplify adding parse actions that act primarily as filters. If the given condition evaluates False, pyparsing will raise a ParseException. The condition should be a method with the same method signature as a parse action, but should return a boolean. Suggested by Victor Porton, nice idea Victor, thanks! - Slight mod to srange to accept unicode literals for the input string, such as "[а-яА-Я]" instead of "[\u0430-\u044f\u0410-\u042f]". Thanks to Alexandr Suchkov for the patch! - Enhanced implementation of replaceWith. - Fixed enhanced ParseResults.dump() method when the results consists only of an unnamed array of sub-structure results. Reported by Robin Siebler, thanks for your patience and persistence, Robin! - Fixed bug in fourFn.py example code, where pi and e were defined using CaselessLiteral instead of CaselessKeyword. This was not a problem until adding a new function 'exp', and the leading 'e' of 'exp' was accidentally parsed as the mathematical constant 'e'. Nice catch, Tom Grydeland - thanks! - Adopt new-fangled Python features, like decorators and ternary expressions, per suggestions from Williamzjc - thanks William! (Oh yeah, I'm not supporting Python 2.3 with this code any more...) Plus, some additional code fixes/cleanup - thanks again! - Added ParserElement.runTests, a little test bench for quickly running an expression against a list of sample input strings. Basically, I got tired of writing the same test code over and over, and finally added it as a test point method on ParserElement. - Added withClass helper method, a simplified version of withAttribute for the common but annoying case when defining a filter on a div's class - made difficult because 'class' is a Python reserved word. Version 2.0.3 - October, 2014 ----------------------------- - Fixed escaping behavior in QuotedString. Formerly, only quotation marks (or characters designated as quotation marks in the QuotedString constructor) would be escaped. Now all escaped characters will be escaped, and the escaping backslashes will be removed. - Fixed regression in ParseResults.pop() - pop() was pretty much broken after I added *improvements* in 2.0.2. Reported by Iain Shelvington, thanks Iain! - Fixed bug in And class when initializing using a generator. - Enhanced ParseResults.dump() method to list out nested ParseResults that are unnamed arrays of sub-structures. - Fixed UnboundLocalError under Python 3.4 in oneOf method, reported on Sourceforge by aldanor, thanks! - Fixed bug in ParseResults __init__ method, when returning non-ParseResults types from parse actions that implement __eq__. Raised during discussion on the pyparsing wiki with cyrfer. Version 2.0.2 - April, 2014 --------------------------- - Extended "expr(name)" shortcut (same as "expr.setResultsName(name)") to accept "expr()" as a shortcut for "expr.copy()". - Added "locatedExpr(expr)" helper, to decorate any returned tokens with their location within the input string. Adds the results names locn_start and locn_end to the output parse results. - Added "pprint()" method to ParseResults, to simplify troubleshooting and prettified output. Now instead of importing the pprint module and then writing "pprint.pprint(result)", you can just write "result.pprint()". This method also accepts addtional positional and keyword arguments (such as indent, width, etc.), which get passed through directly to the pprint method (see https://docs.python.org/2/library/pprint.html#pprint.pprint). - Removed deprecation warnings when using '<<' for Forward expression assignment. '<<=' is still preferred, but '<<' will be retained for cases where '<<=' operator is not suitable (such as in defining lambda expressions). - Expanded argument compatibility for classes and functions that take list arguments, to now accept generators as well. - Extended list-like behavior of ParseResults, adding support for append and extend. NOTE: if you have existing applications using these names as results names, you will have to access them using dict-style syntax: res["append"] and res["extend"] - ParseResults emulates the change in list vs. iterator semantics for methods like keys(), values(), and items(). Under Python 2.x, these methods will return lists, under Python 3.x, these methods will return iterators. - ParseResults now has a method haskeys() which returns True or False depending on whether any results names have been defined. This simplifies testing for the existence of results names under Python 3.x, which returns keys() as an iterator, not a list. - ParseResults now supports both list and dict semantics for pop(). If passed no argument or an integer argument, it will use list semantics and pop tokens from the list of parsed tokens. If passed a non-integer argument (most likely a string), it will use dict semantics and pop the corresponding value from any defined results names. A second default return value argument is supported, just as in dict.pop(). - Fixed bug in markInputline, thanks for reporting this, Matt Grant! - Cleaned up my unit test environment, now runs with Python 2.6 and 3.3. Version 2.0.1 - July, 2013 -------------------------- - Removed use of "nonlocal" that prevented using this version of pyparsing with Python 2.6 and 2.7. This will make it easier to install for packages that depend on pyparsing, under Python versions 2.6 and later. Those using older versions of Python will have to manually install pyparsing 1.5.7. - Fixed implementation of <<= operator to return self; reported by Luc J. Bourhis, with patch fix by Mathias Mamsch - thanks, Luc and Mathias! Version 2.0.0 - November, 2012 ------------------------------ - Rather than release another combined Python 2.x/3.x release I've decided to start a new major version that is only compatible with Python 3.x (and consequently Python 2.7 as well due to backporting of key features). This version will be the main development path from now on, with little follow-on development on the 1.5.x path. - Operator '<<' is now deprecated, in favor of operator '<<=' for attaching parsing expressions to Forward() expressions. This is being done to address precedence of operations problems with '<<'. Operator '<<' will be removed in a future version of pyparsing. Version 1.5.7 - November, 2012 ----------------------------- - NOTE: This is the last release of pyparsing that will try to maintain compatibility with Python versions < 2.6. The next release of pyparsing will be version 2.0.0, using new Python syntax that will not be compatible for Python version 2.5 or older. - An awesome new example is included in this release, submitted by Luca DellOlio, for parsing ANTLR grammar definitions, nice work Luca! - Fixed implementation of ParseResults.__str__ to use Pythonic ''.join() instead of repeated string concatenation. This purportedly has been a performance issue under PyPy. - Fixed bug in ParseResults.__dir__ under Python 3, reported by Thomas Kluyver, thank you Thomas! - Added ParserElement.inlineLiteralsUsing static method, to override pyparsing's default behavior of converting string literals to Literal instances, to use other classes (such as Suppress or CaselessLiteral). - Added new operator '<<=', which will eventually replace '<<' for storing the contents of a Forward(). '<<=' does not have the same operator precedence problems that '<<' does. - 'operatorPrecedence' is being renamed 'infixNotation' as a better description of what this helper function creates. 'operatorPrecedence' is deprecated, and will be dropped entirely in a future release. - Added optional arguments lpar and rpar to operatorPrecedence, so that expressions that use it can override the default suppression of the grouping characters. - Added support for using single argument builtin functions as parse actions. Now you can write 'expr.setParseAction(len)' and get back the length of the list of matched tokens. Supported builtins are: sum, len, sorted, reversed, list, tuple, set, any, all, min, and max. A script demonstrating this feature is included in the examples directory. - Improved linking in generated docs, proposed on the pyparsing wiki by techtonik, thanks! - Fixed a bug in the definition of 'alphas', which was based on the string.uppercase and string.lowercase "constants", which in fact *aren't* constant, but vary with locale settings. This could make parsers locale-sensitive in a subtle way. Thanks to Kef Schecter for his diligence in following through on reporting and monitoring this bugfix! - Fixed a bug in the Py3 version of pyparsing, during exception handling with packrat parsing enabled, reported by Catherine Devlin - thanks Catherine! - Fixed typo in ParseBaseException.__dir__, reported anonymously on the SourceForge bug tracker, thank you Pyparsing User With No Name. - Fixed bug in srange when using '\x###' hex character codes. - Addeed optional 'intExpr' argument to countedArray, so that you can define your own expression that will evaluate to an integer, to be used as the count for the following elements. Allows you to define a countedArray with the count given in hex, for example, by defining intExpr as "Word(hexnums).setParseAction(int(t[0],16))". Version 1.5.6 - June, 2011 ---------------------------- - Cleanup of parse action normalizing code, to be more version-tolerant, and robust in the face of future Python versions - much thanks to Raymond Hettinger for this rewrite! - Removal of exception cacheing, addressing a memory leak condition in Python 3. Thanks to Michael Droettboom and the Cape Town PUG for their analysis and work on this problem! - Fixed bug when using packrat parsing, where a previously parsed expression would duplicate subsequent tokens - reported by Frankie Ribery on stackoverflow, thanks! - Added 'ungroup' helper method, to address token grouping done implicitly by And expressions, even if only one expression in the And actually returns any text - also inspired by stackoverflow discussion with Frankie Ribery! - Fixed bug in srange, which accepted escaped hex characters of the form '\0x##', but should be '\x##'. Both forms will be supported for backwards compatibility. - Enhancement to countedArray, accepting an optional expression to be used for matching the leading integer count - proposed by Mathias on the pyparsing mailing list, good idea! - Added the Verilog parser to the provided set of examples, under the MIT license. While this frees up this parser for any use, if you find yourself using it in a commercial purpose, please consider making a charitable donation as described in the parser's header. - Added the excludeChars argument to the Word class, to simplify defining a word composed of all characters in a large range except for one or two. Suggested by JesterEE on the pyparsing wiki. - Added optional overlap parameter to scanString, to return overlapping matches found in the source text. - Updated oneOf internal regular expression generation, with improved parse time performance. - Slight performance improvement in transformString, removing empty strings from the list of string fragments built while scanning the source text, before calling ''.join. Especially useful when using transformString to strip out selected text. - Enhanced form of using the "expr('name')" style of results naming, in lieu of calling setResultsName. If name ends with an '*', then this is equivalent to expr.setResultsName('name',listAllMatches=True). - Fixed up internal list flattener to use iteration instead of recursion, to avoid stack overflow when transforming large files. - Added other new examples: . protobuf parser - parses Google's protobuf language . btpyparse - a BibTex parser contributed by Matthew Brett, with test suite test_bibparse.py (thanks, Matthew!) . groupUsingListAllMatches.py - demo using trailing '*' for results names Version 1.5.5 - August, 2010 ---------------------------- - Typo in Python3 version of pyparsing, "builtin" should be "builtins". (sigh) Version 1.5.4 - August, 2010 ---------------------------- - Fixed __builtins__ and file references in Python 3 code, thanks to Greg Watson, saulspatz, sminos, and Mark Summerfield for reporting their Python 3 experiences. - Added new example, apicheck.py, as a sample of scanning a Tcl-like language for functions with incorrect number of arguments (difficult to track down in Tcl languages). This example uses some interesting methods for capturing exceptions while scanning through source code. - Added new example deltaTime.py, that takes everyday time references like "an hour from now", "2 days ago", "next Sunday at 2pm". Version 1.5.3 - June, 2010 -------------------------- - ======= NOTE: API CHANGE!!!!!!! =============== With this release, and henceforward, the pyparsing module is imported as "pyparsing" on both Python 2.x and Python 3.x versions. - Fixed up setup.py to auto-detect Python version and install the correct version of pyparsing - suggested by Alex Martelli, thanks, Alex! (and my apologies to all those who struggled with those spurious installation errors caused by my earlier fumblings!) - Fixed bug on Python3 when using parseFile, getting bytes instead of a str from the input file. - Fixed subtle bug in originalTextFor, if followed by significant whitespace (like a newline) - discovered by Francis Vidal, thanks! - Fixed very sneaky bug in Each, in which Optional elements were not completely recognized as optional - found by Tal Weiss, thanks for your patience. - Fixed off-by-1 bug in line() method when the first line of the input text was an empty line. Thanks to John Krukoff for submitting a patch! - Fixed bug in transformString if grammar contains Group expressions, thanks to patch submitted by barnabas79, nice work! - Fixed bug in originalTextFor in which trailing comments or otherwised ignored text got slurped in with the matched expression. Thanks to michael_ramirez44 on the pyparsing wiki for reporting this just in time to get into this release! - Added better support for summing ParseResults, see the new example, parseResultsSumExample.py. - Added support for composing a Regex using a compiled RE object; thanks to my new colleague, Mike Thornton! - In version 1.5.2, I changed the way exceptions are raised in order to simplify the stacktraces reported during parsing. An anonymous user posted a bug report on SF that this behavior makes it difficult to debug some complex parsers, or parsers nested within parsers. In this release I've added a class attribute ParserElement.verbose_stacktrace, with a default value of False. If you set this to True, pyparsing will report stacktraces using the pre-1.5.2 behavior. - New examples: . pymicko.py, a MicroC compiler submitted by Zarko Zivanov. (Note: this example is separately licensed under the GPLv3, and requires Python 2.6 or higher.) Thank you, Zarko! . oc.py, a subset C parser, using the BNF from the 1996 Obfuscated C Contest. . stateMachine2.py, a modified version of stateMachine.py submitted by Matt Anderson, that is compatible with Python versions 2.7 and above - thanks so much, Matt! . select_parser.py, a parser for reading SQLite SELECT statements, as specified at https://www.sqlite.org/lang_select.html this goes into much more detail than the simple SQL parser included in pyparsing's source code . excelExpr.py, a *simplistic* first-cut at a parser for Excel expressions, which I originally posted on comp.lang.python in January, 2010; beware, this parser omits many common Excel cases (addition of numbers represented as strings, references to named ranges) . cpp_enum_parser.py, a nice little parser posted my Mark Tolonen on comp.lang.python in August, 2009 (redistributed here with Mark's permission). Thanks a bunch, Mark! . partial_gene_match.py, a sample I posted to Stackoverflow.com, implementing a special variation on Literal that does "close" matching, up to a given number of allowed mismatches. The application was to find matching gene sequences, with allowance for one or two mismatches. . tagCapture.py, a sample showing how to use a Forward placeholder to enforce matching of text parsed in a previous expression. . matchPreviousDemo.py, simple demo showing how the matchPreviousLiteral helper method is used to match a previously parsed token. Version 1.5.2 - April, 2009 ------------------------------ - Added pyparsing_py3.py module, so that Python 3 users can use pyparsing by changing their pyparsing import statement to: import pyparsing_py3 Thanks for help from Patrick Laban and his friend Geremy Condra on the pyparsing wiki. - Removed __slots__ declaration on ParseBaseException, for compatibility with IronPython 2.0.1. Raised by David Lawler on the pyparsing wiki, thanks David! - Fixed bug in SkipTo/failOn handling - caught by eagle eye cpennington on the pyparsing wiki! - Fixed second bug in SkipTo when using the ignore constructor argument, reported by Catherine Devlin, thanks! - Fixed obscure bug reported by Eike Welk when using a class as a ParseAction with an errant __getitem__ method. - Simplified exception stack traces when reporting parse exceptions back to caller of parseString or parseFile - thanks to a tip from Peter Otten on comp.lang.python. - Changed behavior of scanString to avoid infinitely looping on expressions that match zero-length strings. Prompted by a question posted by ellisonbg on the wiki. - Enhanced classes that take a list of expressions (And, Or, MatchFirst, and Each) to accept generator expressions also. This can be useful when generating lists of alternative expressions, as in this case, where the user wanted to match any repetitions of '+', '*', '#', or '.', but not mixtures of them (that is, match '+++', but not '+-+'): codes = "+*#." format = MatchFirst(Word(c) for c in codes) Based on a problem posed by Denis Spir on the Python tutor list. - Added new example eval_arith.py, which extends the example simpleArith.py to actually evaluate the parsed expressions. Version 1.5.1 - October, 2008 ------------------------------- - Added new helper method originalTextFor, to replace the use of the current keepOriginalText parse action. Now instead of using the parse action, as in: fullName = Word(alphas) + Word(alphas) fullName.setParseAction(keepOriginalText) (in this example, we used keepOriginalText to restore any white space that may have been skipped between the first and last names) You can now write: fullName = originalTextFor(Word(alphas) + Word(alphas)) The implementation of originalTextFor is simpler and faster than keepOriginalText, and does not depend on using the inspect or imp modules. - Added optional parseAll argument to parseFile, to be consistent with parseAll argument to parseString. Posted by pboucher on the pyparsing wiki, thanks! - Added failOn argument to SkipTo, so that grammars can define literal strings or pyparsing expressions which, if found in the skipped text, will cause SkipTo to fail. Useful to prevent SkipTo from reading past terminating expression. Instigated by question posed by Aki Niimura on the pyparsing wiki. - Fixed bug in nestedExpr if multi-character expressions are given for nesting delimiters. Patch provided by new pyparsing user, Hans-Martin Gaudecker - thanks, H-M! - Removed dependency on xml.sax.saxutils.escape, and included internal implementation instead - proposed by Mike Droettboom on the pyparsing mailing list, thanks Mike! Also fixed erroneous mapping in replaceHTMLEntity of " to ', now correctly maps to ". (Also added support for mapping ' to '.) - Fixed typo in ParseResults.insert, found by Alejandro Dubrovsky, good catch! - Added __dir__() methods to ParseBaseException and ParseResults, to support new dir() behavior in Py2.6 and Py3.0. If dir() is called on a ParseResults object, the returned list will include the base set of attribute names, plus any results names that are defined. - Fixed bug in ParseResults.asXML(), in which the first named item within a ParseResults gets reported with an tag instead of with the correct results name. - Fixed bug in '-' error stop, when '-' operator is used inside a Combine expression. - Reverted generator expression to use list comprehension, for better compatibility with old versions of Python. Reported by jester/artixdesign on the SourceForge pyparsing discussion list. - Fixed bug in parseString(parseAll=True), when the input string ends with a comment or whitespace. - Fixed bug in LineStart and LineEnd that did not recognize any special whitespace chars defined using ParserElement.setDefault- WhitespaceChars, found while debugging an issue for Marek Kubica, thanks for the new test case, Marek! - Made Forward class more tolerant of subclassing. Version 1.5.0 - June, 2008 -------------------------- This version of pyparsing includes work on two long-standing FAQ's: support for forcing parsing of the complete input string (without having to explicitly append StringEnd() to the grammar), and a method to improve the mechanism of detecting where syntax errors occur in an input string with various optional and alternative paths. This release also includes a helper method to simplify definition of indentation-based grammars. With these changes (and the past few minor updates), I thought it was finally time to bump the minor rev number on pyparsing - so 1.5.0 is now available! Read on... - AT LAST!!! You can now call parseString and have it raise an exception if the expression does not parse the entire input string. This has been an FAQ for a LONG time. The parseString method now includes an optional parseAll argument (default=False). If parseAll is set to True, then the given parse expression must parse the entire input string. (This is equivalent to adding StringEnd() to the end of the expression.) The default value is False to retain backward compatibility. Inspired by MANY requests over the years, most recently by ecir-hana on the pyparsing wiki! - Added new operator '-' for composing grammar sequences. '-' behaves just like '+' in creating And expressions, but '-' is used to mark grammar structures that should stop parsing immediately and report a syntax error, rather than just backtracking to the last successful parse and trying another alternative. For instance, running the following code: port_definition = Keyword("port") + '=' + Word(nums) entity_definition = Keyword("entity") + "{" + Optional(port_definition) + "}" entity_definition.parseString("entity { port 100 }") pyparsing fails to detect the missing '=' in the port definition. But, since this expression is optional, pyparsing then proceeds to try to match the closing '}' of the entity_definition. Not finding it, pyparsing reports that there was no '}' after the '{' character. Instead, we would like pyparsing to parse the 'port' keyword, and if not followed by an equals sign and an integer, to signal this as a syntax error. This can now be done simply by changing the port_definition to: port_definition = Keyword("port") - '=' + Word(nums) Now after successfully parsing 'port', pyparsing must also find an equals sign and an integer, or it will raise a fatal syntax exception. By judicious insertion of '-' operators, a pyparsing developer can have their grammar report much more informative syntax error messages. Patches and suggestions proposed by several contributors on the pyparsing mailing list and wiki - special thanks to Eike Welk and Thomas/Poldy on the pyparsing wiki! - Added indentedBlock helper method, to encapsulate the parse actions and indentation stack management needed to keep track of indentation levels. Use indentedBlock to define grammars for indentation-based grouping grammars, like Python's. indentedBlock takes up to 3 parameters: - blockStatementExpr - expression defining syntax of statement that is repeated within the indented block - indentStack - list created by caller to manage indentation stack (multiple indentedBlock expressions within a single grammar should share a common indentStack) - indent - boolean indicating whether block must be indented beyond the the current level; set to False for block of left-most statements (default=True) A valid block must contain at least one indented statement. - Fixed bug in nestedExpr in which ignored expressions needed to be set off with whitespace. Reported by Stefaan Himpe, nice catch! - Expanded multiplication of an expression by a tuple, to accept tuple values of None: . expr*(n,None) or expr*(n,) is equivalent to expr*n + ZeroOrMore(expr) (read as "at least n instances of expr") . expr*(None,n) is equivalent to expr*(0,n) (read as "0 to n instances of expr") . expr*(None,None) is equivalent to ZeroOrMore(expr) . expr*(1,None) is equivalent to OneOrMore(expr) Note that expr*(None,n) does not raise an exception if more than n exprs exist in the input stream; that is, expr*(None,n) does not enforce a maximum number of expr occurrences. If this behavior is desired, then write expr*(None,n) + ~expr - Added None as a possible operator for operatorPrecedence. None signifies "no operator", as in multiplying m times x in "y=mx+b". - Fixed bug in Each, reported by Michael Ramirez, in which the order of terms in the Each affected the parsing of the results. Problem was due to premature grouping of the expressions in the overall Each during grammar construction, before the complete Each was defined. Thanks, Michael! - Also fixed bug in Each in which Optional's with default values were not getting the defaults added to the results of the overall Each expression. - Fixed a bug in Optional in which results names were not assigned if a default value was supplied. - Cleaned up Py3K compatibility statements, including exception construction statements, and better equivalence between _ustr and basestring, and __nonzero__ and __bool__. Version 1.4.11 - February, 2008 ------------------------------- - With help from Robert A. Clark, this version of pyparsing is compatible with Python 3.0a3. Thanks for the help, Robert! - Added WordStart and WordEnd positional classes, to support expressions that must occur at the start or end of a word. Proposed by piranha on the pyparsing wiki, good idea! - Added matchOnlyAtCol helper parser action, to simplify parsing log or data files that have optional fields that are column dependent. Inspired by a discussion thread with hubritic on comp.lang.python. - Added withAttribute.ANY_VALUE as a match-all value when using withAttribute. Used to ensure that an attribute is present, without having to match on the actual attribute value. - Added get() method to ParseResults, similar to dict.get(). Suggested by new pyparsing user, Alejandro Dubrovksy, thanks! - Added '==' short-cut to see if a given string matches a pyparsing expression. For instance, you can now write: integer = Word(nums) if "123" == integer: # do something print [ x for x in "123 234 asld".split() if x==integer ] # prints ['123', '234'] - Simplified the use of nestedExpr when using an expression for the opening or closing delimiters. Now the content expression will not have to explicitly negate closing delimiters. Found while working with dfinnie on GHOP Task #277, thanks! - Fixed bug when defining ignorable expressions that are later enclosed in a wrapper expression (such as ZeroOrMore, OneOrMore, etc.) - found while working with Prabhu Gurumurthy, thanks Prahbu! - Fixed bug in withAttribute in which keys were automatically converted to lowercase, making it impossible to match XML attributes with uppercase characters in them. Using with- Attribute requires that you reference attributes in all lowercase if parsing HTML, and in correct case when parsing XML. - Changed '<<' operator on Forward to return None, since this is really used as a pseudo-assignment operator, not as a left-shift operator. By returning None, it is easier to catch faulty statements such as a << b | c, where precedence of operations causes the '|' operation to be performed *after* inserting b into a, so no alternation is actually implemented. The correct form is a << (b | c). With this change, an error will be reported instead of silently clipping the alternative term. (Note: this may break some existing code, but if it does, the code had a silent bug in it anyway.) Proposed by wcbarksdale on the pyparsing wiki, thanks! - Several unit tests were added to pyparsing's regression suite, courtesy of the Google Highly-Open Participation Contest. Thanks to all who administered and took part in this event! Version 1.4.10 - December 9, 2007 --------------------------------- - Fixed bug introduced in v1.4.8, parse actions were called for intermediate operator levels, not just the deepest matching operation level. Again, big thanks to Torsten Marek for helping isolate this problem! Version 1.4.9 - December 8, 2007 -------------------------------- - Added '*' multiplication operator support when creating grammars, accepting either an integer, or a two-integer tuple multiplier, as in: ipAddress = Word(nums) + ('.'+Word(nums))*3 usPhoneNumber = Word(nums) + ('-'+Word(nums))*(1,2) If multiplying by a tuple, the two integer values represent min and max multiples. Suggested by Vincent of eToy.com, great idea, Vincent! - Fixed bug in nestedExpr, original version was overly greedy! Thanks to Michael Ramirez for raising this issue. - Fixed internal bug in ParseResults - when an item was deleted, the key indices were not updated. Thanks to Tim Mitchell for posting a bugfix patch to the SF bug tracking system! - Fixed internal bug in operatorPrecedence - when the results of a right-associative term were sent to a parse action, the wrong tokens were sent. Reported by Torsten Marek, nice job! - Added pop() method to ParseResults. If pop is called with an integer or with no arguments, it will use list semantics and update the ParseResults' list of tokens. If pop is called with a non-integer (a string, for instance), then it will use dict semantics and update the ParseResults' internal dict. Suggested by Donn Ingle, thanks Donn! - Fixed quoted string built-ins to accept '\xHH' hex characters within the string. Version 1.4.8 - October, 2007 ----------------------------- - Added new helper method nestedExpr to easily create expressions that parse lists of data in nested parentheses, braces, brackets, etc. - Added withAttribute parse action helper, to simplify creating filtering parse actions to attach to expressions returned by makeHTMLTags and makeXMLTags. Use withAttribute to qualify a starting tag with one or more required attribute values, to avoid false matches on common tags such as or
. - Added new examples nested.py and withAttribute.py to demonstrate the new features. - Added performance speedup to grammars using operatorPrecedence, instigated by Stefan Reichör - thanks for the feedback, Stefan! - Fixed bug/typo when deleting an element from a ParseResults by using the element's results name. - Fixed whitespace-skipping bug in wrapper classes (such as Group, Suppress, Combine, etc.) and when using setDebug(), reported by new pyparsing user dazzawazza on SourceForge, nice job! - Added restriction to prevent defining Word or CharsNotIn expressions with minimum length of 0 (should use Optional if this is desired), and enhanced docstrings to reflect this limitation. Issue was raised by Joey Tallieu, who submitted a patch with a slightly different solution. Thanks for taking the initiative, Joey, and please keep submitting your ideas! - Fixed bug in makeHTMLTags that did not detect HTML tag attributes with no '= value' portion (such as ""), reported by hamidh on the pyparsing wiki - thanks! - Fixed minor bug in makeHTMLTags and makeXMLTags, which did not accept whitespace in closing tags. Version 1.4.7 - July, 2007 -------------------------- - NEW NOTATION SHORTCUT: ParserElement now accepts results names using a notational shortcut, following the expression with the results name in parentheses. So this: stats = "AVE:" + realNum.setResultsName("average") + \ "MIN:" + realNum.setResultsName("min") + \ "MAX:" + realNum.setResultsName("max") can now be written as this: stats = "AVE:" + realNum("average") + \ "MIN:" + realNum("min") + \ "MAX:" + realNum("max") The intent behind this change is to make it simpler to define results names for significant fields within the expression, while keeping the grammar syntax clean and uncluttered. - Fixed bug when packrat parsing is enabled, with cached ParseResults being updated by subsequent parsing. Reported on the pyparsing wiki by Kambiz, thanks! - Fixed bug in operatorPrecedence for unary operators with left associativity, if multiple operators were given for the same term. - Fixed bug in example simpleBool.py, corrected precedence of "and" vs. "or" operations. - Fixed bug in Dict class, in which keys were converted to strings whether they needed to be or not. Have narrowed this logic to convert keys to strings only if the keys are ints (which would confuse __getitem__ behavior for list indexing vs. key lookup). - Added ParserElement method setBreak(), which will invoke the pdb module's set_trace() function when this expression is about to be parsed. - Fixed bug in StringEnd in which reading off the end of the input string raises an exception - should match. Resolved while answering a question for Shawn on the pyparsing wiki. Version 1.4.6 - April, 2007 --------------------------- - Simplified constructor for ParseFatalException, to support common exception construction idiom: raise ParseFatalException, "unexpected text: 'Spanish Inquisition'" - Added method getTokensEndLoc(), to be called from within a parse action, for those parse actions that need both the starting *and* ending location of the parsed tokens within the input text. - Enhanced behavior of keepOriginalText so that named parse fields are preserved, even though tokens are replaced with the original input text matched by the current expression. Also, cleaned up the stack traversal to be more robust. Suggested by Tim Arnold - thanks, Tim! - Fixed subtle bug in which countedArray (and similar dynamic expressions configured in parse actions) failed to match within Or, Each, FollowedBy, or NotAny. Reported by Ralf Vosseler, thanks for your patience, Ralf! - Fixed Unicode bug in upcaseTokens and downcaseTokens parse actions, scanString, and default debugging actions; reported (and patch submitted) by Nikolai Zamkovoi, spasibo! - Fixed bug when saving a tuple as a named result. The returned token list gave the proper tuple value, but accessing the result by name only gave the first element of the tuple. Reported by Poromenos, nice catch! - Fixed bug in makeHTMLTags/makeXMLTags, which failed to match tag attributes with namespaces. - Fixed bug in SkipTo when setting include=True, to have the skipped-to tokens correctly included in the returned data. Reported by gunars on the pyparsing wiki, thanks! - Fixed typobug in OnceOnly.reset method, omitted self argument. Submitted by eike welk, thanks for the lint-picking! - Added performance enhancement to Forward class, suggested by akkartik on the pyparsing Wiki discussion, nice work! - Added optional asKeyword to Word constructor, to indicate that the given word pattern should be matched only as a keyword, that is, it should only match if it is within word boundaries. - Added S-expression parser to examples directory. - Added macro substitution example to examples directory. - Added holaMundo.py example, excerpted from Marco Alfonso's blog - muchas gracias, Marco! - Modified internal cyclic references in ParseResults to use weakrefs; this should help reduce the memory footprint of large parsing programs, at some cost to performance (3-5%). Suggested by bca48150 on the pyparsing wiki, thanks! - Enhanced the documentation describing the vagaries and idiosyncracies of parsing strings with embedded tabs, and the impact on: . parse actions . scanString . col and line helper functions (Suggested by eike welk in response to some unexplained inconsistencies between parsed location and offsets in the input string.) - Cleaned up internal decorators to preserve function names, docstrings, etc. Version 1.4.5 - December, 2006 ------------------------------ - Removed debugging print statement from QuotedString class. Sorry for not stripping this out before the 1.4.4 release! - A significant performance improvement, the first one in a while! For my Verilog parser, this version of pyparsing is about double the speed - YMMV. - Added support for pickling of ParseResults objects. (Reported by Jeff Poole, thanks Jeff!) - Fixed minor bug in makeHTMLTags that did not recognize tag attributes with embedded '-' or '_' characters. Also, added support for passing expressions to makeHTMLTags and makeXMLTags, and used this feature to define the globals anyOpenTag and anyCloseTag. - Fixed error in alphas8bit, I had omitted the y-with-umlaut character. - Added punc8bit string to complement alphas8bit - it contains all the non-alphabetic, non-blank 8-bit characters. - Added commonHTMLEntity expression, to match common HTML "ampersand" codes, such as "<", ">", "&", " ", and """. This expression also defines a results name 'entity', which can be used to extract the entity field (that is, "lt", "gt", etc.). Also added built-in parse action replaceHTMLEntity, which can be attached to commonHTMLEntity to translate "<", ">", "&", " ", and """ to "<", ">", "&", " ", and "'". - Added example, htmlStripper.py, that strips HTML tags and scripts from HTML pages. It also translates common HTML entities to their respective characters. Version 1.4.4 - October, 2006 ------------------------------- - Fixed traceParseAction decorator to also trap and record exception returns from parse actions, and to handle parse actions with 0, 1, 2, or 3 arguments. - Enhanced parse action normalization to support using classes as parse actions; that is, the class constructor is called at parse time and the __init__ function is called with 0, 1, 2, or 3 arguments. If passing a class as a parse action, the __init__ method must use one of the valid parse action parameter list formats. (This technique is useful when using pyparsing to compile parsed text into a series of application objects - see the new example simpleBool.py.) - Fixed bug in ParseResults when setting an item using an integer index. (Reported by Christopher Lambacher, thanks!) - Fixed whitespace-skipping bug, patch submitted by Paolo Losi - grazie, Paolo! - Fixed bug when a Combine contained an embedded Forward expression, reported by cie on the pyparsing wiki - good catch! - Fixed listAllMatches bug, when a listAllMatches result was nested within another result. (Reported by don pasquale on comp.lang.python, well done!) - Fixed bug in ParseResults items() method, when returning an item marked as listAllMatches=True - Fixed bug in definition of cppStyleComment (and javaStyleComment) in which '//' line comments were not continued to the next line if the line ends with a '\'. (Reported by eagle-eyed Ralph Corderoy!) - Optimized re's for cppStyleComment and quotedString for better re performance - also provided by Ralph Corderoy, thanks! - Added new example, indentedGrammarExample.py, showing how to define a grammar using indentation to show grouping (as Python does for defining statement nesting). Instigated by an e-mail discussion with Andrew Dalke, thanks Andrew! - Added new helper operatorPrecedence (based on e-mail list discussion with Ralph Corderoy and Paolo Losi), to facilitate definition of grammars for expressions with unary and binary operators. For instance, this grammar defines a 6-function arithmetic expression grammar, with unary plus and minus, proper operator precedence,and right- and left-associativity: expr = operatorPrecedence( operand, [("!", 1, opAssoc.LEFT), ("^", 2, opAssoc.RIGHT), (oneOf("+ -"), 1, opAssoc.RIGHT), (oneOf("* /"), 2, opAssoc.LEFT), (oneOf("+ -"), 2, opAssoc.LEFT),] ) Also added example simpleArith.py and simpleBool.py to provide more detailed code samples using this new helper method. - Added new helpers matchPreviousLiteral and matchPreviousExpr, for creating adaptive parsing expressions that match the same content as was parsed in a previous parse expression. For instance: first = Word(nums) matchExpr = first + ":" + matchPreviousLiteral(first) will match "1:1", but not "1:2". Since this matches at the literal level, this will also match the leading "1:1" in "1:10". In contrast: first = Word(nums) matchExpr = first + ":" + matchPreviousExpr(first) will *not* match the leading "1:1" in "1:10"; the expressions are evaluated first, and then compared, so "1" is compared with "10". - Added keepOriginalText parse action. Sometimes pyparsing's whitespace-skipping leaves out too much whitespace. Adding this parse action will restore any internal whitespace for a parse expression. This is especially useful when defining expressions for scanString or transformString applications. - Added __add__ method for ParseResults class, to better support using Python sum built-in for summing ParseResults objects returned from scanString. - Added reset method for the new OnlyOnce class wrapper for parse actions (to allow a grammar to be used multiple times). - Added optional maxMatches argument to scanString and searchString, to short-circuit scanning after 'n' expression matches are found. Version 1.4.3 - July, 2006 ------------------------------ - Fixed implementation of multiple parse actions for an expression (added in 1.4.2). . setParseAction() reverts to its previous behavior, setting one (or more) actions for an expression, overwriting any action or actions previously defined . new method addParseAction() appends one or more parse actions to the list of parse actions attached to an expression Now it is harder to accidentally append parse actions to an expression, when what you wanted to do was overwrite whatever had been defined before. (Thanks, Jean-Paul Calderone!) - Simplified interface to parse actions that do not require all 3 parse action arguments. Very rarely do parse actions require more than just the parsed tokens, yet parse actions still require all 3 arguments including the string being parsed and the location within the string where the parse expression was matched. With this release, parse actions may now be defined to be called as: . fn(string,locn,tokens) (the current form) . fn(locn,tokens) . fn(tokens) . fn() The setParseAction and addParseAction methods will internally decorate the provided parse actions with compatible wrappers to conform to the full (string,locn,tokens) argument sequence. - REMOVED SUPPORT FOR RETURNING PARSE LOCATION FROM A PARSE ACTION. I announced this in March, 2004, and gave a final warning in the last release. Now you can return a tuple from a parse action, and it will be treated like any other return value (i.e., the tuple will be substituted for the incoming tokens passed to the parse action, which is useful when trying to parse strings into tuples). - Added setFailAction method, taking a callable function fn that takes the arguments fn(s,loc,expr,err) where: . s - string being parsed . loc - location where expression match was attempted and failed . expr - the parse expression that failed . err - the exception thrown The function returns no values. It may throw ParseFatalException if it is desired to stop parsing immediately. (Suggested by peter21081944 on wikispaces.com) - Added class OnlyOnce as helper wrapper for parse actions. OnlyOnce only permits a parse action to be called one time, after which all subsequent calls throw a ParseException. - Added traceParseAction decorator to help debug parse actions. Simply insert "@traceParseAction" ahead of the definition of your parse action, and each invocation will be displayed, along with incoming arguments, and returned value. - Fixed bug when copying ParserElements using copy() or setResultsName(). (Reported by Dan Thill, great catch!) - Fixed bug in asXML() where token text contains <, >, and & characters - generated XML now escapes these as <, > and &. (Reported by Jacek Sieka, thanks!) - Fixed bug in SkipTo() when searching for a StringEnd(). (Reported by Pete McEvoy, thanks Pete!) - Fixed "except Exception" statements, the most critical added as part of the packrat parsing enhancement. (Thanks, Erick Tryzelaar!) - Fixed end-of-string infinite looping on LineEnd and StringEnd expressions. (Thanks again to Erick Tryzelaar.) - Modified setWhitespaceChars to return self, to be consistent with other ParserElement modifiers. (Suggested by Erick Tryzelaar.) - Fixed bug/typo in new ParseResults.dump() method. - Fixed bug in searchString() method, in which only the first token of an expression was returned. searchString() now returns a ParseResults collection of all search matches. - Added example program removeLineBreaks.py, a string transformer that converts text files with hard line-breaks into one with line breaks only between paragraphs. - Added example program listAllMatches.py, to illustrate using the listAllMatches option when specifying results names (also shows new support for passing lists to oneOf). - Added example program linenoExample.py, to illustrate using the helper methods lineno, line, and col, and returning objects from a parse action. - Added example program parseListString.py, to which can parse the string representation of a Python list back into a true list. Taken mostly from my PyCon presentation examples, but now with support for tuple elements, too! Version 1.4.2 - April 1, 2006 (No foolin'!) ------------------------------------------- - Significant speedup from memoizing nested expressions (a technique known as "packrat parsing"), thanks to Chris Lesniewski-Laas! Your mileage may vary, but my Verilog parser almost doubled in speed to over 600 lines/sec! This speedup may break existing programs that use parse actions that have side-effects. For this reason, packrat parsing is disabled when you first import pyparsing. To activate the packrat feature, your program must call the class method ParserElement.enablePackrat(). If your program uses psyco to "compile as you go", you must call enablePackrat before calling psyco.full(). If you do not do this, Python will crash. For best results, call enablePackrat() immediately after importing pyparsing. - Added new helper method countedArray(expr), for defining patterns that start with a leading integer to indicate the number of array elements, followed by that many elements, matching the given expr parse expression. For instance, this two-liner: wordArray = countedArray(Word(alphas)) print wordArray.parseString("3 Practicality beats purity")[0] returns the parsed array of words: ['Practicality', 'beats', 'purity'] The leading token '3' is suppressed, although it is easily obtained from the length of the returned array. (Inspired by e-mail discussion with Ralf Vosseler.) - Added support for attaching multiple parse actions to a single ParserElement. (Suggested by Dan "Dang" Griffith - nice idea, Dan!) - Added support for asymmetric quoting characters in the recently-added QuotedString class. Now you can define your own quoted string syntax like "<>". To define this custom form of QuotedString, your code would define: dblAngleQuotedString = QuotedString('<<',endQuoteChar='>>') QuotedString also supports escaped quotes, escape character other than '\', and multiline. - Changed the default value returned internally by Optional, so that None can be used as a default value. (Suggested by Steven Bethard - I finally saw the light!) - Added dump() method to ParseResults, to make it easier to list out and diagnose values returned from calling parseString. - A new example, a search query string parser, submitted by Steven Mooij and Rudolph Froger - a very interesting application, thanks! - Added an example that parses the BNF in Python's Grammar file, in support of generating Python grammar documentation. (Suggested by J H Stovall.) - A new example, submitted by Tim Cera, of a flexible parser module, using a simple config variable to adjust parsing for input formats that have slight variations - thanks, Tim! - Added an example for parsing Roman numerals, showing the capability of parse actions to "compile" Roman numerals into their integer values during parsing. - Added a new docs directory, for additional documentation or help. Currently, this includes the text and examples from my recent presentation at PyCon. - Fixed another typo in CaselessKeyword, thanks Stefan Behnel. - Expanded oneOf to also accept tuples, not just lists. This really should be sufficient... - Added deprecation warnings when tuple is returned from a parse action. Looking back, I see that I originally deprecated this feature in March, 2004, so I'm guessing people really shouldn't have been using this feature - I'll drop it altogether in the next release, which will allow users to return a tuple from a parse action (which is really handy when trying to reconstuct tuples from a tuple string representation!). Version 1.4.1 - February, 2006 ------------------------------ - Converted generator expression in QuotedString class to list comprehension, to retain compatibility with Python 2.3. (Thanks, Titus Brown for the heads-up!) - Added searchString() method to ParserElement, as an alternative to using "scanString(instring).next()[0][0]" to search through a string looking for a substring matching a given parse expression. (Inspired by e-mail conversation with Dave Feustel.) - Modified oneOf to accept lists of strings as well as a single string of space-delimited literals. (Suggested by Jacek Sieka - thanks!) - Removed deprecated use of Upcase in pyparsing test code. (Also caught by Titus Brown.) - Removed lstrip() call from Literal - too aggressive in stripping whitespace which may be valid for some grammars. (Point raised by Jacek Sieka). Also, made Literal more robust in the event of passing an empty string. - Fixed bug in replaceWith when returning None. - Added cautionary documentation for Forward class when assigning a MatchFirst expression, as in: fwdExpr << a | b | c Precedence of operators causes this to be evaluated as: (fwdExpr << a) | b | c thereby leaving b and c out as parseable alternatives. Users must explicitly group the values inserted into the Forward: fwdExpr << (a | b | c) (Suggested by Scot Wilcoxon - thanks, Scot!) Version 1.4 - January 18, 2006 ------------------------------ - Added Regex class, to permit definition of complex embedded expressions using regular expressions. (Enhancement provided by John Beisley, great job!) - Converted implementations of Word, oneOf, quoted string, and comment helpers to utilize regular expression matching. Performance improvements in the 20-40% range. - Added QuotedString class, to support definition of non-standard quoted strings (Suggested by Guillaume Proulx, thanks!) - Added CaselessKeyword class, to streamline grammars with, well, caseless keywords (Proposed by Stefan Behnel, thanks!) - Fixed bug in SkipTo, when using an ignoreable expression. (Patch provided by Anonymous, thanks, whoever-you-are!) - Fixed typo in NoMatch class. (Good catch, Stefan Behnel!) - Fixed minor bug in _makeTags(), using string.printables instead of pyparsing.printables. - Cleaned up some of the expressions created by makeXXXTags helpers, to suppress extraneous <> characters. - Added some grammar definition-time checking to verify that a grammar is being built using proper ParserElements. - Added examples: . LAparser.py - linear algebra C preprocessor (submitted by Mike Ellis, thanks Mike!) . wordsToNum.py - converts word description of a number back to the original number (such as 'one hundred and twenty three' -> 123) . updated fourFn.py to support unary minus, added BNF comments Version 1.3.3 - September 12, 2005 ---------------------------------- - Improved support for Unicode strings that would be returned using srange. Added greetingInKorean.py example, for a Korean version of "Hello, World!" using Unicode. (Thanks, June Kim!) - Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining hexadecimal value expressions. - NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE=== Modified tag and results definitions returned by makeHTMLTags(), to better support the looseness of HTML parsing. Tags to be parsed are now caseless, and keys generated for tag attributes are now converted to lower case. Formerly, makeXMLTags("XYZ") would return a tag with results name of "startXYZ", this has been changed to "startXyz". If this tag is matched against '', the matched keys formerly would be "Abc", "DEF", and "ghi"; keys are now converted to lower case, giving keys of "abc", "def", and "ghi". These changes were made to try to address the lax case sensitivity agreement between start and end tags in many HTML pages. No changes were made to makeXMLTags(), which assumes more rigorous parsing rules. Also, cleaned up case-sensitivity bugs in closing tags, and switched to using Keyword instead of Literal class for tags. (Thanks, Steve Young, for getting me to look at these in more detail!) - Added two helper parse actions, upcaseTokens and downcaseTokens, which will convert matched text to all uppercase or lowercase, respectively. - Deprecated Upcase class, to be replaced by upcaseTokens parse action. - Converted messages sent to stderr to use warnings module, such as when constructing a Literal with an empty string, one should use the Empty() class or the empty helper instead. - Added ' ' (space) as an escapable character within a quoted string. - Added helper expressions for common comment types, in addition to the existing cStyleComment (/*...*/) and htmlStyleComment () . dblSlashComment = // ... (to end of line) . cppStyleComment = cStyleComment or dblSlashComment . javaStyleComment = cppStyleComment . pythonStyleComment = # ... (to end of line) Version 1.3.2 - July 24, 2005 ----------------------------- - Added Each class as an enhanced version of And. 'Each' requires that all given expressions be present, but may occur in any order. Special handling is provided to group ZeroOrMore and OneOrMore elements that occur out-of-order in the input string. You can also construct 'Each' objects by joining expressions with the '&' operator. When using the Each class, results names are strongly recommended for accessing the matched tokens. (Suggested by Pradam Amini - thanks, Pradam!) - Stricter interpretation of 'max' qualifier on Word elements. If the 'max' attribute is specified, matching will fail if an input field contains more than 'max' consecutive body characters. For example, previously, Word(nums,max=3) would match the first three characters of '0123456', returning '012' and continuing parsing at '3'. Now, when constructed using the max attribute, Word will raise an exception with this string. - Cleaner handling of nested dictionaries returned by Dict. No longer necessary to dereference sub-dictionaries as element [0] of their parents. === NOTE: THIS CHANGE MAY BREAK SOME EXISTING CODE, BUT ONLY IF PARSING NESTED DICTIONARIES USING THE LITTLE-USED DICT CLASS === (Prompted by discussion thread on the Python Tutor list, with contributions from Danny Yoo, Kent Johnson, and original post by Liam Clarke - thanks all!) Version 1.3.1 - June, 2005 ---------------------------------- - Added markInputline() method to ParseException, to display the input text line location of the parsing exception. (Thanks, Stefan Behnel!) - Added setDefaultKeywordChars(), so that Keyword definitions using a custom keyword character set do not all need to add the keywordChars constructor argument (similar to setDefaultWhitespaceChars()). (suggested by rzhanka on the SourceForge pyparsing forum.) - Simplified passing debug actions to setDebugAction(). You can now pass 'None' for a debug action if you want to take the default debug behavior. To suppress a particular debug action, you can pass the pyparsing method nullDebugAction. - Refactored parse exception classes, moved all behavior to ParseBaseException, and the former ParseException is now a subclass of ParseBaseException. Added a second subclass, ParseFatalException, as a subclass of ParseBaseException. User-defined parse actions can raise ParseFatalException if a data inconsistency is detected (such as a begin-tag/end-tag mismatch), and this will stop all parsing immediately. (Inspired by e-mail thread with Michele Petrazzo - thanks, Michelle!) - Added helper methods makeXMLTags and makeHTMLTags, that simplify the definition of XML or HTML tag parse expressions for a given tagname. Both functions return a pair of parse expressions, one for the opening tag (that is, '') and one for the closing tag (''). The opening tagame also recognizes any attribute definitions that have been included in the opening tag, as well as an empty tag (one with a trailing '/', as in '' which is equivalent to ''). makeXMLTags uses stricter XML syntax for attributes, requiring that they be enclosed in double quote characters - makeHTMLTags is more lenient, and accepts single-quoted strings or any contiguous string of characters up to the next whitespace character or '>' character. Attributes can be retrieved as dictionary or attribute values of the returned results from the opening tag. - Added example minimath2.py, a refinement on fourFn.py that adds an interactive session and support for variables. (Thanks, Steven Siew!) - Added performance improvement, up to 20% reduction! (Found while working with Wolfgang Borgert on performance tuning of his TTCN3 parser.) - And another performance improvement, up to 25%, when using scanString! (Found while working with Henrik Westlund on his C header file scanner.) - Updated UML diagrams to reflect latest class/method changes. Version 1.3 - March, 2005 ---------------------------------- - Added new Keyword class, as a special form of Literal. Keywords must be followed by whitespace or other non-keyword characters, to distinguish them from variables or other identifiers that just happen to start with the same characters as a keyword. For instance, the input string containing "ifOnlyIfOnly" will match a Literal("if") at the beginning and in the middle, but will fail to match a Keyword("if"). Keyword("if") will match only strings such as "if only" or "if(only)". (Proposed by Wolfgang Borgert, and Berteun Damman separately requested this on comp.lang.python - great idea!) - Added setWhitespaceChars() method to override the characters to be skipped as whitespace before matching a particular ParseElement. Also added the class-level method setDefaultWhitespaceChars(), to allow users to override the default set of whitespace characters (space, tab, newline, and return) for all subsequently defined ParseElements. (Inspired by Klaas Hofstra's inquiry on the Sourceforge pyparsing forum.) - Added helper parse actions to support some very common parse action use cases: . replaceWith(replStr) - replaces the matching tokens with the provided replStr replacement string; especially useful with transformString() . removeQuotes - removes first and last character from string enclosed in quotes (note - NOT the same as the string strip() method, as only a single character is removed at each end) - Added copy() method to ParseElement, to make it easier to define different parse actions for the same basic parse expression. (Note, copy is implicitly called when using setResultsName().) (The following changes were posted to CVS as Version 1.2.3 - October-December, 2004) - Added support for Unicode strings in creating grammar definitions. (Big thanks to Gavin Panella!) - Added constant alphas8bit to include the following 8-bit characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ - Added srange() function to simplify definition of Word elements, using regexp-like '[A-Za-z0-9]' syntax. This also simplifies referencing common 8-bit characters. - Fixed bug in Dict when a single element Dict was embedded within another Dict. (Thanks Andy Yates for catching this one!) - Added 'formatted' argument to ParseResults.asXML(). If set to False, suppresses insertion of whitespace for pretty-print formatting. Default equals True for backward compatibility. - Added setDebugActions() function to ParserElement, to allow user-defined debugging actions. - Added support for escaped quotes (either in \', \", or doubled quote form) to the predefined expressions for quoted strings. (Thanks, Ero Carrera!) - Minor performance improvement (~5%) converting "char in string" tests to "char in dict". (Suggested by Gavin Panella, cool idea!) Version 1.2.2 - September 27, 2004 ---------------------------------- - Modified delimitedList to accept an expression as the delimiter, instead of only accepting strings. - Modified ParseResults, to convert integer field keys to strings (to avoid confusion with list access). - Modified Combine, to convert all embedded tokens to strings before combining. - Fixed bug in MatchFirst in which parse actions would be called for expressions that only partially match. (Thanks, John Hunter!) - Fixed bug in fourFn.py example that fixes right-associativity of ^ operator. (Thanks, Andrea Griffini!) - Added class FollowedBy(expression), to look ahead in the input string without consuming tokens. - Added class NoMatch that never matches any input. Can be useful in debugging, and in very specialized grammars. - Added example pgn.py, for parsing chess game files stored in Portable Game Notation. (Thanks, Alberto Santini!) Version 1.2.1 - August 19, 2004 ------------------------------- - Added SkipTo(expression) token type, simplifying grammars that only want to specify delimiting expressions, and want to match any characters between them. - Added helper method dictOf(key,value), making it easier to work with the Dict class. (Inspired by Pavel Volkovitskiy, thanks!). - Added optional argument listAllMatches (default=False) to setResultsName(). Setting listAllMatches to True overrides the default modal setting of tokens to results names; instead, the results name acts as an accumulator for all matching tokens within the local repetition group. (Suggested by Amaury Le Leyzour - thanks!) - Fixed bug in ParseResults, throwing exception when trying to extract slice, or make a copy using [:]. (Thanks, Wilson Fowlie!) - Fixed bug in transformString() when the input string contains 's (Thanks, Rick Walia!). - Fixed bug in returning tokens from un-Grouped And's, Or's and MatchFirst's, where too many tokens would be included in the results, confounding parse actions and returned results. - Fixed bug in naming ParseResults returned by And's, Or's, and Match First's. - Fixed bug in LineEnd() - matching this token now correctly consumes and returns the end of line "\n". - Added a beautiful example for parsing Mozilla calendar files (Thanks, Petri Savolainen!). - Added support for dynamically modifying Forward expressions during parsing. Version 1.2 - 20 June 2004 -------------------------- - Added definition for htmlComment to help support HTML scanning and parsing. - Fixed bug in generating XML for Dict classes, in which trailing item was duplicated in the output XML. - Fixed release bug in which scanExamples.py was omitted from release files. - Fixed bug in transformString() when parse actions are not defined on the outermost parser element. - Added example urlExtractor.py, as another example of using scanString and parse actions. Version 1.2beta3 - 4 June 2004 ------------------------------ - Added White() token type, analogous to Word, to match on whitespace characters. Use White in parsers with significant whitespace (such as configuration file parsers that use indentation to indicate grouping). Construct White with a string containing the whitespace characters to be matched. Similar to Word, White also takes optional min, max, and exact parameters. - As part of supporting whitespace-signficant parsing, added parseWithTabs() method to ParserElement, to override the default behavior in parseString of automatically expanding tabs to spaces. To retain tabs during parsing, call parseWithTabs() before calling parseString(), parseFile() or scanString(). (Thanks, Jean-Guillaume Paradis for catching this, and for your suggestions on whitespace-significant parsing.) - Added transformString() method to ParseElement, as a complement to scanString(). To use transformString, define a grammar and attach a parse action to the overall grammar that modifies the returned token list. Invoking transformString() on a target string will then scan for matches, and replace the matched text patterns according to the logic in the parse action. transformString() returns the resulting transformed string. (Note: transformString() does *not* automatically expand tabs to spaces.) Also added scanExamples.py to the examples directory to show sample uses of scanString() and transformString(). - Removed group() method that was introduced in beta2. This turns out NOT to be equivalent to nesting within a Group() object, and I'd prefer not to sow more seeds of confusion. - Fixed behavior of asXML() where tags for groups were incorrectly duplicated. (Thanks, Brad Clements!) - Changed beta version message to display to stderr instead of stdout, to make asXML() easier to use. (Thanks again, Brad.) Version 1.2beta2 - 19 May 2004 ------------------------------ - *** SIMPLIFIED API *** - Parse actions that do not modify the list of tokens no longer need to return a value. This simplifies those parse actions that use the list of tokens to update a counter or record or display some of the token content; these parse actions can simply end without having to specify 'return toks'. - *** POSSIBLE API INCOMPATIBILITY *** - Fixed CaselessLiteral bug, where the returned token text was not the original string (as stated in the docs), but the original string converted to upper case. (Thanks, Dang Griffith!) **NOTE: this may break some code that relied on this erroneous behavior. Users should scan their code for uses of CaselessLiteral.** - *** POSSIBLE CODE INCOMPATIBILITY *** - I have renamed the internal attributes on ParseResults from 'dict' and 'list' to '__tokdict' and '__toklist', to avoid collisions with user-defined data fields named 'dict' and 'list'. Any client code that accesses these attributes directly will need to be modified. Hopefully the implementation of methods such as keys(), items(), len(), etc. on ParseResults will make such direct attribute accessess unnecessary. - Added asXML() method to ParseResults. This greatly simplifies the process of parsing an input data file and generating XML-structured data. - Added getName() method to ParseResults. This method is helpful when a grammar specifies ZeroOrMore or OneOrMore of a MatchFirst or Or expression, and the parsing code needs to know which expression matched. (Thanks, Eric van der Vlist, for this idea!) - Added items() and values() methods to ParseResults, to better support using ParseResults as a Dictionary. - Added parseFile() as a convenience function to parse the contents of an entire text file. Accepts either a file name or a file object. (Thanks again, Dang!) - Added group() method to And, Or, and MatchFirst, as a short-cut alternative to enclosing a construct inside a Group object. - Extended fourFn.py to support exponentiation, and simple built-in functions. - Added EBNF parser to examples, including a demo where it parses its own EBNF! (Thanks to Seo Sanghyeon!) - Added Delphi Form parser to examples, dfmparse.py, plus a couple of sample Delphi forms as tests. (Well done, Dang!) - Another performance speedup, 5-10%, inspired by Dang! Plus about a 20% speedup, by pre-constructing and cacheing exception objects instead of constructing them on the fly. - Fixed minor bug when specifying oneOf() with 'caseless=True'. - Cleaned up and added a few more docstrings, to improve the generated docs. Version 1.1.2 - 21 Mar 2004 --------------------------- - Fixed minor bug in scanString(), so that start location is at the start of the matched tokens, not at the start of the whitespace before the matched tokens. - Inclusion of HTML documentation, generated using Epydoc. Reformatted some doc strings to better generate readable docs. (Beautiful work, Ed Loper, thanks for Epydoc!) - Minor performance speedup, 5-15% - And on a process note, I've used the unittest module to define a series of unit tests, to help avoid the embarrassment of the version 1.1 snafu. Version 1.1.1 - 6 Mar 2004 -------------------------- - Fixed critical bug introduced in 1.1, which broke MatchFirst(!) token matching. **THANK YOU, SEO SANGHYEON!!!** - Added "from future import __generators__" to permit running under pre-Python 2.3. - Added example getNTPservers.py, showing how to use pyparsing to extract a text pattern from the HTML of a web page. Version 1.1 - 3 Mar 2004 ------------------------- - ***Changed API*** - While testing out parse actions, I found that the value of loc passed in was not the starting location of the matched tokens, but the location of the next token in the list. With this version, the location passed to the parse action is now the starting location of the tokens that matched. A second part of this change is that the return value of parse actions no longer needs to return a tuple containing both the location and the parsed tokens (which may optionally be modified); parse actions only need to return the list of tokens. Parse actions that return a tuple are deprecated; they will still work properly for conversion/compatibility, but this behavior will be removed in a future version. - Added validate() method, to help diagnose infinite recursion in a grammar tree. validate() is not 100% fool-proof, but it can help track down nasty infinite looping due to recursively referencing the same grammar construct without some intervening characters. - Cleaned up default listing of some parse element types, to more closely match ordinary BNF. Instead of the form :[contents-list], some changes are: . And(token1,token2,token3) is "{ token1 token2 token3 }" . Or(token1,token2,token3) is "{ token1 ^ token2 ^ token3 }" . MatchFirst(token1,token2,token3) is "{ token1 | token2 | token3 }" . Optional(token) is "[ token ]" . OneOrMore(token) is "{ token }..." . ZeroOrMore(token) is "[ token ]..." - Fixed an infinite loop in oneOf if the input string contains a duplicated option. (Thanks Brad Clements) - Fixed a bug when specifying a results name on an Optional token. (Thanks again, Brad Clements) - Fixed a bug introduced in 1.0.6 when I converted quotedString to use CharsNotIn; I accidentally permitted quoted strings to span newlines. I have fixed this in this version to go back to the original behavior, in which quoted strings do *not* span newlines. - Fixed minor bug in HTTP server log parser. (Thanks Jim Richardson) Version 1.0.6 - 13 Feb 2004 ---------------------------- - Added CharsNotIn class (Thanks, Lee SangYeong). This is the opposite of Word, in that it is constructed with a set of characters *not* to be matched. (This enhancement also allowed me to clean up and simplify some of the definitions for quoted strings, cStyleComment, and restOfLine.) - **MINOR API CHANGE** - Added joinString argument to the __init__ method of Combine (Thanks, Thomas Kalka). joinString defaults to "", but some applications might choose some other string to use instead, such as a blank or newline. joinString was inserted as the second argument to __init__, so if you have code that specifies an adjacent value, without using 'adjacent=', this code will break. - Modified LineStart to recognize the start of an empty line. - Added optional caseless flag to oneOf(), to create a list of CaselessLiteral tokens instead of Literal tokens. - Added some enhancements to the SQL example: . Oracle-style comments (Thanks to Harald Armin Massa) . simple WHERE clause - Minor performance speedup - 5-15% Version 1.0.5 - 19 Jan 2004 ---------------------------- - Added scanString() generator method to ParseElement, to support regex-like pattern-searching - Added items() list to ParseResults, to return named results as a list of (key,value) pairs - Fixed memory overflow in asList() for deeply nested ParseResults (Thanks, Sverrir Valgeirsson) - Minor performance speedup - 10-15% Version 1.0.4 - 8 Jan 2004 --------------------------- - Added positional tokens StringStart, StringEnd, LineStart, and LineEnd - Added commaSeparatedList to pre-defined global token definitions; also added commasep.py to the examples directory, to demonstrate the differences between parsing comma-separated data and simple line-splitting at commas - Minor API change: delimitedList does not automatically enclose the list elements in a Group, but makes this the responsibility of the caller; also, if invoked using 'combine=True', the list delimiters are also included in the returned text (good for scoped variables, such as a.b.c or a::b::c, or for directory paths such as a/b/c) - Performance speed-up again, 30-40% - Added httpServerLogParser.py to examples directory, as this is a common parsing task Version 1.0.3 - 23 Dec 2003 --------------------------- - Performance speed-up again, 20-40% - Added Python distutils installation setup.py, etc. (thanks, Dave Kuhlman) Version 1.0.2 - 18 Dec 2003 --------------------------- - **NOTE: Changed API again!!!** (for the last time, I hope) + Renamed module from parsing to pyparsing, to better reflect Python linkage. - Also added dictExample.py to examples directory, to illustrate usage of the Dict class. Version 1.0.1 - 17 Dec 2003 --------------------------- - **NOTE: Changed API!** + Renamed 'len' argument on Word.__init__() to 'exact' - Performance speed-up, 10-30% Version 1.0.0 - 15 Dec 2003 --------------------------- - Initial public release Version 0.1.1 thru 0.1.17 - October-November, 2003 -------------------------------------------------- - initial development iterations: - added Dict, Group - added helper methods oneOf, delimitedList - added helpers quotedString (and double and single), restOfLine, cStyleComment - added MatchFirst as an alternative to the slower Or - added UML class diagram - fixed various logic bugs