---------------------------------------------
Miscellaneous functions for manipulating text
---------------------------------------------

Collection of text functions that don't fit in another category.

.. function:: guess_encoding(byte_string, disable_chardet=False)

   Try to guess the encoding of a byte :class:`str`

   :arg byte_string: byte :class:`str` to guess the encoding of
   :kwarg disable_chardet: If this is True, we never attempt to use :mod:`chardet` to guess the encoding. This is useful if you need reproducible results regardless of whether :mod:`chardet` is installed. Default: :data:`False`.
   :raises TypeError: if :attr:`byte_string` is not a byte :class:`str` type
   :returns: string containing a guess at the encoding of :attr:`byte_string`. This is appropriate to pass as the encoding argument when encoding and decoding unicode strings.

   We start by attempting to decode the byte :class:`str` as :term:`UTF-8`. If this succeeds, we tell the world it's :term:`UTF-8` text. If it doesn't, and :mod:`chardet` is installed on the system and :attr:`disable_chardet` is False, this function uses :mod:`chardet` to try to detect the encoding of :attr:`byte_string`. If :mod:`chardet` is not installed, or its confidence in the result is too low (it is compared against the module-level ``_CHARDET_THRESHHOLD`` of 0.6), we rather arbitrarily claim that the text is ``latin-1``. Since ``latin-1`` maps every one of the 256 byte values to a character, decoding from ``latin-1`` to :class:`unicode` never raises a :exc:`UnicodeError`, although the output might be mangled.
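The decision chain for :func:`guess_encoding` can be sketched in Python 3, where :class:`bytes` plays the role of the Python 2 byte :class:`str`. This is a minimal sketch, not kitchen's implementation.

* The name ``guess_encoding_sketch`` is hypothetical.
* It assumes :mod:`chardet`'s ``detect()`` returns a dict with ``encoding`` and ``confidence`` keys.
* Treating a confidence equal to 0.6 as good enough is an assumption.

```python
try:
    import chardet  # optional third-party package; the sketch works without it
except ImportError:
    chardet = None

# Mirrors the 0.6 value of the module's _CHARDET_THRESHHOLD.
# Using ">=" rather than ">" is an assumption of this sketch.
CHARDET_THRESHOLD = 0.6


def guess_encoding_sketch(byte_string: bytes, disable_chardet: bool = False) -> str:
    """Hypothetical Python 3 analogue of guess_encoding()."""
    if not isinstance(byte_string, bytes):
        raise TypeError("byte_string must be a bytes object")

    # 1. If the bytes decode cleanly as UTF-8, call it UTF-8.
    try:
        byte_string.decode("utf-8", "strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 2. Otherwise ask chardet (when available and allowed) and keep its
    #    answer only if the reported confidence clears the threshold.
    if chardet is not None and not disable_chardet:
        info = chardet.detect(byte_string)
        confidence = info.get("confidence") or 0.0
        if info.get("encoding") and confidence >= CHARDET_THRESHOLD:
            return info["encoding"]

    # 3. Fall back to latin-1: every byte value 0-255 maps to a code point,
    #    so decoding with it cannot raise UnicodeDecodeError.
    return "latin-1"


print(guess_encoding_sketch("naïve".encode("utf-8")))           # utf-8
print(guess_encoding_sketch(b"caf\xe9", disable_chardet=True))  # latin-1
```

Passing ``disable_chardet=True`` keeps the result independent of whether :mod:`chardet` is installed. That is the reproducibility property the parameter description above refers to.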
.. function:: str_eq(str1, str2, encoding='utf-8', errors='replace')

   Compare two strings, converting to byte :class:`str` if one is :class:`unicode`

   :arg str1: First string to compare
   :arg str2: Second string to compare
   :kwarg encoding: If we need to convert one string into a byte :class:`str` to compare, the encoding to use. Default is :term:`utf-8`.
   :kwarg errors: What to do if we encounter errors when encoding the string. See the :func:`kitchen.text.converters.to_bytes` documentation for possible values. The default is ``replace``.

   This function prevents :exc:`UnicodeError` (python-2.4 or less) and :exc:`UnicodeWarning` (python 2.5 and higher) when we compare a :class:`unicode` string to a byte :class:`str`. The errors normally arise because Python implicitly converts the byte :class:`str` with the :term:`ASCII` codec. This function lets you convert to :term:`utf-8` or another encoding instead.

   .. note::

      When we need to convert one of the strings from :class:`unicode` in order to compare them, we convert the :class:`unicode` string into a byte :class:`str`. That means the same pair of strings can compare differently depending on the encoding you pass.

   Note that ``str1 == str2`` is faster than this function if you can accept the following limitations:

   * Limited to python-2.5+ (otherwise a :exc:`UnicodeDecodeError` may be thrown)
   * Will generate a :exc:`UnicodeWarning` if a non-:term:`ASCII` byte :class:`str` is compared to a :class:`unicode` string.
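The encode-then-compare idea behind :func:`str_eq` can be sketched in Python 3. Here :class:`str` is text and :class:`bytes` is the byte string. The name ``str_eq_sketch`` is hypothetical and this is not kitchen's code.

In Python 3 a text/bytes comparison does not raise. It simply returns ``False``, and emits a ``BytesWarning`` only under ``python -b``. The explicit encode is therefore what makes the two sides comparable.

```python
from typing import Union


def str_eq_sketch(str1: Union[str, bytes], str2: Union[str, bytes],
                  encoding: str = "utf-8", errors: str = "replace") -> bool:
    """Hypothetical Python 3 analogue of str_eq()."""
    # When exactly one side is text, encode that side so both are bytes.
    if isinstance(str1, str) and isinstance(str2, bytes):
        str1 = str1.encode(encoding, errors)
    elif isinstance(str1, bytes) and isinstance(str2, str):
        str2 = str2.encode(encoding, errors)
    # Same-type pairs (text/text or bytes/bytes) are compared directly.
    return str1 == str2


print(str_eq_sketch("café", "café".encode("utf-8")))          # True
print(str_eq_sketch("café", b"caf\xe9"))                      # False
print(str_eq_sketch("café", b"caf\xe9", encoding="latin-1"))  # True
```

The last two calls show the caveat from the note: the same pair compares unequal under ``utf-8`` and equal under ``latin-1``.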
.. function:: process_control_chars(string, strategy='replace')

   Look for and transform :term:`control characters` in a string

   :arg string: string to search for and transform :term:`control characters` within
   :kwarg strategy: XML 1.0 does not allow most :term:`ASCII` :term:`control characters` (tab, line feed, and carriage return are the exceptions). When we encounter those we need to know what to do. Valid options are:

      :replace: (default) Replace the :term:`control characters` with ``"?"``
      :ignore: Remove the characters altogether from the output
      :strict: Raise a :exc:`~kitchen.text.exceptions.ControlCharError` when we encounter a control character

   :raises TypeError: if :attr:`string` is not a unicode string.
   :raises ValueError: if the strategy is not one of replace, ignore, or strict.
   :raises kitchen.text.exceptions.ControlCharError: if the strategy is ``strict`` and a :term:`control character` is present in the :attr:`string`
   :returns: :class:`unicode` string with no :term:`control characters` in it.
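The three strategies of :func:`process_control_chars` map naturally onto :meth:`str.translate` with a per-code-point table. This is a minimal Python 3 sketch under stated assumptions, not kitchen's code.

* The set of stripped code points is assumed to be the XML 1.0 restricted set below U+0020.
* ``process_control_chars_sketch`` is a hypothetical name.
* ``ControlCharError`` here is a hypothetical stand-in for kitchen's exception.

```python
# Code points XML 1.0 forbids below U+0020: everything except tab (9),
# line feed (10) and carriage return (13). Assumed set, see lead-in.
CONTROL_CODES = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)))


class ControlCharError(Exception):
    """Hypothetical stand-in for kitchen.text.exceptions.ControlCharError."""


def process_control_chars_sketch(string: str, strategy: str = "replace") -> str:
    """Hypothetical Python 3 analogue of process_control_chars()."""
    if not isinstance(string, str):
        raise TypeError("process_control_chars_sketch needs a str argument")

    if strategy == "replace":
        table = dict.fromkeys(CONTROL_CODES, "?")   # map each code to "?"
    elif strategy == "ignore":
        table = dict.fromkeys(CONTROL_CODES, None)  # None deletes the character
    elif strategy == "strict":
        if any(ord(ch) in CONTROL_CODES for ch in string):
            raise ControlCharError("ASCII control code present in string input")
        return string
    else:
        raise ValueError("strategy must be one of ignore, replace, or strict")

    # str.translate looks up each character's ordinal in the table.
    return string.translate(table)


print(repr(process_control_chars_sketch("a\x00b\tc")))            # 'a?b\tc'
print(repr(process_control_chars_sketch("a\x00b\tc", "ignore")))  # 'ab\tc'
```

The tables are built once per call with :meth:`dict.fromkeys`, so the replace and ignore paths each make a single pass over the string.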
.. function:: html_entities_unescape(string)

   Substitute unicode characters for HTML entities

   :arg string: :class:`unicode` string to substitute out html entities
   :raises TypeError: if something other than a :class:`unicode` string is given
   :rtype: :class:`unicode` string
   :returns: The plain text without html entities

   Entities and tags are located with the module-level pattern ``(?s)<[^>]*>|&#?\w+;``. Tags are dropped from the output, and a numeric or named reference that cannot be converted is left as it was.

The module's ``__all__`` exports :func:`byte_string_valid_encoding`, :func:`byte_string_valid_xml`, :func:`guess_encoding`, :func:`html_entities_unescape`, :func:`process_control_chars`, and :func:`str_eq`.
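The regex-plus-callback approach behind :func:`html_entities_unescape` can be sketched in Python 3. The sketch uses the module's ``(?s)<[^>]*>|&#?\w+;`` pattern and the standard library's ``html.entities.name2codepoint`` table. The names ``ENTITY_RE``, ``_fixup`` and ``html_entities_unescape_sketch`` are hypothetical, and the per-branch details are assumptions rather than kitchen's code.

If you only need entity decoding, Python 3's :func:`html.unescape` is the usual choice. The hand-rolled version is shown because it also drops tags.

```python
import html.entities
import re

# Tags, numeric references (&#65; / &#x41;) and named references (&amp;).
ENTITY_RE = re.compile(r"(?s)<[^>]*>|&#?\w+;")


def _fixup(match):
    """Return the replacement text for one regex match."""
    text = match.group(0)
    if text.startswith("<"):
        return ""  # drop tags entirely
    try:
        if text.startswith(("&#x", "&#X")):
            return chr(int(text[3:-1], 16))  # hexadecimal reference
        if text.startswith("&#"):
            return chr(int(text[2:-1]))      # decimal reference
    except (ValueError, OverflowError):
        return text  # malformed or out of range: leave as-is
    # Named reference: look up the name between '&' and ';'.
    codepoint = html.entities.name2codepoint.get(text[1:-1])
    return chr(codepoint) if codepoint is not None else text


def html_entities_unescape_sketch(string: str) -> str:
    """Hypothetical Python 3 analogue of html_entities_unescape()."""
    if not isinstance(string, str):
        raise TypeError("html_entities_unescape_sketch needs a str argument")
    return ENTITY_RE.sub(_fixup, string)


print(html_entities_unescape_sketch("<b>Caf&eacute;</b> &amp; &#65;&#x42;"))
# Café & AB
```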