gc@s3dZddlZddlZyddlmZWn!ek rUddlmZnXddlmZddl m Z ddl m Z m Z m Z ddl mZmZyeWnek reZnXyeWnek reZnXyeWnek r eZnXyeWnek r5eefZnXdd d d d d dgZejdejejBZejdejZejdejZejdjZejdejejBZ ej!dZ"ej!ddie d6Z#d e$fdYZ%e%Z&e&j'Z'ejdejejdejgZ(dddddd gZ)ejd!ejejd"ejejd#gZ*d$gZ+e(e)e*e+d%Z,d&Z-d'Z.e,je._dddgZ/d(gZ0d)e/e0ed*d+Z1d,Z2d-Z3ejd.ejZ4d/Z5dS(0scA cleanup tool for HTML. Removes unwanted tags and content. See the `Cleaner` class for details. iN(turlsplit(tetree(tdefs(t fromstringttostringtXHTML_NAMESPACE(t xhtml_to_htmlt_transform_resultt clean_htmltcleantCleanertautolinkt autolink_htmlt word_breaktword_break_htmlsexpression\s*\(.*?\)s @\s*imports?\s*(?:javascript|jscript|livescript|vbscript|data|about|mocha):s\s+s\[if[\s\n\r]+.*?][\s\n\r]*>sdescendant-or-self::*[@style]sdescendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']t namespacestxcBsIeZdZeZeZeZeZeZ eZ eZ eZ eZ eZeZeZdZdZdZeZeZejZeZdZeddgZdZedddddd d gddddd dd dZd Z dZ!dZ"dZ#dZ$ddZ%dZ&e'j(de'j)j*Z+dZ,dZ-RS(s Instances cleans the document of each of the possible offending elements. The cleaning is controlled by attributes; you can override attributes in a subclass, or set them in the constructor. ``scripts``: Removes any ``