Module for downloading files from a pool of mirrors

DESCRIPTION

  This module provides support for downloading files from a pool of
  mirrors with configurable failover policies. To a large extent, the
  failover policy is chosen by using different classes derived from
  the main class, MirrorGroup.

  Instances of MirrorGroup (and cousins) act very much like URLGrabber
  instances in that they have urlread, urlgrab, and urlopen methods.
  They can therefore be used in very similar ways.

    from urlgrabber.grabber import URLGrabber
    from urlgrabber.mirror import MirrorGroup
    gr = URLGrabber()
    mg = MirrorGroup(gr, ['http://foo.com/some/directory/',
                          'http://bar.org/maybe/somewhere/else/',
                          'ftp://baz.net/some/other/place/entirely/'])
    mg.urlgrab('relative/path.zip')

  The assumption is that all mirrors are identical AFTER the base URLs
  specified, so that any mirror can be used to fetch any file.

FAILOVER

  The failover mechanism is designed to be customized by subclassing
  MirrorGroup to change the details of the behavior. In general, the
  classes maintain a master mirror list and a "current mirror" index.
  When a download is initiated, a copy of this list and index is
  created for that download only. The specific failover policy depends
  on the class used, and so is documented in the class documentation.
  Note that ANY behavior of the class can be overridden, so any
  failover policy at all is possible (although you may need to change
  the interface in extreme cases).

CUSTOMIZATION

  Most customization of a MirrorGroup object is done at instantiation
  time (or via subclassing). There are four major types of
  customization:

    1) Pass in a custom urlgrabber - The passed-in urlgrabber will be
       used (by default...
       see #2) for the grabs, so options set on it apply to the URL
       fetching.

    2) Custom mirror list - Mirror lists can simply be a list of
       mirror strings (as shown in the example above), but each entry
       can also be a dict, allowing for more options. For example, the
       first mirror in the list above could also have been:

         {'mirror': 'http://foo.com/some/directory/',
          'grabber': <custom grabber for this mirror>,
          'kwargs': { <dict of keyword arguments for the grabber> }}

       All mirrors are converted to this format internally. If
       'grabber' is omitted, the default grabber will be used. If
       'kwargs' is omitted, no extra arguments are passed.

       The kwarg 'max_connections' limits the number of concurrent
       connections to this mirror. When omitted or set to zero, the
       default limit (2) will be used.

    3) Pass keyword arguments when instantiating the mirror group.
       See, for example, the failure_callback argument.

    4) Finally, any kwargs passed in for the specific file (to the
       urlgrab method, for example) will be folded in. The options
       passed into the grabber's urlXXX methods will override any
       options specified in a custom mirror dict.

class GrabRequest

  This is a dummy class used to hold information about the specific
  request (for example, a single file). By maintaining this
  information separately, we can accomplish two things:

    1) make it a little easier to be threadsafe
    2) have request-specific parameters

class MirrorGroup

  Base Mirror class

  Instances of this class are built with a grabber object and a list
  of mirrors. Then all calls to urlXXX should be passed relative URLs.
  The requested file will be searched for on the first mirror.
  If the grabber raises an exception (possibly after some retries),
  then that mirror will be removed from the list, and the next will be
  attempted. If all mirrors are exhausted, then an exception will be
  raised.

  MirrorGroup has the following failover policy:

    * downloads begin with the first mirror

    * by default (see default_action below) a failure (after retries)
      causes MirrorGroup to increment the local AND master indices.
      Also, the current mirror is removed from the local list (but NOT
      the master list; the mirror can potentially be used for other
      files)

    * if the local list is ever exhausted, a URLGrabError will be
      raised (errno=256, No more mirrors). The 'errors' attribute
      holds a list of (full_url, errmsg) tuples. This contains all
      URLs tried and the corresponding error messages.

  OPTIONS

    In addition to the required arguments "grabber" and "mirrors",
    MirrorGroup also takes the following optional arguments:

    default_action

      A dict that describes the actions to be taken upon failure
      (after retries). default_action can contain any of the
      following keys (shown here with their default values):

        default_action = {'increment': 1,
                          'increment_master': 1,
                          'remove': 1,
                          'remove_master': 0,
                          'fail': 0}

      In this context, 'increment' means "use the next mirror" and
      'remove' means "never use this mirror again". The two 'master'
      values refer to the instance-level mirror list (used for all
      files), whereas the non-master values refer to the current
      download only.

      The 'fail' option will cause immediate failure by re-raising
      the exception; no further attempts are made for the current
      download. As in the "No more mirrors" case, the 'errors'
      attribute is set in the exception object.

      This dict can be set at instantiation time,
        mg = MirrorGroup(grabber, mirrors, default_action={'fail':1})
      at method-execution time (only applies to current fetch),
        filename = mg.urlgrab(url, default_action={'increment': 0})
      or by returning an action dict from the failure_callback
        return {'fail':0}
      in increasing precedence.
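The precedence rule above behaves like stacking dicts with `dict.update`: each later layer overrides the earlier ones key by key, and keys it does not mention fall through. What follows is a minimal Python 3 sketch, not urlgrabber's code. `CLASS_DEFAULT` and `merge_actions` are hypothetical names. The defaults are copied from the default_action table, and folding them in up front is a simplification that may differ from how the library resolves missing keys internally.

```python
# Hypothetical sketch of per-key layering of failure actions (not urlgrabber code).

# Class defaults, copied from the default_action table in the MirrorGroup docs.
CLASS_DEFAULT = {'increment': 1, 'increment_master': 1,
                 'remove': 1, 'remove_master': 0, 'fail': 0}


def merge_actions(instance_action=None, method_action=None, callback_action=None):
    """Return the effective action; later layers override earlier ones key by key."""
    merged = dict(CLASS_DEFAULT)
    for layer in (instance_action, method_action, callback_action):
        merged.update(layer or {})  # a missing layer contributes nothing
    return merged


# The three example settings from the text: instantiation, method call, callback.
result = merge_actions(instance_action={'fail': 1},
                       method_action={'increment': 0},
                       callback_action={'fail': 0})
print(result)
# {'increment': 0, 'increment_master': 1, 'remove': 1, 'remove_master': 0, 'fail': 0}
```

Because the merge works per key rather than replacing whole dicts, a callback that only returns {'fail': 0} leaves the method-level 'increment' setting intact.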
      If all three of these were done, the net result would be:
            {'increment': 0,         # set in method
             'increment_master': 1,  # class default
             'remove': 1,            # class default
             'remove_master': 0,     # class default
             'fail': 0}              # set at instantiation, reset
                                     # from callback

    failure_callback

      This is a callback that will be called when a mirror "fails",
      meaning the grabber raises some URLGrabError. If this is a
      tuple, it is interpreted to be of the form (cb, args, kwargs)
      where cb is the actual callable object (function, method,
      etc.). Otherwise, it is assumed to be the callable object
      itself. The callback will be passed a grabber.CallbackObject
      instance along with args and kwargs (if present). The following
      attributes are defined within the instance:

         obj.exception    = < exception that was raised >
         obj.mirror       = < the mirror that was tried >
         obj.tries        = < the number of mirror tries so far >
         obj.relative_url = < url relative to the mirror >
         obj.url          = < full url that failed >
                            # .url is just the combination of .mirror
                            # and .relative_url

      The failure callback can return an action dict, as described
      above.

      Like default_action, the failure_callback can be set at
      instantiation time or when the urlXXX method is called. In the
      latter case, it applies only for that fetch.

      The callback can re-raise the exception quite easily. For
      example, this is a perfectly adequate callback function:

        def callback(obj): raise obj.exception

      WARNING: do not save the exception object (or the
      CallbackObject instance). As they contain stack frame
      references, they can lead to circular references.

  Notes:
    * The behavior can be customized by deriving and overriding the
      'CONFIGURATION METHODS'
    * The 'grabber' instance is kept as a reference, not copied.
      Therefore, the grabber instance can be modified externally and
      changes will take effect immediately.

MirrorGroup.__init__(self, grabber, mirrors, **kwargs)

  Initialize the MirrorGroup object.
  REQUIRED ARGUMENTS

    grabber  - URLGrabber instance
    mirrors  - a list of mirrors

  OPTIONAL ARGUMENTS

    failure_callback  - callback to be used when a mirror fails
    default_action    - dict of failure actions

  See the module-level and class-level documentation for more
  details.

MirrorGroup.increment_mirror(self, gr, action={})

  Tell the mirror object to increment the mirror index

  This increments the mirror index, which amounts to telling the
  mirror object to use a different mirror (for this and future
  downloads).

  This is a SEMI-public method. It will be called internally, and you
  may never need to call it.
  However, it is provided (and made public) so that the calling
  program can increment the mirror choice for methods like urlopen.
  For example, with urlopen, there's no good way for the mirror group
  to know that an error occurs mid-download (it has already returned
  and given you the file object).

  remove can have several values:
    0   do not remove the mirror from the list
    1   remove the mirror for this download only
    2   remove the mirror permanently

  Beware of remove=0, as it can lead to infinite loops.
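To make the local-versus-master bookkeeping concrete, here is a minimal, self-contained Python 3 model of the failover policy described for MirrorGroup and increment_mirror. It is a sketch under stated assumptions, not urlgrabber's implementation:

- `Group`, `Request`, `NoMoreMirrors`, `fetch`, and the module-level `increment_mirror` are hypothetical stand-ins. The last one is only modelled on the method of the same name.
- Action keys and their fallback values follow the default_action table.
- A plain `OSError` plays the role of URLGrabError.

```python
# Hypothetical model of MirrorGroup's failover bookkeeping (not urlgrabber code).


class NoMoreMirrors(Exception):
    """Stand-in for URLGrabError(256, 'No more mirrors to try.')."""

    def __init__(self, errors):
        super().__init__('No more mirrors to try.')
        self.errors = errors  # list of (full_url, errmsg) tuples


class Group:
    """Master state shared by all downloads."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)  # master mirror list
        self.next = 0                 # master "current mirror" index


class Request:
    """Per-download copy of the master list and index."""

    def __init__(self, group):
        self.mirrors = list(group.mirrors)
        self.next = group.next


def increment_mirror(group, req, action):
    """Apply a failure action; missing keys fall back to the documented defaults."""
    bad = req.mirrors[req.next]
    # Master list: only touched if the failed mirror is still present.
    if bad in group.mirrors:
        ind = group.mirrors.index(bad)
        if action.get('remove_master', 0):
            del group.mirrors[ind]
        elif group.next == ind and action.get('increment_master', 1):
            group.next += 1
        if group.next >= len(group.mirrors):
            group.next = 0  # wrap around
    # Local (per-download) list.
    if action.get('remove', 1):
        del req.mirrors[req.next]  # the following mirror slides into this slot
    elif action.get('increment', 1):
        req.next += 1
    if req.next >= len(req.mirrors):
        req.next = 0  # wrap around


def fetch(group, fetch_one, relative_url, action=None):
    """Try mirrors in order until fetch_one succeeds or the local list is empty."""
    action = action or {}
    req = Request(group)
    errors = []
    while req.mirrors:
        url = req.mirrors[req.next] + relative_url  # mirrors here end with '/'
        try:
            return fetch_one(url)
        except OSError as exc:
            errors.append((url, str(exc)))
            increment_mirror(group, req, action)
            if action.get('fail', 0):
                raise  # 'fail' re-raises immediately
    raise NoMoreMirrors(errors)


def flaky(url):
    # Hypothetical fetcher: the first mirror is always down.
    if url.startswith('http://a.example/'):
        raise OSError('connection refused')
    return 'got ' + url


group = Group(['http://a.example/', 'http://b.example/', 'http://c.example/'])
print(fetch(group, flaky, 'pkg.rpm'))  # got http://b.example/pkg.rpm
print(group.mirrors)                   # master list still holds all three mirrors
print(group.next)                      # 1: later downloads start at b
```

With the default action, the failed mirror disappears only from the per-download copy, while the master index moves past it. If every mirror fails, the local list empties and the "No more mirrors" error carries one (url, message) pair per attempt. Setting both remove and increment to 0 would retry the same mirror forever, which is the infinite loop the docstring warns about.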