HEX

File: //usr/local/lib/python3.7/site-packages/pip/_internal/index/__pycache__/collector.cpython-37.pyc
B

L��g�@�@s�dZddlZddlZddlZddlZddlZddlZddlZddl	Z
ddlZ
ddlm
Z
ddlmZddlmZmZmZmZmZmZmZmZmZmZmZddlmZddlmZddl m!Z!m"Z"dd	l#m$Z$dd
l%m&Z&ddl'm(Z(ddl)m*Z*dd
l+m,Z,ddl-m.Z.ddl/m0Z0ddl1m2Z2ddl3m4Z4m5Z5m6Z6e�rHddlm7Z7ne8Z7e�9e:�Z;ee<e<fZ=e<ee<d�dd�Z>Gdd�de?�Z@edd�dd�ZAGdd�de?�ZBe<e*dd�dd �ZCe<e*ed�d!d"�ZDe=ee<d#�d$d%�ZEGd&d'�d'�ZFGd(d)�d)e7�ZGeGeGd*�d+d,�ZHeHd-ee&d.�d/d0��ZIGd1d-�d-�ZJGd2d3�d3e
�ZKdCe&ee<e?feed4dd5�d6d7�ZLdDeeMeJd9�d:d;�ZNe&e*ed-d<�d=d>�ZOGd?d@�d@e�ZPGdAdB�dB�ZQdS)EzO
The main purpose of this module is to expose LinkCollector.collect_sources().
�N)�
HTMLParser)�Values)�
TYPE_CHECKING�Callable�Dict�Iterable�List�MutableMapping�
NamedTuple�Optional�Sequence�Tuple�Union)�requests)�Response)�
RetryError�SSLError)�NetworkConnectionError)�Link)�SearchScope)�
PipSession)�raise_for_status)�is_archive_file)�redact_auth_from_url)�vcs�)�CandidatesFromPage�
LinkSource�build_source)�Protocol)�url�returncCs6x0tjD]&}|���|�r|t|�dkr|SqWdS)zgLook for VCS schemes in the URL.

    Returns the matched VCS scheme, or None if there's no match.
    z+:N)r�schemes�lower�
startswith�len)r �scheme�r'�A/tmp/pip-unpacked-wheel-hv55ucu3/pip/_internal/index/collector.py�_match_vcs_scheme7sr)cs&eZdZeedd��fdd�Z�ZS)�_NotAPIContentN)�content_type�request_descr!cst��||�||_||_dS)N)�super�__init__r+r,)�selfr+r,)�	__class__r'r(r.Csz_NotAPIContent.__init__)�__name__�
__module__�__qualname__�strr.�
__classcell__r'r')r0r(r*Bsr*)�responser!cCs6|j�dd�}|��}|�d�r$dSt||jj��dS)z�
    Check the Content-Type header to ensure the response contains a Simple
    API Response.

    Raises `_NotAPIContent` if the content type is not a valid content-type.
    zContent-Type�Unknown)z	text/htmlz#application/vnd.pypi.simple.v1+htmlz#application/vnd.pypi.simple.v1+jsonN)�headers�getr#r$r*�request�method)r6r+�content_type_lr'r'r(�_ensure_api_headerIsr=c@seZdZdS)�_NotHTTPN)r1r2r3r'r'r'r(r>_sr>)r �sessionr!cCsFtj�|�\}}}}}|dkr$t��|j|dd�}t|�t|�dS)z�
    Send a HEAD request to the URL, and ensure the response contains a simple
    API Response.

    Raises `_NotHTTP` if the URL is not available for a HEAD request, or
    `_NotAPIContent` if the content type is not a valid content type.
    >�http�httpsT)�allow_redirectsN)�urllib�parse�urlsplitr>�headrr=)r r?r&�netloc�path�query�fragment�respr'r'r(�_ensure_api_responsecsrLcCsztt|�j�rt||d�t�dt|��|j|d�dddg�dd�d	�}t	|�t
|�t�d
t|�|j�dd��|S)
aYAccess an Simple API response with GET, and return the response.

    This consists of three parts:

    1. If the URL looks suspiciously like an archive, send a HEAD first to
       check the Content-Type is HTML or Simple API, to avoid downloading a
       large file. Raise `_NotHTTP` if the content type cannot be determined, or
       `_NotAPIContent` if it is not HTML or a Simple API.
    2. Actually perform the request. Raise HTTP exceptions on network failures.
    3. Check the Content-Type header to make sure we got a Simple API response,
       and raise `_NotAPIContent` otherwise.
    )r?zGetting page %sz, z#application/vnd.pypi.simple.v1+jsonz*application/vnd.pypi.simple.v1+html; q=0.1ztext/html; q=0.01z	max-age=0)�Acceptz
Cache-Control)r8zFetched page %s as %szContent-Typer7)rr�filenamerL�logger�debugrr9�joinrr=r8)r r?rKr'r'r(�_get_simple_responseus"
rR)r8r!cCs<|r8d|kr8tj��}|d|d<|�d�}|r8t|�SdS)z=Determine if we have any encoding information in our headers.zContent-Typezcontent-type�charsetN)�email�message�Message�	get_paramr4)r8�mrSr'r'r(�_get_encoding_from_headers�s

rYc@s:eZdZddd�dd�Zeed�dd�Zed	�d
d�ZdS)�CacheablePageContent�IndexContentN)�pager!cCs|js
t�||_dS)N)�cache_link_parsing�AssertionErrorr\)r/r\r'r'r(r.�s
zCacheablePageContent.__init__)�otherr!cCst|t|��o|jj|jjkS)N)�
isinstance�typer\r )r/r_r'r'r(�__eq__�szCacheablePageContent.__eq__)r!cCst|jj�S)N)�hashr\r )r/r'r'r(�__hash__�szCacheablePageContent.__hash__)	r1r2r3r.�object�boolrb�intrdr'r'r'r(rZ�srZc@s eZdZdeed�dd�ZdS)�
ParseLinksr[)r\r!cCsdS)Nr')r/r\r'r'r(�__call__�szParseLinks.__call__N)r1r2r3rrrir'r'r'r(rh�srh)�fnr!csLtjdd�tttd��fdd���t���dttd���fdd	��}|S)
z�
    Given a function that parses an Iterable[Link] from an IndexContent, cache the
    function's result (keyed by CacheablePageContent), unless the IndexContent
    `page` has `page.cache_link_parsing == False`.
    N)�maxsize)�cacheable_pager!cst�|j��S)N)�listr\)rl)rjr'r(�wrapper�sz*with_cached_index_content.<locals>.wrapperr[)r\r!cs|jr�t|��St�|��S)N)r]rZrm)r\)rjrnr'r(�wrapper_wrapper�sz2with_cached_index_content.<locals>.wrapper_wrapper)�	functools�	lru_cacherZrr�wraps)rjror')rjrnr(�with_cached_index_content�s

rsr[)r\r!c
cs�|j��}|�d�rXt�|j�}x2|�dg�D]"}t�||j	�}|dkrJq.|Vq.WdSt
|j	�}|jpjd}|�|j�
|��|j	}|jp�|}x.|jD]$}	tj|	||d�}|dkr�q�|Vq�WdS)z\
    Parse a Simple API's Index Content, and yield its anchor elements as Link objects.
    z#application/vnd.pypi.simple.v1+json�filesNzutf-8)�page_url�base_url)r+r#r$�json�loads�contentr9r�	from_jsonr �HTMLLinkParser�encoding�feed�decoderv�anchorsZfrom_element)
r\r<�data�file�link�parserr|r rv�anchorr'r'r(�parse_links�s&





r�c@s<eZdZdZd
eeeeeedd�dd�Zed�dd	�Z	dS)r[z5Represents one response (or page), along with its URLTN)ryr+r|r r]r!cCs"||_||_||_||_||_dS)am
        :param encoding: the encoding to decode the given content.
        :param url: the URL from which the HTML was downloaded.
        :param cache_link_parsing: whether links parsed from this page's url
                                   should be cached. PyPI index urls should
                                   have this set to False, for example.
        N)ryr+r|r r])r/ryr+r|r r]r'r'r(r.s
zIndexContent.__init__)r!cCs
t|j�S)N)rr )r/r'r'r(�__str__szIndexContent.__str__)T)
r1r2r3�__doc__�bytesr4rrfr.r�r'r'r'r(r[scsneZdZdZedd��fdd�Zeeeeeefdd�dd�Z	eeeeefeed	�d
d�Z
�ZS)r{zf
    HTMLParser that keeps the first base HREF and a list of all anchor
    elements' attributes.
    N)r r!cs$t�jdd�||_d|_g|_dS)NT)�convert_charrefs)r-r.r rvr)r/r )r0r'r(r.#szHTMLLinkParser.__init__)�tag�attrsr!cCsH|dkr,|jdkr,|�|�}|dk	rD||_n|dkrD|j�t|��dS)N�base�a)rv�get_hrefr�append�dict)r/r�r��hrefr'r'r(�handle_starttag*s
zHTMLLinkParser.handle_starttag)r�r!cCs"x|D]\}}|dkr|SqWdS)Nr�r')r/r��name�valuer'r'r(r�2szHTMLLinkParser.get_href)r1r2r3r�r4r.rr
rr�r�r5r'r')r0r(r{s"r{).N)r��reason�methr!cCs|dkrtj}|d||�dS)Nz%Could not fetch URL %s: %s - skipping)rOrP)r�r�r�r'r'r(�_handle_get_simple_fail9sr�T)r6r]r!cCs&t|j�}t|j|jd||j|d�S)NzContent-Type)r|r r])rYr8r[ryr )r6r]r|r'r'r(�_make_index_contentCs
r�)r�r?r!c

Cs�|j�dd�d}t|�}|r0t�d||�dStj�|�\}}}}}}|dkr�tj	�
tj�|��r�|�
d�sv|d7}tj�|d�}t�d|�yt||d	�}W�nDtk
r�t�d
|�Y�n2tk
r�}zt�d||j|j�Wdd}~XYn�tk
�r$}zt||�Wdd}~XYn�tk
�rP}zt||�Wdd}~XYn�tk
�r�}z$d}	|	t|�7}	t||	tjd
�Wdd}~XYndtjk
�r�}zt|d|���Wdd}~XYn0tjk
�r�t|d�YnXt||jd�SdS)N�#rrzICannot look at %s URL %s because it does not support lookup as web pages.r��/z
index.htmlz# file: URL is directory, getting %s)r?z`Skipping page %s because it looks like an archive, and cannot be checked by a HTTP HEAD request.z�Skipping page %s because the %s request got Content-Type: %s. The only supported Content-Types are application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html, and text/htmlz4There was a problem confirming the ssl certificate: )r�zconnection error: z	timed out)r]) r �splitr)rO�warningrCrD�urlparse�osrH�isdirr:�url2pathname�endswith�urljoinrPrRr>r*r,r+rr�rrr4�infor�ConnectionError�Timeoutr�r])
r�r?r �
vcs_schemer&�_rHrK�excr�r'r'r(�_get_index_contentPsP
  r�c@s.eZdZUeeeed<eeeed<dS)�CollectedSources�
find_links�
index_urlsN)r1r2r3rrr�__annotations__r'r'r'r(r��s
r�c@sxeZdZdZeedd�dd�Zedeee	dd�dd	��Z
eee
d
�dd��Zeeed
�dd�Ze
eed�dd�ZdS)�
LinkCollectorz�
    Responsible for collecting Link objects from all configured locations,
    making network requests as needed.

    The class's main method is its collect_sources() method.
    N)r?�search_scoper!cCs||_||_dS)N)r�r?)r/r?r�r'r'r(r.�szLinkCollector.__init__F)r?�options�suppress_no_indexr!cCsd|jg|j}|jr8|s8t�dd�dd�|D���g}|jp@g}tj|||jd�}t	||d�}|S)z�
        :param session: The Session to use to make requests.
        :param suppress_no_index: Whether to ignore the --no-index option
            when constructing the SearchScope object.
        zIgnoring indexes: %s�,css|]}t|�VqdS)N)r)�.0r r'r'r(�	<genexpr>�sz'LinkCollector.create.<locals>.<genexpr>)r�r��no_index)r?r�)
�	index_url�extra_index_urlsr�rOrPrQr�r�creater�)�clsr?r�r�r�r�r��link_collectorr'r'r(r��s


zLinkCollector.create)r!cCs|jjS)N)r�r�)r/r'r'r(r��szLinkCollector.find_links)�locationr!cCst||jd�S)z>
        Fetch an HTML page containing package links.
        )r?)r�r?)r/r�r'r'r(�fetch_response�szLinkCollector.fetch_response)�project_name�candidates_from_pager!cs�t����fdd��j���D����}t����fdd��jD����}t�tj	�r�dd�t
�||�D�}t|��d��d�g|}t�
d�|��tt|�t|�d	�S)
Nc	3s&|]}t|��jjdd�d�VqdS)F)r��page_validator�
expand_dirr]r�N)rr?�is_secure_origin)r��loc)r�r�r/r'r(r��sz0LinkCollector.collect_sources.<locals>.<genexpr>c	3s&|]}t|��jjdd�d�VqdS)T)r�r�r�r]r�N)rr?r�)r�r�)r�r�r/r'r(r��scSs*g|]"}|dk	r|jdk	rd|j���qS)Nz* )r�)r��sr'r'r(�
<listcomp>�sz1LinkCollector.collect_sources.<locals>.<listcomp>z' location(s) to search for versions of �:�
)r�r�)�collections�OrderedDictr��get_index_urls_locations�valuesr�rO�isEnabledFor�logging�DEBUG�	itertools�chainr%rPrQr�rm)r/r�r�Zindex_url_sourcesZfind_links_sources�linesr')r�r�r/r(�collect_sources�szLinkCollector.collect_sources)F)r1r2r3r�rrr.�classmethodrrfr��propertyrr4r�rrr[r�rr�r�r'r'r'r(r��sr�)N)T)Rr�r��
email.messagerTrpr�rwr�r��urllib.parserC�urllib.request�html.parserr�optparser�typingrrrrrr	r
rrr
r�pip._vendorrZpip._vendor.requestsrZpip._vendor.requests.exceptionsrr�pip._internal.exceptionsr�pip._internal.models.linkr�!pip._internal.models.search_scoper�pip._internal.network.sessionr�pip._internal.network.utilsr�pip._internal.utils.filetypesr�pip._internal.utils.miscr�pip._internal.vcsr�sourcesrrrrre�	getLoggerr1rOr4ZResponseHeadersr)�	Exceptionr*r=r>rLrRrYrZrhrsr�r[r{r�rfr�r�r�r�r'r'r'r(�<module>sd4
?=