HEX
Server: Apache
System: Linux zacp120.webway.host 4.18.0-553.50.1.lve.el8.x86_64 #1 SMP Thu Apr 17 19:10:24 UTC 2025 x86_64
User: govancoz (1003)
PHP: 8.3.26
Disabled: exec,system,passthru,shell_exec,proc_close,proc_open,dl,popen,show_source,posix_kill,posix_mkfifo,posix_getpwuid,posix_setpgid,posix_setsid,posix_setuid,posix_setgid,posix_seteuid,posix_setegid,posix_uname
File: /usr/local/lib/python3.10/site-packages/pip/_internal/index/__pycache__/collector.cpython-310.pyc
[Binary content: compiled CPython 3.10 bytecode (.pyc). The raw dump is not human-readable; only the embedded string constants and docstrings are recoverable, and they are summarized below.]

Module: pip._internal.index.collector

Module docstring:
    The main purpose of this module is to expose LinkCollector.collect_sources().

Imports visible in the constant pool: collections, email.message, functools,
itertools, json, logging, os, urllib.parse, urllib.request,
html.parser.HTMLParser, optparse.Values, typing, pip._vendor.requests,
pip._internal.exceptions, pip._internal.models.link,
pip._internal.models.search_scope, pip._internal.network.session,
pip._internal.network.utils, pip._internal.utils.filetypes,
pip._internal.utils.misc, pip._internal.vcs, and the sibling .sources module.

Recoverable functions, classes, and docstrings:

_match_vcs_scheme(url) -> Optional[str]
    Look for VCS schemes in the URL.
    Returns the matched VCS scheme, or None if there's no match.

class _NotAPIContent(Exception)
    Carries the offending content_type and request_desc.

_ensure_api_header(response) -> None
    Check the Content-Type header to ensure the response contains a Simple
    API Response.
    Raises `_NotAPIContent` if the content type is not a valid content-type.
    Accepted types: text/html, application/vnd.pypi.simple.v1+html,
    application/vnd.pypi.simple.v1+json.

class _NotHTTP(Exception)

_ensure_api_response(url, session) -> None
    Send a HEAD request to the URL, and ensure the response contains a simple
    API Response.
    Raises `_NotHTTP` if the URL is not available for a HEAD request, or
    `_NotAPIContent` if the content type is not a valid content type.

_get_simple_response(url, session) -> Response
    Access a Simple API response with GET, and return the response.
    This consists of three parts:
    1. If the URL looks suspiciously like an archive, send a HEAD first to
       check the Content-Type is HTML or Simple API, to avoid downloading a
       large file. Raise `_NotHTTP` if the content type cannot be determined,
       or `_NotAPIContent` if it is not HTML or a Simple API.
    2. Actually perform the request. Raise HTTP exceptions on network failures.
    3. Check the Content-Type header to make sure we got a Simple API response,
       and raise `_NotAPIContent` otherwise.
    The request advertises Accept: application/vnd.pypi.simple.v1+json,
    application/vnd.pypi.simple.v1+html; q=0.1, text/html; q=0.01 and
    Cache-Control: max-age=0, and logs "Getting page %s" and
    "Fetched page %s as %s".

_get_encoding_from_headers(headers) -> Optional[str]
    Determine if we have any encoding information in our headers.

class CacheablePageContent
    Hashable wrapper around an IndexContent (compared and hashed by its URL),
    used as a cache key.

with_cached_index_content(fn)
    Given a function that parses an Iterable[Link] from an IndexContent, cache
    the function's result (keyed by CacheablePageContent), unless the
    IndexContent `page` has `page.cache_link_parsing == False`.
    Implemented with functools.lru_cache(maxsize=None).

parse_links(page) -> Iterable[Link]
    Parse a Simple API's Index Content, and yield its anchor elements as Link
    objects. JSON pages (application/vnd.pypi.simple.v1+json) are decoded with
    json.loads and their "files" entries turned into Links via Link.from_json;
    HTML pages are decoded (default utf-8), fed to HTMLLinkParser, and each
    anchor is turned into a Link via Link.from_element.

class IndexContent
    Represents one response (or page), along with its URL.
    __init__(content, content_type, encoding, url, cache_link_parsing=True)
        :param encoding: the encoding to decode the given content.
        :param url: the URL from which the HTML was downloaded.
        :param cache_link_parsing: whether links parsed from this page's url
                                   should be cached. PyPI index urls should
                                   have this set to False, for example.
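To make the content negotiation described for _get_simple_response and
_ensure_api_header concrete, here is a minimal standalone sketch using only
the Python standard library. It is illustrative, not pip's implementation:
the real module goes through pip's vendored requests session and its retry
and caching machinery, and the function name fetch_simple_index and its
return convention are assumptions made for this example.

import email.message
import json
import urllib.request

# Accept header advertised by pip: prefer the JSON Simple API, fall back to HTML.
ACCEPT = (
    "application/vnd.pypi.simple.v1+json, "
    "application/vnd.pypi.simple.v1+html; q=0.1, "
    "text/html; q=0.01"
)

def fetch_simple_index(url: str):
    """Fetch a Simple API page and return parsed JSON (dict) or decoded HTML (str)."""
    request = urllib.request.Request(
        url, headers={"Accept": ACCEPT, "Cache-Control": "max-age=0"}
    )
    with urllib.request.urlopen(request) as resp:
        content_type = resp.headers.get("Content-Type", "Unknown")
        body = resp.read()
    # Mirrors _ensure_api_header: only the three Simple API content types are accepted.
    msg = email.message.Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset", "utf-8")
    if msg.get_content_type() == "application/vnd.pypi.simple.v1+json":
        return json.loads(body.decode(charset))
    if msg.get_content_type() in ("application/vnd.pypi.simple.v1+html", "text/html"):
        return body.decode(charset)
    raise ValueError(f"unsupported Content-Type for {url}: {content_type}")

# e.g. fetch_simple_index("https://pypi.org/simple/pip/")["files"] lists release files.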
class HTMLLinkParser(HTMLParser)
    HTMLParser that keeps the first base HREF and a list of all anchor
    elements' attributes.
    handle_starttag records the first <base href="..."> as base_url and
    appends each <a> tag's attributes; get_href returns the href attribute,
    if present.

_handle_get_simple_fail(link, reason, meth=None) -> None
    Logs "Could not fetch URL %s: %s - skipping" (at debug level by default).

_make_index_content(response, cache_link_parsing=True) -> IndexContent
    Wraps a response body in an IndexContent, taking the encoding from its
    Content-Type header.

_get_index_content(link, session) -> Optional[IndexContent]
    Fetches and wraps one index page. Recoverable log and warning strings:
    - "Cannot look at %s URL %s because it does not support lookup as web
      pages." (VCS URLs are skipped)
    - "# file: URL is directory, getting %s" (directory file: URLs are
      rewritten to point at index.html)
    - "Skipping page %s because it looks like an archive, and cannot be
      checked by a HTTP HEAD request."
    - "Skipping page %s because the %s request got Content-Type: %s. The only
      supported Content-Types are application/vnd.pypi.simple.v1+json,
      application/vnd.pypi.simple.v1+html, and text/html"
    - "There was a problem confirming the ssl certificate: "
    - "connection error: " / "timed out"
    NetworkConnectionError, RetryError, SSLError, ConnectionError and Timeout
    are caught, logged, and result in None being returned.

class CollectedSources(NamedTuple)
    find_links: Sequence[Optional[LinkSource]]
    index_urls: Sequence[Optional[LinkSource]]

class LinkCollector
    Responsible for collecting Link objects from all configured locations,
    making network requests as needed.
    The class's main method is its collect_sources() method.

    create(cls, session, options, suppress_no_index=False) -> LinkCollector
        :param session: The Session to use to make requests.
        :param suppress_no_index: Whether to ignore the --no-index option
            when constructing the SearchScope object.
        Logs "Ignoring indexes: %s" when indexes are suppressed.

    find_links (property)

    fetch_response(location) -> Optional[IndexContent]
        Fetch an HTML page containing package links.

    collect_sources(project_name, candidates_from_page) -> CollectedSources
        Builds LinkSource objects (via build_source) for every configured
        index URL and find-links location, logging
        "N location(s) to search for versions of <project>:" at DEBUG level.
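For the HTML code path, the sketch below shows the same idea as
HTMLLinkParser and parse_links reduced to the standard library: honour only
the first <base href>, collect every anchor's href, and resolve the hrefs
against the page URL. The class name AnchorCollector is hypothetical; pip's
version additionally turns each anchor into a Link object carrying hash and
requires-python metadata.

from html.parser import HTMLParser
from urllib.parse import urljoin

class AnchorCollector(HTMLParser):
    """Collect every <a href> and keep only the first <base href>, like HTMLLinkParser."""

    def __init__(self, page_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = page_url      # fall back to the page URL, as parse_links does
        self._saw_base = False
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "base" and not self._saw_base and attr_map.get("href"):
            self.base_url = attr_map["href"]
            self._saw_base = True
        elif tag == "a" and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])

parser = AnchorCollector("https://pypi.org/simple/pip/")
parser.feed('<a href="pip-24.0-py3-none-any.whl#sha256=deadbeef">pip-24.0</a>')
print([urljoin(parser.base_url, href) for href in parser.hrefs])
# -> ['https://pypi.org/simple/pip/pip-24.0-py3-none-any.whl#sha256=deadbeef']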