HEX
Server: Apache
System: Linux zacp120.webway.host 4.18.0-553.50.1.lve.el8.x86_64 #1 SMP Thu Apr 17 19:10:24 UTC 2025 x86_64
User: govancoz (1003)
PHP: 8.3.26
Disabled: exec,system,passthru,shell_exec,proc_close,proc_open,dl,popen,show_source,posix_kill,posix_mkfifo,posix_getpwuid,posix_setpgid,posix_setsid,posix_setuid,posix_setgid,posix_seteuid,posix_setegid,posix_uname
File: //opt/alt/python37/lib/python3.7/site-packages/charset_normalizer/__pycache__/cd.cpython-37.pyc
B

�bD+�@s�ddlZddlmZddlmZmZddlmZddlm	Z	m
Z
mZmZddl
mZddlmZmZmZmZdd	lmZdd
lmZddlmZmZmZmZmZee
ed�d
d�Zee
ed�dd�Z e�ee
ed�dd��Z!e�ee
ed�dd��Z"eed�eee#e#fd�dd��Z$d,e
ee#e
ed�dd�Z%ee
ee&d�dd �Z'ee
ed!�d"d#�Z(e
eed$�d%d&�Z)ed'd�d-ee&eeed)�d*d+��Z*dS).�N)�IncrementalDecoder)�Counter�OrderedDict)�	lru_cache)�Dict�List�Optional�Tuple�)�FREQUENCIES)�KO_NAMES�LANGUAGE_SUPPORTED_COUNT�TOO_SMALL_SEQUENCE�ZH_NAMES)� is_suspiciously_successive_range)�CoherenceMatches)�is_accentuated�is_latin�is_multi_byte_encoding�is_unicode_range_secondary�
unicode_range)�	iana_name�returncs�t|�rtd��t�d�|��j}|dd�}i�d�xltdd�D]^}|�t|g��}|r@t	|�}|dkrjq@t
|�d	kr@|�kr�d�|<�|d
7<�d
7�q@Wt��fdd��D��S)
zF
    Return associated unicode ranges in a single byte code page.
    z.Function not supported on multi-byte code pagezencodings.{}�ignore)�errorsr�@�NFr
cs g|]}�|�dkr|�qS)g333333�?�)�.0�character_range)�character_count�seen_rangesr�F/opt/alt/python37/lib/python3.7/site-packages/charset_normalizer/cd.py�
<listcomp>2sz*encoding_unicode_range.<locals>.<listcomp>)r�IOError�	importlib�
import_module�formatr�range�decode�bytesrr�sorted)r�decoder�p�i�chunkrr)r r!r"�encoding_unicode_ranges(
r0)�
primary_rangercCsDg}x:t��D].\}}x$|D]}t|�|kr|�|�PqWqW|S)z>
    Return inferred languages used with a unicode range.
    )r�itemsr�append)r1�	languages�language�
characters�	characterrrr"�unicode_range_languages9s


r8cCs>t|�}d}x|D]}d|kr|}PqW|dkr6dgSt|�S)z�
    Single-byte encoding language association. Some code pages are heavily linked to particular language(s).
    This function does the correspondence.
    NZLatinzLatin Based)r0r8)rZunicode_rangesr1Zspecified_rangerrr"�encoding_languagesHs
r9cCsb|�d�s&|�d�s&|�d�s&|dkr,dgS|�d�s>|tkrFddgS|�d	�sX|tkr^d
gSgS)z�
    Multi-byte encoding language association. Some code pages are heavily linked to particular language(s).
    This function does the correspondence.
    Zshift_�
iso2022_jpZeuc_j�cp932ZJapaneseZgbZChinesezClassical Chinese�
iso2022_krZKorean)�
startswithrr)rrrr"�mb_encoding_languages\s


r>)�maxsize)r5rcCsFd}d}x4t|D](}|s&t|�r&d}|rt|�dkrd}qW||fS)zg
    Determine the main aspects of a supported language: whether it contains accents and whether it is pure Latin.
    FT)rrr)r5�target_have_accents�target_pure_latinr7rrr"�get_target_featuresqsrBF)r6�ignore_non_latinrcs�g}tdd��D��}xxt��D]l\}}t|�\}}|rB|dkrBq |dkrP|rPq t|�}t�fdd�|D��}	|	|}
|
dkr |�||
f�q Wt|dd�d	d
�}dd�|D�S)zE
    Return the languages associated with the given characters.
    css|]}t|�VqdS)N)r)rr7rrr"�	<genexpr>�sz%alphabet_languages.<locals>.<genexpr>Fcsg|]}|�kr|�qSrr)r�c)r6rr"r#�sz&alphabet_languages.<locals>.<listcomp>g�������?cSs|dS)Nr
r)�xrrr"�<lambda>��z$alphabet_languages.<locals>.<lambda>T)�key�reversecSsg|]}|d�qS)rr)rZcompatible_languagerrr"r#�s)�anyrr2rB�lenr3r+)r6rCr4Zsource_have_accentsr5Zlanguage_charactersr@rAr Zcharacter_match_count�ratior)r6r"�alphabet_languages�s rN)r5�ordered_charactersrcs6|tkrtd�|���d}�x|D�]}|t|kr6q"t|dt|�|��}t|t|�|�d�}|d|�|���||�|�d���fdd�|D��d�}�fdd�|D��d�}t|�dkr�|dkr�|d	7}q"t|�dkr�|dkr�|d	7}q"|t|�d
k�s|t|�d
kr"|d	7}q"q"W|t|�S)aN
    Determine whether an ordered character list (by occurrence, from most frequent to rarest) matches a particular language.
    The result is a ratio between 0. (absolutely no correspondence) and 1. (near perfect fit).
    Beware that this function is not strict about the match, in order to ease detection. (Meaning a close match is 1.)
    z{} not availablerNcsg|]}|�k�qSrr)r�e)�characters_beforerr"r#�sz1characters_popularity_compare.<locals>.<listcomp>Tcsg|]}|�k�qSrr)rrP)�characters_afterrr"r#�s�r
g�������?)r�
ValueErrorr'�index�countrL)r5rOZcharacter_approved_countr7Zcharacters_before_sourceZcharacters_after_sourceZbefore_match_countZafter_match_countr)rRrQr"�characters_popularity_compare�s:rW)�decoded_sequencercCs�t�}x�|D]�}|��dkrqt|�}|dkr0qd}x |D]}t||�dkr:|}Pq:W|dkrb|}||krx|��||<q|||��7<qWt|���S)a
    Given a decoded text sequence, return a list of str. Unicode range / alphabet separation.
    Ex. a text containing English/Latin with a bit of Hebrew will return two items in the resulting list:
    one containing the Latin letters and the other the Hebrew letters.
    FN)r�isalpharr�lower�list�values)rXZlayersr7rZlayer_target_rangeZdiscovered_rangerrr"�alpha_unicode_split�s(

r])�resultsrcspt��xD|D]<}x6|D].}|\}}|�kr6|g�|<q�|�|�qWqW�fdd��D�}t|dd�dd�S)z�
    This function merges the results previously returned by the function coherence_ratio.
    The return type is the same as coherence_ratio.
    cs.g|]&}|tt�|�t�|�d�f�qS)rS)�round�sumrL)rr5)�per_language_ratiosrr"r#sz*merge_coherence_ratios.<locals>.<listcomp>cSs|dS)Nr
r)rFrrr"rG"rHz(merge_coherence_ratios.<locals>.<lambda>T)rIrJ)rr3r+)r^�resultZ
sub_resultr5rM�merger)rar"�merge_coherence_ratios	s



rdi皙�����?)rX�	threshold�lg_inclusionrcCs�g}d}d}|dk	r|�d�ng}d|kr8d}|�d�x�t|�D]�}t|�}|��}	tdd�|	D��}
|
tkrrqBd	d
�|	D�}xZ|p�t||�D]H}t||�}
|
|kr�q�n|
dkr�|d7}|�	|t
|
d
�f�|dkr�Pq�WqBWt|dd�dd�S)z�
    Detect ANY language that can be identified in the given sequence. The sequence will be analysed by layers.
    A layer = Character extraction by alphabets/ranges.
    FrN�,zLatin BasedTcss|]\}}|VqdS)Nr)rrE�orrr"rD<sz"coherence_ratio.<locals>.<genexpr>cSsg|]\}}|�qSrr)rrErirrr"r#Asz#coherence_ratio.<locals>.<listcomp>g�������?r
rS�cSs|dS)Nr
r)rFrrr"rGTrHz!coherence_ratio.<locals>.<lambda>)rIrJ)�split�remover]r�most_commonr`rrNrWr3r_r+)rXrfrgr^rCZsufficient_match_countZlg_inclusion_listZlayerZsequence_frequenciesrmr Zpopular_character_orderedr5rMrrr"�coherence_ratio%s4	

rn)F)reN)+r%�codecsr�collectionsrr�	functoolsr�typingrrrr	ZassetsrZconstantrr
rrZmdrZmodelsr�utilsrrrrr�strr0r8r9r>�boolrBrN�floatrWr]rdrnrrrr"�<module>s4	%
#:'