py-data-juicer


0.2.0 py_data_juicer-0.2.0-py3-none-any.whl

Wheel Details

Project: py-data-juicer
Version: 0.2.0
Filename: py_data_juicer-0.2.0-py3-none-any.whl
Size: 231746 bytes
MD5: 854bd75bf40a40d5f5c86b251879ee52
SHA256: 28a93f05888ea66562d5b21933de4cf85c74e84c9938b0371bc59323fef480cd
Uploaded: 2024-03-08 01:26:21 +0000
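
The size and SHA256 listed above can be used to check a locally downloaded copy of the wheel. The following is a minimal sketch using only the standard library; the function names `sha256_hex` and `matches_listing` are illustrative, and the local file path is whatever the caller supplies.

```python
import hashlib
import os

# Values copied from the Wheel Details listing above.
EXPECTED_SHA256 = "28a93f05888ea66562d5b21933de4cf85c74e84c9938b0371bc59323fef480cd"
EXPECTED_SIZE = 231746  # bytes


def sha256_hex(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file has the size and SHA256 listed above."""
    return (
        os.path.getsize(path) == EXPECTED_SIZE
        and sha256_hex(path) == EXPECTED_SHA256
    )
```

Calling `matches_listing("py_data_juicer-0.2.0-py3-none-any.whl")` on a downloaded file should return `True` only if both values agree with the listing.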

dist-info

METADATA

Metadata-Version: 2.1
Name: py-data-juicer
Version: 0.2.0
Summary: A One-Stop Data Processing System for Large Language Models.
Author: SysML Team of Alibaba Tongyi Lab
Home-Page: https://github.com/alibaba/data-juicer
License: Apache License 2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Dist: fsspec (==2023.5.0)
Requires-Dist: pyarrow (<=12.0.0)
Requires-Dist: pandas (==2.0.0)
Requires-Dist: datasets (==2.11.0)
Requires-Dist: av
Requires-Dist: soundfile
Requires-Dist: librosa
Requires-Dist: loguru
Requires-Dist: tabulate
Requires-Dist: tqdm
Requires-Dist: jsonargparse[signatures]
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: emoji (==2.2.0)
Requires-Dist: regex
Requires-Dist: requests
Requires-Dist: wget
Requires-Dist: zstandard
Requires-Dist: lz4
Requires-Dist: pdfplumber
Requires-Dist: plotly
Requires-Dist: python-docx
Requires-Dist: streamlit
Requires-Dist: spacy (==3.5.0)
Requires-Dist: multiprocess (==0.70.12)
Requires-Dist: dill (==0.3.4)
Requires-Dist: fsspec (==2023.5.0); extra == "all"
Requires-Dist: pyarrow (<=12.0.0); extra == "all"
Requires-Dist: pandas (==2.0.0); extra == "all"
Requires-Dist: datasets (==2.11.0); extra == "all"
Requires-Dist: av; extra == "all"
Requires-Dist: soundfile; extra == "all"
Requires-Dist: librosa; extra == "all"
Requires-Dist: loguru; extra == "all"
Requires-Dist: tabulate; extra == "all"
Requires-Dist: tqdm; extra == "all"
Requires-Dist: jsonargparse[signatures]; extra == "all"
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: seaborn; extra == "all"
Requires-Dist: emoji (==2.2.0); extra == "all"
Requires-Dist: regex; extra == "all"
Requires-Dist: requests; extra == "all"
Requires-Dist: wget; extra == "all"
Requires-Dist: zstandard; extra == "all"
Requires-Dist: lz4; extra == "all"
Requires-Dist: pdfplumber; extra == "all"
Requires-Dist: plotly; extra == "all"
Requires-Dist: python-docx; extra == "all"
Requires-Dist: streamlit; extra == "all"
Requires-Dist: spacy (==3.5.0); extra == "all"
Requires-Dist: multiprocess (==0.70.12); extra == "all"
Requires-Dist: dill (==0.3.4); extra == "all"
Requires-Dist: easyocr; extra == "all"
Requires-Dist: fasttext-wheel; extra == "all"
Requires-Dist: kenlm; extra == "all"
Requires-Dist: sentencepiece; extra == "all"
Requires-Dist: scipy; extra == "all"
Requires-Dist: ftfy; extra == "all"
Requires-Dist: simhash-pybind; extra == "all"
Requires-Dist: selectolax; extra == "all"
Requires-Dist: nlpaug; extra == "all"
Requires-Dist: nlpcda; extra == "all"
Requires-Dist: nltk; extra == "all"
Requires-Dist: transformers (>=4.37); extra == "all"
Requires-Dist: transformers-stream-generator; extra == "all"
Requires-Dist: einops; extra == "all"
Requires-Dist: accelerate; extra == "all"
Requires-Dist: tiktoken; extra == "all"
Requires-Dist: opencc (==1.1.6); extra == "all"
Requires-Dist: imagededup; extra == "all"
Requires-Dist: torch; extra == "all"
Requires-Dist: torchaudio; extra == "all"
Requires-Dist: dlib; extra == "all"
Requires-Dist: spacy-pkuseg (==0.0.32); extra == "all"
Requires-Dist: diffusers; extra == "all"
Requires-Dist: simple-aesthetics-predictor; extra == "all"
Requires-Dist: scenedetect[opencv]; extra == "all"
Requires-Dist: ffmpeg-python; extra == "all"
Requires-Dist: ray (==2.9.2); extra == "all"
Requires-Dist: pre-commit; extra == "all"
Requires-Dist: sphinx; extra == "all"
Requires-Dist: sphinx-autobuild; extra == "all"
Requires-Dist: sphinx-rtd-theme; extra == "all"
Requires-Dist: recommonmark; extra == "all"
Requires-Dist: fire; extra == "all"
Requires-Dist: jsonlines; extra == "all"
Requires-Dist: pyspark; extra == "all"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: sphinx-autobuild; extra == "dev"
Requires-Dist: sphinx-rtd-theme; extra == "dev"
Requires-Dist: recommonmark; extra == "dev"
Requires-Dist: ray (==2.9.2); extra == "dist"
Requires-Dist: fsspec (==2023.5.0); extra == "mini"
Requires-Dist: pyarrow (<=12.0.0); extra == "mini"
Requires-Dist: pandas (==2.0.0); extra == "mini"
Requires-Dist: datasets (==2.11.0); extra == "mini"
Requires-Dist: av; extra == "mini"
Requires-Dist: soundfile; extra == "mini"
Requires-Dist: librosa; extra == "mini"
Requires-Dist: loguru; extra == "mini"
Requires-Dist: tabulate; extra == "mini"
Requires-Dist: tqdm; extra == "mini"
Requires-Dist: jsonargparse[signatures]; extra == "mini"
Requires-Dist: matplotlib; extra == "mini"
Requires-Dist: seaborn; extra == "mini"
Requires-Dist: emoji (==2.2.0); extra == "mini"
Requires-Dist: regex; extra == "mini"
Requires-Dist: requests; extra == "mini"
Requires-Dist: wget; extra == "mini"
Requires-Dist: zstandard; extra == "mini"
Requires-Dist: lz4; extra == "mini"
Requires-Dist: pdfplumber; extra == "mini"
Requires-Dist: plotly; extra == "mini"
Requires-Dist: python-docx; extra == "mini"
Requires-Dist: streamlit; extra == "mini"
Requires-Dist: spacy (==3.5.0); extra == "mini"
Requires-Dist: multiprocess (==0.70.12); extra == "mini"
Requires-Dist: dill (==0.3.4); extra == "mini"
Requires-Dist: easyocr; extra == "sci"
Requires-Dist: fasttext-wheel; extra == "sci"
Requires-Dist: kenlm; extra == "sci"
Requires-Dist: sentencepiece; extra == "sci"
Requires-Dist: scipy; extra == "sci"
Requires-Dist: ftfy; extra == "sci"
Requires-Dist: simhash-pybind; extra == "sci"
Requires-Dist: selectolax; extra == "sci"
Requires-Dist: nlpaug; extra == "sci"
Requires-Dist: nlpcda; extra == "sci"
Requires-Dist: nltk; extra == "sci"
Requires-Dist: transformers (>=4.37); extra == "sci"
Requires-Dist: transformers-stream-generator; extra == "sci"
Requires-Dist: einops; extra == "sci"
Requires-Dist: accelerate; extra == "sci"
Requires-Dist: tiktoken; extra == "sci"
Requires-Dist: opencc (==1.1.6); extra == "sci"
Requires-Dist: imagededup; extra == "sci"
Requires-Dist: torch; extra == "sci"
Requires-Dist: torchaudio; extra == "sci"
Requires-Dist: dlib; extra == "sci"
Requires-Dist: spacy-pkuseg (==0.0.32); extra == "sci"
Requires-Dist: diffusers; extra == "sci"
Requires-Dist: simple-aesthetics-predictor; extra == "sci"
Requires-Dist: scenedetect[opencv]; extra == "sci"
Requires-Dist: ffmpeg-python; extra == "sci"
Requires-Dist: fire; extra == "tools"
Requires-Dist: jsonlines; extra == "tools"
Requires-Dist: pyspark; extra == "tools"
Requires-Dist: wget; extra == "tools"
Provides-Extra: all
Provides-Extra: dev
Provides-Extra: dist
Provides-Extra: mini
Provides-Extra: sci
Provides-Extra: tools
Description-Content-Type: text/markdown
License-File: LICENSE
[Description omitted; length: 23840 characters]
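
The METADATA fields above use an email-style header layout, so the standard library's `email` parser can read them. The sketch below groups `Requires-Dist` entries by the extra they belong to. It works on a short excerpt rather than the full file, and it only handles markers of the simple `extra == "name"` form seen in this listing; the helper name `group_by_extra` is illustrative. For general marker handling, a dedicated library such as `packaging` is likely the better tool.

```python
from collections import defaultdict
from email import message_from_string

# A short excerpt of the METADATA fields listed above, not the full file.
METADATA_EXCERPT = """\
Metadata-Version: 2.1
Name: py-data-juicer
Version: 0.2.0
Requires-Dist: loguru
Requires-Dist: pandas (==2.0.0)
Requires-Dist: ray (==2.9.2); extra == "dist"
Requires-Dist: fire; extra == "tools"
Requires-Dist: wget; extra == "tools"
Provides-Extra: dist
Provides-Extra: tools
"""


def group_by_extra(metadata_text):
    """Map each extra name (None for unconditional) to its requirements."""
    msg = message_from_string(metadata_text)
    groups = defaultdict(list)
    for line in msg.get_all("Requires-Dist", []):
        requirement, _, marker = line.partition(";")
        marker = marker.strip()
        extra = None
        # Only the simple 'extra == "name"' marker form is recognised here.
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
        groups[extra].append(requirement.strip())
    return dict(groups)


groups = group_by_extra(METADATA_EXCERPT)
for extra, requirements in groups.items():
    print(extra, requirements)
```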

WHEEL

Wheel-Version: 1.0
Generator: bdist_wheel (0.38.4)
Root-Is-Purelib: true
Tag: py3-none-any
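
The `Tag` value above also appears in the wheel filename. As a rough illustration, the filename can be split into its dash-separated components; this sketch assumes the five-part form without an optional build tag, and `parse_wheel_filename` is an illustrative name rather than a library function.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components.

    Assumes the five-part form with no optional build tag:
    {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 components, got {len(parts)}")
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


info = parse_wheel_filename("py_data_juicer-0.2.0-py3-none-any.whl")
# Rebuild the compatibility tag from its three parts.
tag = "-".join((info["python_tag"], info["abi_tag"], info["platform_tag"]))
print(info["distribution"], info["version"], tag)
```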

RECORD

Path Digest Size
data_juicer/__init__.py sha256=bOwGh1UT6v5brQqXUD1fYMC8bns5WEwg83DtShmwS88 2063
data_juicer/analysis/__init__.py sha256=CJRFx62f1uCpAWZRhVRCCbk8Ab_FZYQVD-IGpVEGcRk 99
data_juicer/analysis/collector.py sha256=5n8HzhVD1rp8_QxtTFpHR1GGofqjxdN0jTIfZu58pfQ 2394
data_juicer/analysis/column_wise_analysis.py sha256=63VAkDGIjGsCd_B9Ezmw26gpLdwO4hbPKunI9LRgF5s 10978
data_juicer/analysis/diversity_analysis.py sha256=uz_xvrZpLEnweQpqk-aM5oUG2RnIGCkgMwgA58GO4Q4 6125
data_juicer/analysis/draw.py sha256=OQOgrf2IgXqBmVqh1dzv_jlXU_U4NVd9nqjUFoxOhfo 1226
data_juicer/analysis/measure.py sha256=Vp_-TgyLqaJDWy9w8BvmzHAhkcyeMxCYJWSOa6-q1z8 3032
data_juicer/analysis/overall_analysis.py sha256=Cr-JPS6nuQSyNezKkQuCmV_jT9z1sJjVVpcUAC-2U3k 3555
data_juicer/config/__init__.py sha256=y9w17TyxBnOnkuJn86xr1H2tWyYQ_xp82pY8EM6Uhnw 41
data_juicer/config/config.py sha256=fIudQWIIZt-otjWEDvXmvGL74Lm0LsJPbi6o4RXKmB4 25622
data_juicer/core/__init__.py sha256=gnJL_9FL93W0NS1I28yTRs3whThx8vcjqna3HlpGP-A 152
data_juicer/core/analyser.py sha256=YOBG7U9Bee9FMhf9TMXpheupBqlt73Y6aAhWxBA96v4 5650
data_juicer/core/data.py sha256=QYvs1o5s_olajI5s0KK9QIgj29tUCmxW2iBaeR0Q4Eo 12897
data_juicer/core/executor.py sha256=eIdZdJtT6kVVfrzKRJe8346fDqXFCkr-GRqXMR7VS_E 9864
data_juicer/core/exporter.py sha256=afHVEtkNXHP3I-DJuWwD7lyWwYTDusNCpg7E8ZVswLI 10462
data_juicer/core/ray_executor.py sha256=nfqsHkPNjQYHEMzKqlahmxt6uBbRZZXkM5b2rLEUqZk 5417
data_juicer/core/tracer.py sha256=4-BVz80f7Hf1hVdcASFyarZ0hOiW_Jv7-iNxoYpQoqU 9063
data_juicer/format/__init__.py sha256=Me--0Z33F8NOHeskbuoEgu2Ajt4r-OUsEcrqWQlaRiM 163
data_juicer/format/csv_formatter.py sha256=83kjrH-NjAz5qh0U7d4-DxryvDwu8mVdSv_myrDk7g0 726
data_juicer/format/formatter.py sha256=o0gip0iAwUCOIO-p28C8sHR6rb1EVKItZcvIHlJOvgU 12188
data_juicer/format/json_formatter.py sha256=BqgiKV9qKsDfe3-3ke-rEv338NrV5arPtZu7_uUrIrc 779
data_juicer/format/load.py sha256=FdUa6R62LT7GpkqpzIsfCrrpN21Rmsq4VjLNG-N0Q7E 1050
data_juicer/format/mixture_formatter.py sha256=t-kNY0quSacR6U8Huh1xWPwxzyTLoHf_uYnjTMPjdNA 5526
data_juicer/format/parquet_formatter.py sha256=mSCiC5lTSG8ZuneeTKZlLU5jqf_L27lt3mTfT5OxaDE 746
data_juicer/format/text_formatter.py sha256=58XZ0B2fwVoihr22ajYw_ANvt4nnPO1XhIWm1PlayT4 6340
data_juicer/format/tsv_formatter.py sha256=estI4SRpbW6tFMD4fibMss0WXpxwpIiLyWD77b15GBc 778
data_juicer/ops/__init__.py sha256=vxsZh3si5CnP7Zgh4xDY6v5izoDMtYE7sTFWVd7K-7Y 151
data_juicer/ops/base_op.py sha256=PciyahrV2r8cy0kiYVNRr1TnAvciHhM7UMDxDoclNuU 7887
data_juicer/ops/load.py sha256=VnmKfE2Z9dNRzdeP7YIr8dJKvbBIRWy4G7ViF-dQBME 1028
data_juicer/ops/op_fusion.py sha256=mb2vtwLQ4TTKSruzKrr1kwo_ASoxC2RjsXk30gmNvP8 5745
data_juicer/ops/common/__init__.py sha256=lkkX1X9Q_hqb3Pwvy074jLYsIcJ6BW2zaJfUyLAsIcY 341
data_juicer/ops/common/helper_func.py sha256=7i29FtczLFxcLOWe2zjW87yXQVH3ROaVrrqRqObZYh4 6450
data_juicer/ops/common/special_characters.py sha256=K-lAmnWmQiLh5rVKA5VJ6MlZN_tgb6E8JLgfIpYZhU0 1387
data_juicer/ops/deduplicator/__init__.py sha256=CIXFfgS2TDXeH0CK05zE8KAIK0KEJv9J0iPBCF-KCpI 170
data_juicer/ops/deduplicator/document_deduplicator.py sha256=JD2tiwKosLSIHHoUDuznR_eKt5wQsqhhlkWyna98qEY 3884
data_juicer/ops/deduplicator/document_minhash_deduplicator.py sha256=UC4ZW4dZp6Qy1RXGZKDupKYSStzvumHc3TSuzGRn0-Y 11698
data_juicer/ops/deduplicator/document_simhash_deduplicator.py sha256=zf3MvsKwlSd2ld9lCaEcdTqaplYbClfgx8A1EnV8a-w 8468
data_juicer/ops/deduplicator/image_deduplicator.py sha256=Wtghzr9lYAvzji3C_ZmqH0i7I3ilF_iqn2HZnw11Ip0 3992
data_juicer/ops/deduplicator/video_deduplicator.py sha256=j4h_Zy7ryF5KopMc4Uhee8B9MjtmvZ4p6E4ORRRjT-E 3557
data_juicer/ops/filter/__init__.py sha256=08roRIQrZScl3Cuxg9_EiCBTxNMd9r832XrzPFwu4LE 1162
data_juicer/ops/filter/alphanumeric_filter.py sha256=kVCqw91XCcsvetlAR8IfibOTvd0gx8fAzBVwGIHbfZA 3418
data_juicer/ops/filter/audio_duration_filter.py sha256=DHd0UrDhAb0WTYg6CRB_YL3sTap-4rFeZKuTUGkLA5I 3187
data_juicer/ops/filter/audio_nmf_snr_filter.py sha256=medaCnoYbHbEuIpUWljDuRRH3x6agVw4OZlGCSMpv98 4689
data_juicer/ops/filter/audio_size_filter.py sha256=ud0ydptxcwaXoqxIuhFS4MSfLVREXYwac3SOG42RLEE 2646
data_juicer/ops/filter/average_line_length_filter.py sha256=IP0c-A6yaQll7Oui9j99m3VhsMRjRUCtESgxrDQu3gw 2043
data_juicer/ops/filter/character_repetition_filter.py sha256=e-zM-NQVMIKDcwm8LB2F1m5IJCOeFn1biSWgEKMeUew 2861
data_juicer/ops/filter/face_area_filter.py sha256=ZNxFeaIKm_4Do8z-sMI02V0ALXDoGJ9ALSDIwR1Pkuw 4362
data_juicer/ops/filter/flagged_words_filter.py sha256=5Gv96PH09FXMK8TB2ynjpR0Smi3hiNmTV3DELvYp1QE 5119
data_juicer/ops/filter/image_aesthetics_filter.py sha256=Xe6fHlOTbzmHPbKnUlnFZ9ddGKVIWQhI1U3U1Aj9i5w 4856
data_juicer/ops/filter/image_aspect_ratio_filter.py sha256=ISE5Ljjc3tIm1o5GCwaferXqebKFDS9MOBDxJzoqBtA 3029
data_juicer/ops/filter/image_shape_filter.py sha256=9TC_BNo-IOQ8lTlbVVBbyFfAoaVEQwvcy6q_3QMHBHM 3569
data_juicer/ops/filter/image_size_filter.py sha256=BVsNFDvQfSQgwhT8QAgEjJJ39uTTZFadfVn_XDplOWk 2646
data_juicer/ops/filter/image_text_matching_filter.py sha256=S8cnE01ubmzOMa5uKLTzVqD_cAOQw38WqEJFBGsiqxs 6357
data_juicer/ops/filter/image_text_similarity_filter.py sha256=333WsLPjs-EuosSAI9R8hZm-eHTPtMEpMuHq-oKe68A 6263
data_juicer/ops/filter/language_id_score_filter.py sha256=msdGH07eYI7k19k8icVG1PhMBz4ZX8X4TjNXu0M69lw 2672
data_juicer/ops/filter/maximum_line_length_filter.py sha256=qnt8NgWCP4N_YhCtQR8mP8oOeA2dZMLFEKeFPpOAx6g 2046
data_juicer/ops/filter/perplexity_filter.py sha256=UsduNUlrEET-tc9uebX_E1Yg8bxwF-6m4Z2020o1U6o 2913
data_juicer/ops/filter/phrase_grounding_recall_filter.py sha256=R1yGb1HXe-n9ePhPhRT-9r7IdiDrt4ztFyuQdynKrMI 11368
data_juicer/ops/filter/special_characters_filter.py sha256=bH_eK9PC5nPe--BtaapPtMeLqFBQ5xrXseEQtq1-1p4 2005
data_juicer/ops/filter/specified_field_filter.py sha256=bQVjdFaoaLIgcoaFmBpKCvj0_BbEyxyiCkWHMAg4u3c 1804
data_juicer/ops/filter/specified_numeric_field_filter.py sha256=Rs9UVYYW92uKe5947q_J_SiUorhd_ZqTD-9dgO4rp6o 2158
data_juicer/ops/filter/stopwords_filter.py sha256=4sKc6Lo9JGehpJ-iBKd2pCcg16pkwQm-l0IP2OrTmKg 5023
data_juicer/ops/filter/suffix_filter.py sha256=9bofFTXaT6icHr8pCN79ZUTRlNv_ZQxJ30lIfWGvT6U 1168
data_juicer/ops/filter/text_action_filter.py sha256=CPN1vminElt_iRJahOgsteSLBIUP4n6r_CuP_5iYNsY 2280
data_juicer/ops/filter/text_entity_dependency_filter.py sha256=pqCm6Xsbyr0vwsw7Lmd3cfYgPbpUccPZulZO0OQIi1s 3821
data_juicer/ops/filter/text_length_filter.py sha256=T1qoCxNsh6iVFa523GI5D2xbXuogljDm0avKKPACilE 1493
data_juicer/ops/filter/token_num_filter.py sha256=sZcSx3a6jfTUn6VMTPFcoYV0AJka7DVQr0S-O04Z7Qs 2321
data_juicer/ops/filter/video_aesthetics_filter.py sha256=oufyTZBW_ZDd8-ckGwuDy9WPVhmJ_cht5sol_y1dkmA 8112
data_juicer/ops/filter/video_aspect_ratio_filter.py sha256=pNIWhzTVeQbJhC1AJSPxpZmIVaa13KaRjJUp4NVKsPw 3482
data_juicer/ops/filter/video_duration_filter.py sha256=JkRWhb95IyED-0bNOquBky6Ml5k3aU-6n6WTo1eco78 3323
data_juicer/ops/filter/video_frames_text_similarity_filter.py sha256=Zka2HCneUq1a7yk6TUQ9ONhiRw6GPZDT5BFGrZytcvg 8743
data_juicer/ops/filter/video_motion_score_filter.py sha256=QRpDuJjLRql6hWjEno1wSUfjT1_7NIOEKnVSx15I9QE 5590
data_juicer/ops/filter/video_ocr_area_ratio_filter.py sha256=1V_e4ZKKAX5fNQtpct5q25DgatIbjwjZKM_2oHeea10 7082
data_juicer/ops/filter/video_resolution_filter.py sha256=P9IuFz6jEo19baFSiPxHPfRyeLrQd5Q4K0151Nc1mTw 4046
data_juicer/ops/filter/word_num_filter.py sha256=DGB7OqjP6R1AsGO4hnTmJolVpFfD-Paep63tQ1TWaHA 2893
data_juicer/ops/filter/word_repetition_filter.py sha256=_skmbh5B2SZIjI1I8ievhdQWXrvGgSECN4hEdWXncGI 4513
data_juicer/ops/mapper/__init__.py sha256=sIGghnJIEXHliQn6ngWYC9DRyqtdazWGXtM_URLduY0 1395
data_juicer/ops/mapper/audio_ffmpeg_wrapped_mapper.py sha256=dN4F71522VsCQh6bJek75Q1i9UVG8fF73WyoKRP7vkQ 2659
data_juicer/ops/mapper/chinese_convert_mapper.py sha256=woCzQJOfzA7k-EzgubbD7jy9DwYXlLxvvFK9LUchkSI 2355
data_juicer/ops/mapper/clean_copyright_mapper.py sha256=U48slQZwtG4y1UpOl0aDYhEvDFug1GcSbY-FFiM7CEU 1766
data_juicer/ops/mapper/clean_email_mapper.py sha256=qJRpibclSp7DwU1IhZzh1SpozZ1VQ0wRanKwYbB6--I 1382
data_juicer/ops/mapper/clean_html_mapper.py sha256=aDP7QVUQh88AXt2T7-kyQ4EAuRij0S24ucJKuukmGeE 1183
data_juicer/ops/mapper/clean_ip_mapper.py sha256=yt6neSJtCXZ-2_sQf6HkjVClSSZ8Tiqfk5pLPLkQeXE 1682
data_juicer/ops/mapper/clean_links_mapper.py sha256=aIbgtaKdafyW94nC73awuvqiAOi08_a7V06flNm3OAc 1983
data_juicer/ops/mapper/expand_macro_mapper.py sha256=ofu9PbGJpgSs18hE2okv4KiX7Oi8Ut62toPURwHg-V0 3159
data_juicer/ops/mapper/fix_unicode_mapper.py sha256=kj0NyDvDFQH-N12vIgYnNRqMJsyD7RjROofYnOJPctE 1384
data_juicer/ops/mapper/generate_caption_mapper.py sha256=HSKyvUJMvOQN8_ElVjnojw5_kMSokZNWwDRv_2ijYO0 13663
data_juicer/ops/mapper/gpt4v_generate_mapper.py sha256=7Nlgd8dnP6iv8yU3NPtoBFxpx3PJIBzlCwy5y56jkI4 13331
data_juicer/ops/mapper/image_blur_mapper.py sha256=2oy49P7R1SQYcLCe80pjups9GTewCCX5CPLIjENxCmQ 2952
data_juicer/ops/mapper/image_captioning_from_gpt4v_mapper.py sha256=nRasr9mhpRsQ4aGwD1NiM53p4uLN8KLPzM1mJgbQXrc 13368
data_juicer/ops/mapper/image_captioning_mapper.py sha256=SxtY_Sci7N9PfxpUzSWvymKOil2sNA8DbYCJj0FYmzg 13633
data_juicer/ops/mapper/image_diffusion_mapper.py sha256=KX4sbhFg5No4_Rz7h3ma_PcS7sYA0q9KMepDMzDUOoA 10156
data_juicer/ops/mapper/nlpaug_en_mapper.py sha256=jFVWCtbagfw9VD7T-OAc3J289ft9F4CaDcLIzgBxthw 6843
data_juicer/ops/mapper/nlpcda_zh_mapper.py sha256=zEOXqw48qagidXA5TvcHvfVP4_USTWavdKH2BHEUYqc 7970
data_juicer/ops/mapper/punctuation_normalization_mapper.py sha256=0qIxEW1FnDBVjC03k4mXH2ZEtZTExgZuTdGW2xVudbQ 1662
data_juicer/ops/mapper/remove_bibliography_mapper.py sha256=fMtCoDIo0vwyLvdm2Rj4JZEdzc5t8f_h6nf2-X6ewnc 1180
data_juicer/ops/mapper/remove_comments_mapper.py sha256=axNdupJkpJB8kX2-vkdxrfiXSBwa9ETMnI2ApvdW0Ac 1844
data_juicer/ops/mapper/remove_header_mapper.py sha256=P56WaWhICzotWnRXrEvA0XotzzsvtDPM0kkLsHQjbII 1795
data_juicer/ops/mapper/remove_long_words_mapper.py sha256=xfO7yP_aQ5uZPUIQr9USvjIniaLUOjJOPtj1Bkebz_U 1910
data_juicer/ops/mapper/remove_non_chinese_character_mapper.py sha256=mRdplL2oUJZdDQp8mjg8W8obq-zlWtf4oaf479tLUn0 1497
data_juicer/ops/mapper/remove_repeat_sentences_mapper.py sha256=GlkD3Tj10bPVJYP-RijzRDwOpeahBWqaErVbf31iYLw 2687
data_juicer/ops/mapper/remove_specific_chars_mapper.py sha256=uB4RdXWEHyxC9mxOwQV1YRDIjZt5WeOQk54Grm4Xnao 1206
data_juicer/ops/mapper/remove_table_text_mapper.py sha256=uBDxF8p4tWMdlU8iyQ8mAPhh7uoSjhb6HOttnWB_Thc 1392
data_juicer/ops/mapper/remove_words_with_incorrect_substrings_mapper.py sha256=-FB6tLt4O43gy7xZ8LoeCxQ3QA5gqLa6VCrsTXdBPJw 2881
data_juicer/ops/mapper/replace_content_mapper.py sha256=YTRVSXQItA3_FoUDkMf9M_D_-ecmbc5ZdCErzTNgTAQ 1345
data_juicer/ops/mapper/sentence_split_mapper.py sha256=VxUJ73JvnOoXdEnGILEhsW8vMG0lRxUwpyvSvwQqJ34 1112
data_juicer/ops/mapper/video_captioning_from_audio_mapper.py sha256=nN7D061yHR16MnI69oWsR7-7iaEDwUyWgzlVCtpRvbw 5881
data_juicer/ops/mapper/video_captioning_from_video_mapper.py sha256=EDeGdVoKE2HlcxQAtKyOgOwSaqS-V5i-BGCsVgbsU3Y 16564
data_juicer/ops/mapper/video_ffmpeg_wrapped_mapper.py sha256=ySTHLdtH7O1bdzAu85jg-tihm9IurxS5CwvmvJ3jYFE 2659
data_juicer/ops/mapper/video_resize_aspect_ratio_mapper.py sha256=9PA2x2g3d7QHGZX6fviEzUm65vBmGA1UNBXdB8iE03E 5376
data_juicer/ops/mapper/video_resize_resolution_mapper.py sha256=YzsXlJP3VJ6CfYuC0RxhJ2qysFbBwliwvzrxK8q3RgM 7118
data_juicer/ops/mapper/video_split_by_duration_mapper.py sha256=jdDj9v4urxVf7R5sO6nYpzIqjCdBixB7tTUS9U2ut7U 6411
data_juicer/ops/mapper/video_split_by_key_frame_mapper.py sha256=crx3ipnjnhav2buNedyRa1K2HhZMqYsFcRpcxQ_tLGw 5644
data_juicer/ops/mapper/video_split_by_scene_mapper.py sha256=Sg4MCTw1gQtjFSrfaXzoG25p1lUV8YMgjWHPuQgg_HI 5476
data_juicer/ops/mapper/video_tagging_from_audio_mapper.py sha256=OQ8n0jLESe3Rg1qwBEb1PfflojQo9gxDDiFDKG_rmNQ 3201
data_juicer/ops/mapper/video_tagging_from_frames_mapper.py sha256=bsX0uRdEfVIeo1X-v1Tb8OZZR7H44RAcc1YfxspK378 4599
data_juicer/ops/mapper/whitespace_normalization_mapper.py sha256=Vl5by6pMvNZAW-3ZqeT0qkv_M4h9Efz7ATE4XmHRx6s 1155
data_juicer/ops/selector/__init__.py sha256=ozLhYXkJ4tmb_xP9-ddB4oww4ITCkwObIqFTjp5BMtM 80
data_juicer/ops/selector/frequency_specified_field_selector.py sha256=dTKsDj5zSBLWX5Xn5I2G2zY46nKdfSFgnHdbn-t6QwY 3430
data_juicer/ops/selector/topk_specified_field_selector.py sha256=A5Iw_fIkqUW2QglD09xN8pwvWr4uEEkHfMGbcfQEyDo 3562
data_juicer/tools/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
data_juicer/tools/analyze_data.py sha256=6ccddfV10a6sm-6D3PapXgU0SURlyiAfmPMLW1Szxvs 178
data_juicer/tools/process_data.py sha256=4ZTyFHN_wiqWe5b2S5nsAANR6U5DwWZ8UBBE7k0zS5M 428
data_juicer/utils/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0
data_juicer/utils/asset_utils.py sha256=ioJjRBdj_uZAxxKl3gU46BPzrXVH-DhtkcVsYy4kOfQ 2115
data_juicer/utils/availability_utils.py sha256=TThwjghyLNdrxHpUEBTtS_m8DduarRQJjtIKZtqktlc 4462
data_juicer/utils/cache_utils.py sha256=yqwkYZ5yFQJV-5iRcIDa5newXZmiQDD-g3i_S9T36a0 974
data_juicer/utils/ckpt_utils.py sha256=DkBaIfOl2GM2rWEpXU96Ep8-6y_NypOG4HGrRJ6aNFU 4492
data_juicer/utils/compress.py sha256=VgUEMynipjgFWJvQiL_l-xdnVgjyrHx18CGlWED2svU 17444
data_juicer/utils/constant.py sha256=LfwlTam7ck76ZtYfliWw6s9ktji16sFimyGzUPHwYbs 6659
data_juicer/utils/file_utils.py sha256=MgFWzjB1Nhll-SYhb2Xihoa2c_3hUfKmz1XBWabxBe8 5632
data_juicer/utils/fingerprint_utils.py sha256=-xcYNsUaCK04K5VZs8dEshI057zAp7_6rCl02NSLI38 6014
data_juicer/utils/logger_utils.py sha256=NWJ1D-pIvS5H9t39kiKo_0xaTMbOkRtdNybdjoqeA8M 4692
data_juicer/utils/mm_utils.py sha256=D9ne8p8Bm9_2YYYxUjH0NbyiABUagXW5JIiXFSU4OPA 24280
data_juicer/utils/model_utils.py sha256=vgD7jPQFnUBlo0AT0soT2kTbIfLSORliYb1iATqzfN0 20675
data_juicer/utils/registry.py sha256=m1OBz3rnuD0fr_QyEgkDXfN_e9WUcSb8xa5ndHVqzRE 4390
data_juicer/utils/unittest_utils.py sha256=XIgXla_rvVgztoyMribasFQsmuNLDyzILjozO20b19g 1238
py_data_juicer-0.2.0.dist-info/LICENSE sha256=IQxCSImw3L-Te-ST0B-_KQ4-i9bK04RF_vaGEWKJ95M 20905
py_data_juicer-0.2.0.dist-info/METADATA sha256=B2ARVXmxtaded4opd-sDzO79bfDHX0Hjk2kibofy0F0 30611
py_data_juicer-0.2.0.dist-info/WHEEL sha256=2wepM1nk4DS4eFpYrW1TTqPcoGNfHhhO_i5m4cOimbo 92
py_data_juicer-0.2.0.dist-info/entry_points.txt sha256=ROwQdhs4D_4_2EEr-NgJhG9wCDD9DxZNn4msOrBlwRk 116
py_data_juicer-0.2.0.dist-info/top_level.txt sha256=jnFDZyR-f002m7XGYK72vVjEDUmergqKFCKRG2Q3HqQ 12
py_data_juicer-0.2.0.dist-info/RECORD
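
The digests in the RECORD rows above are SHA256 hashes encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below reproduces that encoding and parses one row in the space-separated form shown in this listing (the RECORD file inside the wheel is comma-separated, so the parser here is an assumption tied to this display). The helper names are illustrative. The final row, for RECORD itself, carries no digest and would not parse with this helper.

```python
import base64
import hashlib


def record_digest(data):
    """Return SHA256 of `data` as unpadded URL-safe base64, as in RECORD."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_record_row(row):
    """Split one 'path algorithm=digest size' row as displayed above."""
    path, digest, size = row.rsplit(" ", 2)
    algorithm, _, value = digest.partition("=")
    return path, algorithm, value, int(size)


row = "data_juicer/tools/__init__.py sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU 0"
path, algorithm, value, size = parse_record_row(row)

# The file is listed with size 0, so its digest should match that of empty bytes.
print(path, algorithm, size, value == record_digest(b""))
```

The two zero-byte `__init__.py` files in the listing share the same digest for this reason: both hash the empty byte string.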

top_level.txt

data_juicer

entry_points.txt

dj-analyze = data_juicer.tools.analyze_data:main
dj-process = data_juicer.tools.process_data:main
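
Each entry point above maps a command name to a `module:attribute` target. The following sketch shows how such a line can be split and resolved with `importlib`; the helper names are illustrative, and the `demo = json:dumps` line is a stand-in target from the standard library so the example runs without `data_juicer` installed.

```python
import importlib


def parse_entry_point(line):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = line.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(line):
    """Import the module named in the entry point and return the target object."""
    _, module, attr = parse_entry_point(line)
    obj = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


parsed = parse_entry_point("dj-process = data_juicer.tools.process_data:main")
print(parsed)

# Stand-in target (hypothetical entry point) that resolves without data_juicer.
dumps = load_entry_point("demo = json:dumps")
print(dumps({"ok": True}))
```

With the package installed, running `dj-process` or `dj-analyze` from a shell would be expected to call the corresponding `main` function listed above.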