top of page
Download 100k Mixed Txt -
: A large-scale dataset for LLM-based web information extraction. It combines multilingual markdown/text content from real web pages with natural-language prompts and validated JSON responses.
: You can investigate sentiment classification or language identification in datasets that mix multiple languages (e.g., Hindi-English), which is a growing field in NLP. Download 100K mixed txt
: Specifically for manufacturing and 3D printing research, this dataset contains over 100,000 G-code files (a form of technical mixed text) along with their corresponding 3D models. Potential Research Directions : A large-scale dataset for LLM-based web information
bottom of page