Twinkle AI is a research community focused on Traditional Chinese language models, founded in late 2024. Starting from open-source LLaMA models, we have gradually built practical technologies tailored to Traditional Chinese. Our members come from all walks of life, brought together by a passion for model training, and are committed to spreading knowledge of large language model training and advancing Taiwan's generative AI development through open collaboration.
Our community aims to promote large language model training knowledge through concrete action, and to advance Taiwan's generative AI development through openness and sharing.
In the fast-moving field of LLMs, we are not content to simply use third-party packages and APIs. We encourage members to reinvent the wheel, dig into the underlying principles of every component, and master the technology first-hand.
Twinkle AI is committed to nurturing Taiwan's home-grown AI talent, so that we do not drift with the tide or become "human MCPs" who merely call tools, but instead become developers who truly command the core technologies.
We believe we can build solutions that are better and more complete than the early packages. By rethinking and reimplementing them, we not only solve problems but also build up valuable R&D capacity for Taiwan.
We have open-sourced the Twinkle Eval evaluation framework, Traditional Chinese training corpora, and Traditional Chinese large language models, advancing Taiwan's generative AI ecosystem through open collaboration.
Why reinvent the wheel?
In the emerging field of LLMs, things change rapidly by the day. If we only ever use third-party packages, libraries, or APIs, it is hard to learn how everything works at the lowest level. We encourage thinking from first principles and building solutions better suited to Traditional Chinese.
Traditional Chinese Model Research & Implementation Lab
A complete research environment for Traditional Chinese large language models, covering everything from corpus collection and preprocessing to model training, and giving researchers the tools and methods to understand every component in depth.
Efficient & Accurate AI Evaluation Tool
An evaluation framework for Traditional Chinese language models, redesigned from the ground up and independent of third-party evaluation packages, providing evaluation standards and methods that are more accurate and better suited to Traditional Chinese.
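To give a concrete picture of what building an evaluation loop from scratch involves, here is a minimal sketch of multiple-choice accuracy scoring against an OpenAI-compatible chat endpoint. It is illustrative only and is not Twinkle Eval's actual code; the endpoint URL, model name, and question format are assumptions.

```python
# Minimal sketch of a from-scratch multiple-choice evaluation loop.
# NOT Twinkle Eval's implementation; the endpoint, model name, and data
# format below are assumptions for illustration.
import json
import re
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed OpenAI-compatible server
MODEL = "my-traditional-chinese-model"                  # hypothetical model name


def ask(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the raw model reply."""
    prompt = question + "\n" + "\n".join(
        f"({label}) {text}" for label, text in zip("ABCD", choices)
    ) + "\n請只回答選項字母。"
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


def evaluate(items: list[dict]) -> float:
    """Compute accuracy over items shaped like {"question", "choices", "answer"}."""
    correct = 0
    for item in items:
        reply = ask(item["question"], item["choices"])
        match = re.search(r"[ABCD]", reply)  # take the first option letter in the reply
        if match and match.group(0) == item["answer"]:
            correct += 1
    return correct / len(items)
```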
Our research results have been open-sourced on Hugging Face, including reasoning models, datasets, and evaluation benchmarks.
Flagship model series focused on Traditional Chinese instruction following and logical reasoning
The Formosa-1 (F1) reasoning model collection is designed specifically for Traditional Chinese instruction following and logical reasoning. It is our flagship model series, built from first principles and optimized for the characteristics of Traditional Chinese.
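As a usage example, the sketch below loads a released checkpoint with Hugging Face transformers and runs a short Traditional Chinese prompt. The repository id is a placeholder, not an actual F1 model id; check the Twinkle AI organization page on Hugging Face for the real model names.

```python
# Minimal sketch of running an open checkpoint with transformers.
# The repo id below is a placeholder; substitute a real model from the
# Twinkle AI organization on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "twinkle-ai/your-f1-model"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "請用繁體中文解釋什麼是注意力機制。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```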
Traditional Chinese reasoning evaluation and training datasets
Carefully curated datasets for evaluating and training Traditional Chinese reasoning capabilities across multiple domains, covering logical reasoning, commonsense reasoning, and other dimensions.
Traditional Chinese LeetCode high-performance solutions dataset
A Traditional Chinese dataset for LeetCode problems, featuring high-performance solutions (Beats 100%), complete problem-solving walkthroughs, and time and space complexity analysis. Each entry follows a "Top Concept → Step Implement → Complexity Analysis" structure to make the algorithmic reasoning easy to follow.
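Loading and inspecting such a dataset with the Hugging Face datasets library might look like the sketch below. The dataset id, split name, and column names are assumptions for illustration; the actual dataset card defines the real schema.

```python
# Minimal sketch of browsing a Traditional Chinese LeetCode dataset.
# The dataset id and field names are hypothetical; consult the dataset
# card on Hugging Face for the actual schema.
from datasets import load_dataset

ds = load_dataset("twinkle-ai/your-leetcode-dataset", split="train")  # hypothetical id

example = ds[0]
# A record is assumed to pair a problem statement with a structured answer
# written as Top Concept -> Step Implement -> Complexity Analysis.
print(example.keys())                     # inspect the real column names first
print(example.get("question", "")[:200])  # assumed problem-statement field
print(example.get("answer", "")[:200])    # assumed structured-solution field
```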
Twinkle Eval benchmark testing records
Benchmark logs generated with Twinkle Eval, recording each model's output for every prompt and providing a transparent, traceable evaluation process.
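For instance, if the logs were stored as JSON lines with one record per prompt, recomputing a per-model score from them might look like the sketch below; the file name and record fields are assumptions, not Twinkle Eval's actual log schema.

```python
# Minimal sketch of re-scoring models from per-prompt benchmark logs.
# The file name and record fields (model, output, expected) are
# assumptions for illustration, not Twinkle Eval's actual log format.
import json
from collections import defaultdict

totals = defaultdict(int)
hits = defaultdict(int)

with open("benchmark_log.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        model = record["model"]
        totals[model] += 1
        if record["output"].strip() == record["expected"].strip():
            hits[model] += 1

for model in sorted(totals):
    print(f"{model}: {hits[model] / totals[model]:.2%} ({hits[model]}/{totals[model]})")
```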
Join our Discord community to connect with other developers and researchers, get the latest model updates and technical sharing, and find opportunities to take part in our open-source projects.