twinkle-ai@taiwan:~$
Twinkle AI

Traditional Chinese LLM Research Community

Twinkle AI is a Traditional Chinese language model research community founded in late 2024. Starting from open-source LLaMA models, we have gradually built practical technologies tailored to Traditional Chinese. Our members come from diverse professional backgrounds, drawn together by a shared passion for model training, and we are committed to promoting knowledge of large language model training and advancing Taiwan's generative AI development through open collaboration.

👋 Join our Discord to connect and collaborate with fellow community members!
mascot.png
Twinkle AI Mascot
mission.md

Our Philosophy

Our community promotes practical, hands-on knowledge of large language model training, and aims to advance Taiwan's generative AI development through openness and sharing.

First Principles

In the rapidly evolving field of LLMs, we're not satisfied with just using third-party packages and APIs. We encourage community members to reinvent the wheel, deeply understand the underlying principles of each component, and build genuine technical mastery.

Cultivating Local Talent

Twinkle AI is committed to nurturing Taiwan's local AI talent, ensuring we don't simply drift with trends or become "human MCPs" that merely call tools, but instead become developers who truly command the core technologies.

Beyond Existing Frameworks

We believe we can build better, more complete solutions than the early packages in this space. By rethinking and reimplementing, we not only solve problems but also accumulate valuable R&D capability for Taiwan.

Open Source & Sharing

We have open-sourced the Twinkle Eval evaluation framework, Traditional Chinese training datasets, and Traditional Chinese large language models, advancing Taiwan's generative AI ecosystem through open collaboration.

projects.md

Core Projects

Why reinvent the wheel?

In the emerging LLM field, things change rapidly from day to day. Relying only on third-party packages, libraries, or APIs makes it hard to learn how things work at the lowest level. We encourage thinking from first principles and building solutions better suited to Traditional Chinese.

models-and-datasets.yaml

Models & Datasets

Our research outputs are open-sourced on Hugging Face, including reasoning models, datasets, and evaluation benchmarks.

Formosa-1 Series

Flagship model series focused on Traditional Chinese instruction following and logical reasoning

The Formosa-1 (F1) reasoning model collection is designed specifically for Traditional Chinese instruction following and logical reasoning. It is our flagship model series, built from first principles and optimized for the characteristics of Traditional Chinese.

Flagship Model Instruction Following Logical Reasoning Traditional Chinese Optimized
Traditional Chinese Reasoning Datasets

Traditional Chinese reasoning evaluation and training datasets

Carefully curated datasets for evaluating and training Traditional Chinese reasoning capabilities across multiple domains, covering logical reasoning, commonsense reasoning, and other dimensions.

Reasoning Evaluation Multi-domain
tw-leetcode

Traditional Chinese LeetCode high-performance solutions dataset

A Traditional Chinese dataset for LeetCode problems, featuring high-performance solutions (Beats 100%), complete problem-solving walkthroughs, and time and space complexity analysis. Each entry follows a "Top Concept → Step Implement → Complexity Analysis" structure to make the algorithmic reasoning easy to follow.

High-Performance Solutions Structured Approach Complexity Analysis
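
As an illustration of that three-part structure, here is a toy Two Sum solution written in the same shape (this example is ours, not an entry from the dataset itself):

```python
# Top Concept: a single pass with a hash map lets us check, for each
# number, whether its complement (target - number) was already seen.
def two_sum(nums: list[int], target: int) -> list[int]:
    # Step Implement: map each value to its index as we scan.
    seen: dict[int, int] = {}
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

# Complexity Analysis: O(n) time (one pass), O(n) space (the hash map).
```
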
Eval Logs

Twinkle Eval benchmark testing records

Benchmark logs generated with Twinkle Eval, recording each model's output for every prompt, providing a transparent evaluation process and traceable results.

Evaluation Records Transparency Result Tracing
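
To see why per-prompt logging gives traceability, consider this sketch of recomputing a score from raw JSONL records. The field names here are assumptions for illustration, not the actual Twinkle Eval log schema:

```python
import json

# Hypothetical per-prompt records, one JSON object per line (JSONL).
log_lines = [
    '{"model": "formosa-1", "prompt_id": 1, "output": "B", "expected": "B"}',
    '{"model": "formosa-1", "prompt_id": 2, "output": "A", "expected": "C"}',
]

records = [json.loads(line) for line in log_lines]

# Because every prompt's output is recorded, the reported accuracy can be
# recomputed (and audited) from the raw log at any time.
accuracy = sum(r["output"] == r["expected"] for r in records) / len(records)
print(accuracy)  # 0.5
```
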
quick-start.py

Quick Start

# clone the LLM Lab research environment
git clone https://github.com/ai-twinkle/llm-lab.git
# Start your Traditional Chinese AI research journey 🚀

Join our Discord community to connect with other developers and researchers, get the latest model updates and technical talks, and find opportunities to contribute to our open-source projects.

community.json

Community Links