01.AI’s newest closed-source model said to outperform GPT-4 across six benchmarks

2024-05-07

Li Kai-Fu, founder of 01.AI. Source: @kaifulee / X

After a hiatus of six months, Li Kai-Fu once again graced the press conference of 01.AI, this time as CEO. His last appearance at a 01.AI press conference dates back to November 16, 2023, when 01.AI showcased its prowess by open-sourcing “Yi,” a bilingual large model.

Li’s presence typically signals a new chapter for 01.AI. After refining its open-source model for half a year, 01.AI has entered a fresh phase of product development and commercialization. Open-sourcing serves merely as an introductory strategy for word-of-mouth marketing. To activate the commercial flywheel, going closed-source becomes imperative. During the conference on May 13, 01.AI unveiled its maiden closed-source model, “Yi-Large,” which has a parameter count in the hundreds of billions and is said to outperform GPT-4 across six benchmarks.

Yet, 01.AI’s commercial ambitions now hinge on the product front. The conference marked the official rollout of “Wanzhi,” an artificial intelligence-powered productivity application, dubbed by Li as the “AI-first version of [Microsoft] Office.”

Having launched overseas trials in September 2023, Wanzhi has attracted nearly ten million global users. According to Li, 01.AI has forecast potential revenue of RMB 100 million (USD 13.8 million) from the product line this year.

01.AI follows a dual track strategy: Wanzhi application and Yi-Large API platform. Source: 01.AI

Presenting a dual-track strategy comprising open- and closed-source models across domestic and international B2B and B2C channels, 01.AI’s business roadmap mirrors Li’s managerial philosophy of embracing both technical faith and market implementation. He perceives the validation of technology-cost product-market fit (TC-PMF) as paramount for AI tech firms. To ascertain TC-PMF, 01.AI opted to first trial the “Wanzhi” API with a high-performance model abroad before its domestic launch nine months later.

Outperforming GPT-4 in six benchmarks, at one-third the cost

In 2023, 01.AI embarked on the frontier of large models through open sourcing. Since November 6, 2023, the company has open-sourced three variants of the Yi model: 6B, 9B, and 34B. At the recent press conference, 01.AI not only bolstered the capabilities of its open-source models but also introduced its first closed-source model, Yi-Large.

Yi-Large, engineered to rival GPT-4, boasts an impressive parameter scale. According to the official model ranking by Stanford’s AlpacaEval 2.0, Yi-Large’s English proficiency index, “LC Win Rate,” places it second only to GPT-4 Turbo. Moreover, in the “SuperCLUE” Chinese ability assessment, Yi-Large surpasses GPT-4 across six datasets, including multiple-choice questions (GPQA) and human alignment (AlignBench).

Simultaneously, 01.AI commenced training its inaugural mixture of experts (MoE) large model, “Yi-XLarge.” Despite being in its early training stages, Yi-XLarge has already outperformed Yi-Large in benchmarks such as MMLU, GPQA, HumanEval, and MATH, qualifying it to compete with renowned overseas models like Claude-3-Opus and GPT4-0409.

Under the Yi-1.5 version upgrade, 01.AI addressed deficiencies in mathematics and code across three models: 34B, 9B, and 6B. The fine-tuned Yi-1.5-6B/9B-Chat surpasses Llama-3-8B in mathematics ability evaluations and matches Mixtral-8x22B-Instruct-v0.1 in code ability evaluations.

Benchmarking of Yi-Large model. Source: 01.AI

01.AI also rolled out six distinct, performance-oriented model APIs for Yi-Large:

Yi-Large API: Optimized for text generation and inference performance, ideal for intricate reasoning, prediction, and deep content creation scenarios.
Yi-Large-Turbo API: Balanced for high-precision inferences and text generation, catering to scenarios demanding top-tier quality inferences and text generation.
Yi-Medium API: Tailored for instruction-following capabilities, suitable for chat, conversation, translation, and routine scenarios.
Yi-Medium-200K API: Equipped to process 200,000 words of text at once, perfect for handling ultra-long content document scenarios.
Yi-Vision API: Boasting high-performance image understanding and analysis, serving image-based chat, analysis, and other scenarios.
Yi-Spark API: Emphasizing lightweight and ultra-fast response, ideal for lightweight mathematical analysis, code generation, text chat, and similar scenarios.

In terms of pricing, 01.AI’s Lan Yuchuan disclosed that the current rate for the Yi-Large API stands at RMB 20 (USD 2.7) per million tokens, less than one-third of GPT-4 Turbo’s price. Lan added that 01.AI might adopt a cloud-based approach for API tools and industry solutions in the future.
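The pricing claim can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above (RMB 20, or about USD 2.7, per million tokens) and assumes a single flat rate for all tokens, which the article does not confirm. The function names and the sample workload are hypothetical, chosen for illustration only.

```python
# Figure quoted in the article: RMB 20 per million tokens for the Yi-Large API.
# Assumption: one flat rate applies to every token (the article does not say
# whether input and output tokens are billed differently).
YI_LARGE_RMB_PER_MILLION = 20.0


def api_cost(tokens: int, rate_per_million: float) -> float:
    """Cost of `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


def implied_competitor_floor(rate_per_million: float) -> float:
    """Rate a competitor must exceed for `rate_per_million` to be
    'less than one-third' of it, i.e. three times the given rate."""
    return 3 * rate_per_million


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload, not from the article
    print(f"Yi-Large cost: RMB {api_cost(monthly_tokens, YI_LARGE_RMB_PER_MILLION):.2f}")
    print(f"Implied GPT-4 Turbo rate: above RMB "
          f"{implied_competitor_floor(YI_LARGE_RMB_PER_MILLION):.2f} per million tokens")
```

Read this way, "less than one-third" implies that the GPT-4 Turbo rate used for comparison was above RMB 60 per million tokens. The article does not state which GPT-4 Turbo price was compared.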

“AI-first version of [Microsoft] Office”

This year, 01.AI is aiming to rake in RMB 100 million in single-product revenue, which would equate to a product ROI of close to 1. With the product online for nine months and a user base approaching ten million, hitting that target would further validate the company’s progress to date.

Before introducing 01.AI’s latest application, Li described a series of overseas achievements, signaling the preliminary validation of product-market fit for the product abroad. Building on this momentum, 01.AI launched its inaugural application product, “Wanzhi,” domestically. Li positioned it as a productivity tool and called it the “AI-first version of [Microsoft] Office.”

Functions of Wanzhi AI productivity tool. Source: 01.AI

Drawing on insights from the overseas validation, Cao Dapeng, a product manager at 01.AI, recognized a shift in user preferences. Existing tools that start workflows from blank documents were no longer meeting users’ needs. Instead, users sought office products that seamlessly integrate conversational and graphical user interfaces (CUI and GUI).

Wanzhi not only offers basic conversational search capabilities but also boasts a multimodal understanding capability that generates results in various chart formats. Setting it apart from competitors like WPS and Microsoft Copilot, Wanzhi expanded its reach through WeChat’s mini-program platform, enabling collaboration across multiple terminals.

This feature empowers users to efficiently process slide decks on their mobile phones during fragmented times like commuting while synchronizing work progress with other devices. With Wanzhi, 01.AI aims to reshape the landscape of office productivity tools, offering a seamless blend of chat-based interaction and graphical interface functionality.

Using tomorrow’s tech to build today’s products

Li’s reflection on 01.AI’s journey underscores its achievement in catching up with the most advanced models in the US. However, he emphasizes the need for large model developers to transition into a “long-distance running mode.” This shift in perspective gave rise to TC-PMF, a concept born out of his contemplation.

When products are in the 0–1 stage, companies prioritize customer acquisition and increasing stickiness. Yet, as they progress into the 1–100 stage, achieving scalability demands a careful balance between the technical path and inference cost.

Huang Wenhao, 01.AI’s model training manager, highlights the scaling law as a guiding principle toward artificial general intelligence (AGI). Following it means optimizing the model’s efficiency under given computing conditions and enhancing the quality of training data. Consequently, there is a heightened need for a talented team capable of integrating algorithms, infrastructure, and engineering.

At the product level, Li acknowledges the rapid technological advancements shaping today’s landscape. Unlike the era of Douyin, where progress was slower, large models are now evolving rapidly in line with the scaling law. How quickly GPT-4 forced applications built on GPT-3.5 to be rewritten underscores the importance of considering what tomorrow’s tech will be like while developing products in the present day.

“When building products, consider tomorrow’s technology instead of today’s,” Li said.

This article was written by Zhou Xinyu in Chinese and was originally published by 36Kr.
