Thursday, 2024 November 21

01.AI’s newest closed-source model said to outperform GPT-4 across six benchmarks

After a hiatus of six months, Li Kai-Fu once again graced the press conference of 01.AI, this time as CEO. His last appearance at a 01.AI press conference dates back to November 16, 2023, when 01.AI showcased its prowess by open-sourcing “Yi,” a bilingual large model.

Li’s presence typically signals a new chapter for 01.AI. After refining its open-source model for half a year, 01.AI has entered a fresh phase of product development and commercialization. Open-sourcing serves merely as an introductory strategy for word-of-mouth marketing. To activate the commercial flywheel, a closed-source offering becomes imperative. During the conference on May 13, 01.AI unveiled its maiden closed-source model, “Yi-Large,” which reportedly has a trillion parameters and is said to outperform GPT-4 across six benchmarks.

Yet, 01.AI’s commercial ambitions now hinge on the product front. The conference marked the official rollout of “Wanzhi,” an artificial intelligence-powered productivity application, dubbed by Li as the “AI-first version of [Microsoft] Office.”

Having already undergone overseas trials in September 2023, Wanzhi boasts tens of millions of global users. According to Li, 01.AI has forecasted a potential revenue of RMB 100 million (USD 13.8 million) from the product line this year.

01.AI follows a dual-track strategy: the Wanzhi application and the Yi-Large API platform. Source: 01.AI

Presenting a dual-track strategy comprising open- and closed-source models across domestic and international B2B and B2C channels, 01.AI’s business roadmap mirrors Li’s managerial philosophy of embracing both technical faith and market implementation. He regards validating technology-cost product-market fit (TC-PMF) as paramount for AI tech firms. To test for TC-PMF, 01.AI opted to first trial Wanzhi abroad, backed by a high-performance model, before launching it domestically nine months later.

Outperforming GPT-4 in six benchmarks, at one-third the cost

In 2023, 01.AI embarked on the frontier of large models through open sourcing. Since November 6, 2023, the company has open-sourced three variants of the Yi model: 6B, 9B, and 34B. At the recent press conference, 01.AI not only bolstered the capabilities of its open-source models but also introduced its first closed-source model, Yi-Large.

Yi-Large, engineered to rival GPT-4, boasts an impressive parameter scale. According to the official model ranking by Stanford’s AlpacaEval 2.0, Yi-Large’s English proficiency index, “LC Win Rate,” places it second only to GPT-4 Turbo. Moreover, in the “SuperCLUE” Chinese ability assessment, Yi-Large surpasses GPT-4 across six datasets, including multiple-choice questions (GPQA) and human alignment (AlignBench).

Simultaneously, 01.AI commenced training its inaugural mixture of experts (MoE) large model, “Yi-XLarge.” Despite being in its early training stages, Yi-XLarge has already outperformed Yi-Large in benchmarks such as MMLU, GPQA, HumanEval, and MATH, qualifying it to compete with renowned overseas models like Claude-3-Opus and GPT4-0409.

With the Yi-1.5 upgrade, 01.AI addressed deficiencies in mathematics and code across three models: 34B, 9B, and 6B. The fine-tuned Yi-1.5-6B/9B-Chat surpasses Llama-3-8B in mathematics evaluations and matches Mixtral-8x22B-Instruct-v0.1 in code evaluations.

Benchmarking of Yi-Large model. Source: 01.AI

01.AI also rolled out six model APIs, each tuned for a different performance profile:

  • Yi-Large API: Optimized for text generation and inference performance, ideal for intricate reasoning, prediction, and deep content creation scenarios.
  • Yi-Large-Turbo API: Balanced for high-precision inferences and text generation, catering to scenarios demanding top-tier quality inferences and text generation.
  • Yi-Medium API: Tailored for instruction-following capabilities, suitable for chat, conversation, translation, and routine scenarios.
  • Yi-Medium-200K API: Equipped to process 200,000 words of text at once, well suited to ultra-long documents.
  • Yi-Vision API: Boasting high-performance image understanding and analysis, serving image-based chat, analysis, and other scenarios.
  • Yi-Spark API: Emphasizing lightweight and ultra-fast response, ideal for lightweight mathematical analysis, code generation, text chat, and similar scenarios.
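The list above amounts to a mapping from usage scenario to model tier. A minimal Python sketch of that mapping follows. The scenario labels, the `pick_model` helper, and the choice of fallback tier are assumptions made for illustration, and the values are the product names as the article gives them, not verified API model identifiers.

```python
# Scenario-to-model lookup built from the descriptions in the list above.
# Keys are informal labels invented for this example; values are product names
# as written in the article, NOT confirmed API model identifiers.
SCENARIO_TO_MODEL = {
    "complex_reasoning": "Yi-Large",
    "high_precision_generation": "Yi-Large-Turbo",
    "chat_and_translation": "Yi-Medium",
    "long_document": "Yi-Medium-200K",
    "image_understanding": "Yi-Vision",
    "lightweight_fast_response": "Yi-Spark",
}


def pick_model(scenario: str, default: str = "Yi-Medium") -> str:
    """Return the tier listed for a scenario, or a fallback tier.

    The fallback to "Yi-Medium" is an assumption (the article describes it
    as the tier for routine scenarios), not a documented behavior.
    """
    return SCENARIO_TO_MODEL.get(scenario, default)


if __name__ == "__main__":
    for scenario in ("long_document", "image_understanding", "something_else"):
        print(f"{scenario} -> {pick_model(scenario)}")
```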

In terms of pricing, 01.AI’s Lan Yuchuan disclosed that the current rate for the Yi-Large API stands at RMB 20 (about USD 2.8) per million tokens, less than one-third of GPT-4 Turbo’s price. Lan added that 01.AI might adopt a cloud-based approach for API tools and industry solutions in the future.
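To make the quoted rate concrete, the following Python sketch estimates what a given token volume would cost. It assumes a single flat rate for all tokens (the article does not say whether input and output tokens are priced differently) and an exchange rate of roughly RMB 7.25 per US dollar, which is implied by the article’s conversion of RMB 100 million to USD 13.8 million. The function names and the sample workload are hypothetical.

```python
# Illustrative cost estimate from the per-token rate quoted in the article.
RMB_PER_MILLION_TOKENS = 20.0  # Yi-Large API rate quoted by Lan Yuchuan
RMB_PER_USD = 7.25             # assumed rate, implied by RMB 100M ~ USD 13.8M


def estimate_cost_rmb(tokens: int,
                      rate_per_million: float = RMB_PER_MILLION_TOKENS) -> float:
    """Cost in RMB of processing `tokens` tokens at a flat per-million rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * rate_per_million


def rmb_to_usd(amount_rmb: float, rmb_per_usd: float = RMB_PER_USD) -> float:
    """Convert RMB to USD at the assumed exchange rate."""
    return amount_rmb / rmb_per_usd


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload
    cost = estimate_cost_rmb(monthly_tokens)
    print(f"{monthly_tokens:,} tokens: RMB {cost:,.2f} "
          f"(about USD {rmb_to_usd(cost):,.2f})")
```

Under these assumptions, a workload of 50 million tokens would cost RMB 1,000.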

“AI-first version of [Microsoft] Office”

This year, 01.AI is aiming for RMB 100 million in revenue from a single product, which would put the product’s ROI close to 1. Coming after nine months online and a user base approaching the tens of millions, that result would further validate the company’s progress to date.

Before introducing 01.AI’s latest application, Li described a series of overseas achievements, signaling the preliminary validation of product-market fit for the product abroad. Building on this momentum, 01.AI launched its inaugural application product, “Wanzhi,” domestically. Positioning it as a productivity tool, Li dubbed it the “AI-first version of [Microsoft] Office.”

Functions of Wanzhi AI productivity tool. Source: 01.AI

Drawing on insights from the overseas validation, Cao Dapeng, a product manager at 01.AI, recognized a shift in user preferences. Existing tools that start every workflow from a blank document were no longer meeting users’ needs. Instead, users sought office products that seamlessly integrate conversational and graphical user interfaces (CUI and GUI).

Wanzhi not only offers basic conversational search but also has multimodal understanding that can present results in various chart formats. What sets it apart from competitors like WPS and Microsoft Copilot is its expansion onto WeChat’s mini-program platform, which enables collaboration across multiple devices.

This feature empowers users to efficiently process slide decks on their mobile phones during fragmented times like commuting while synchronizing work progress with other devices. With Wanzhi, 01.AI aims to reshape the landscape of office productivity tools, offering a seamless blend of chat-based interaction and graphical interface functionality.

Using tomorrow’s tech to build today’s products

Li’s reflection on 01.AI’s journey underscores its achievement in catching up with the most advanced models in the US. However, he emphasizes the need for large model developers to transition into a “long-distance running mode.” This shift in perspective gave rise to TC-PMF, a concept born of that reflection.

When products are in the 0–1 stage, companies prioritize customer acquisition and increasing stickiness. Yet, as they progress into the 1–100 stage, achieving scalability demands a careful balance between the technical path and inference cost.

Huang Wenhao, 01.AI’s model training manager, highlights the scaling law as a guiding principle on the path toward artificial general intelligence (AGI). Following it means getting the most out of the model under a given compute budget and improving the quality of training data. Consequently, there is a heightened need for a talented team capable of integrating algorithms, infrastructure, and engineering.

At the product level, Li acknowledges the rapid technological advancements shaping today’s landscape. Unlike the era of Douyin, when progress was slower, large models are now evolving rapidly in line with the scaling law. The speed with which GPT-4 forced applications built on GPT-3.5 to be rewritten underscores the importance of anticipating tomorrow’s technology while developing products today.

“When building products, consider tomorrow’s technology instead of today’s,” Li said.

KrASIA Connection
KrASIA Connection features translated and adapted high-quality insights published on 36Kr.com, the largest and most influential technology portal in Chinese language with over 150 million readers across the globe.