comScore Tracking
site logo
search_icon

Ad

Alibaba’s Qwen3.7-Max AI Model Ranks Fourth Globally in Coding Benchmarks

Alibaba’s Qwen3.7-Max AI Model Ranks Fourth Globally in Coding Benchmarks

author-img
|
Updated on: 28-May-2026 05:00 PM
total-views-icon

9,963 views

share-icon
youtube-icon

Follow Us:

insta-icon
total-views-icon

9,963 views

Alibaba has launched its latest AI model, Qwen3.7-Max, which the company says has outperformed several competitors from OpenAI and Google in coding benchmarks. According to Alibaba, Qwen3.7-Max achieved fourth place worldwide on the Code Arena rankings, scoring 1,541 points. Code Arena is a benchmark that evaluates how well AI models can independently complete coding tasks. This ranking places Qwen3.7-Max ahead of some versions of ChatGPT and Gemini, with only Anthropic’s Claude family of models scoring higher.

Key Highlights

  • Alibaba's Qwen3.7-Max ranked fourth globally on Code Arena with a score of 1,541.
  • Model outperformed some versions of ChatGPT and Gemini in coding benchmarks.
  • Qwen3.7-Max can run autonomously for up to 35 hours and make over 1,000 tool calls.

Qwen3.7-Max’s Technical Capabilities

Qwen3.7-Max is designed for agent-based tasks, which means it can handle complex workflows without constant human input. Unlike standard chatbots that answer single questions, this model can manage long, multi-step processes. Alibaba reports that Qwen3.7-Max can act as a coding agent, building front-end prototypes, managing large software projects across multiple files, and automating office tasks using external tools. The company claims the AI can operate autonomously for up to 35 hours and make over 1,000 tool calls in a single session.

To demonstrate its capabilities, Alibaba researchers assigned Qwen3.7-Max the task of optimizing code for one of their AI chips. The model reportedly worked for 35 hours continuously, running 432 kernel tests and making more than 1,100 tool calls. It compiled, measured, and rewrote code independently, achieving a tenfold performance improvement over the original code. Alibaba states that Qwen3.7-Max had not previously encountered that chip architecture during its training.

Market Position and Strategic Shift

Alibaba’s Qwen3.7-Max arrives at a time when US companies like OpenAI, Google, and Anthropic have dominated advanced coding AI. Chinese firms are now increasing their efforts to compete, focusing on autonomous coding agents that function more like software engineers. Alibaba says Qwen3.7-Max supports interfaces compatible with OpenAI and Anthropic, and it can work with tools such as Claude Code, OpenClaw, and Qwen Code.

This launch also marks a strategic change for Alibaba. While earlier Qwen models were open source, Qwen3.7-Max is proprietary and available only through Alibaba Cloud’s Model Studio API. Beyond coding, Alibaba claims the model can monitor AI training systems, detect suspicious activity during software engineering tests, and guide robots in physical spaces using paired navigation systems. The company also reports that Qwen3.7-Max performed well on several reasoning and coding benchmarks, coming close to Anthropic’s Claude Opus 4.6 Max in multiple tests. Most benchmark results are self-reported by Alibaba, and the company plans to release a detailed technical report later.

Explore Mobile Brands

Xiaomi
Xiaomi
OPPO
OPPO
Vivo
Vivo
Realme
Realme
Apple
Apple
OnePlus
OnePlus

Ad