Alibaba Cloud open sources video foundation models

The Chinese tech giant has released its Tongyi Wanxiang 2.1 family of video foundation models as open source, including a smaller version that enables video creation on standard laptops

Alibaba Cloud has open sourced its family of video foundation models to provide businesses and researchers with access to video creation capabilities.

The four models in the Tongyi Wanxiang (Wan) 2.1 family, comprising 14-billion- and 1.3-billion-parameter versions, are designed to generate high-quality videos from text and image inputs. The models are available for download on ModelScope, Alibaba Cloud’s AI model community, as well as Hugging Face.

According to Alibaba Cloud, Wan 2.1 distinguishes itself as the first video-generation model to support text effects in both Chinese and English. Its proficiency in generating realistic visuals is attributed to its ability to handle complex movements, enhance pixel quality, adhere to physical principles and follow instructions with precision.

These capabilities have propelled Wan 2.1 to the top of VBench, a benchmark suite for video generative models, where it is the only open-source model among the top five on Hugging Face’s leaderboard.

The different models cater to varying needs and computational resources. The 14B model excels at creating high-quality visuals with complex motion dynamics, while the 1.3B model balances generation quality with computational efficiency, allowing users with standard laptops to generate a five-second 480p video in about four minutes.

Training video foundation models requires vast computing resources and high-quality training data. Open sourcing them lowers the barrier for more businesses to leverage AI, enabling them to create high-quality visual content tailored to their needs in a cost-effective way.

Besides Wan 2.1, Alibaba Cloud has also open sourced its Qwen foundation models, which have ranked highly on Hugging Face’s Open LLM Leaderboard with performance comparable to leading global models. More than 100,000 derivative models built on the Qwen family now exist on Hugging Face, making it one of the largest AI model families worldwide.

The company also provides Model Studio, which gives large enterprises access to its foundation models and model training tools to speed up the deployment of large language models in a controlled environment.

With Model Studio, enterprises can monitor and identify risky content, and filter or block undesirable information in line with responsible AI principles. They can also train foundation models by creating, labelling and managing training datasets; customise model training with adjustable parameters; and evaluate and deploy foundation models easily.

Earlier this week, Alibaba Cloud said it would invest RMB 380 billion (US$53bn) in its cloud computing and AI infrastructure over the next three years, surpassing its total spending on cloud and AI in the past decade.

Alibaba’s cloud intelligence unit reported 11% year-over-year revenue growth in the latest quarter, excluding consolidated subsidiaries. Its AI-related product revenue grew in the triple-digit range for the sixth consecutive quarter, thanks to rising demand for its AI hosting and related offerings.
