Release Notes

AI 2.0.0

New and Optimized Features

NPU Support

The new NPU Operator adds NPU hardware management, simplifying the setup for using NPUs and enhancing hardware acceleration.

Queueing & Admission Control

The Alauda Build of Kueue adds queue management and admission control, optimizing task scheduling and resource allocation.
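As an illustrative sketch (not this release's shipped defaults), a Kueue setup typically pairs a ClusterQueue holding resource quotas with a namespaced LocalQueue that workloads submit to; the queue names, namespace, and quota values below are hypothetical:

```yaml
# A flavor describing one class of nodes (e.g. a default node pool).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
# Cluster-wide quota that admitted workloads draw from.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-queue
spec:
  namespaceSelector: {}   # accept submissions from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
---
# Namespace-local entry point that teams submit jobs against.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: team-a
spec:
  clusterQueue: team-queue
```

Workloads opt in by carrying the `kueue.x-k8s.io/queue-name: team-a-queue` label; Kueue then suspends them until quota in the ClusterQueue is available.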

Model Registry

Integration with the Kubeflow Model Registry enhances model management and version control, simplifying model registration and storage.

Distributed Inference

With the addition of llm-d, distributed inference is now supported, improving the performance and resource efficiency of large-scale inference tasks.

Leader-Worker Set Support

The Alauda Build of LeaderWorkerSet adds distributed training task management, simplifying task distribution and coordination in a leader-worker model.
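As a hedged sketch of the leader-worker model (the resource name and container image below are hypothetical), a LeaderWorkerSet describes groups of pods that are created, scheduled, and scaled as a unit:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: distributed-job               # hypothetical name
spec:
  replicas: 2                         # two independent leader-worker groups
  leaderWorkerTemplate:
    size: 4                           # 1 leader + 3 workers per group
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: registry.example.com/trainer:latest   # hypothetical image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: registry.example.com/trainer:latest
```

Each group is restarted or rescheduled together, which is what makes the pattern suitable for tightly coupled distributed workloads.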

AI Gateway

The new Alauda Build of Envoy AI Gateway optimizes traffic management and security, providing enhanced AI service proxy capabilities.

Trainer Orchestration

Support for Kubeflow Trainer v2 enhances scheduling and management of model training tasks, enabling more flexible and efficient training workflows.

Pipelines Orchestration

Integration with Kubeflow Pipelines enables more efficient pipeline orchestration and task management, improving automation across workflows.

Vector Storage

The introduction of Milvus provides an efficient vector storage solution, supporting large-scale vector data storage and fast retrieval.

NVIDIA DRA

The NVIDIA DRA Driver for GPUs optimizes GPU resource management and scheduling, increasing GPU utilization and performance.

Node Feature Discovery

The Alauda Build of Node Feature Discovery enables automatic discovery of hardware features and node labeling, improving flexibility in node resource management.

Llama Stack Runtime

The introduction of Llama Stack enhances AI agent capabilities by providing an efficient runtime environment for distributed AI tasks.

Generative AI

The introduction of KServe's generative AI capabilities enhances support for generative AI, optimizing model deployment and inference for generative applications.

Low-Code Builder Integration

An updated Dify version, shipped with deployable charts, simplifies the low-code application building process and further improves AI application development efficiency.

Deprecated Features

UI Fine-Tuning & Pretraining Deprecation

UI-based fine-tuning and pretraining are deprecated due to limited general applicability, the complexity of horizontal scaling, and the fact that this approach is not a mainstream industry method for model training and fine-tuning. Notebook-based model fine-tuning and training are recommended instead.

Secret Management Deprecation

The Secret Management feature is deprecated because it no longer has a valid use case: manual GitLab integration is no longer required.

Fixed Issues

  • When updating an inference service's resource YAML through the page, the volumeMount field was dropped, which could cause the inference service to fail to start.
  • In older versions, GraphQL queries (POST by default) were incorrectly intercepted by the gateway layer and checked against create permission. Requests to the /api/graphql endpoint are now correctly treated as reads by the RBAC interceptor, so users with read-only roles can access page content backed by GraphQL data.

Known Issues

  • After deleting a model, the list page does not reflect the deletion immediately, and the deleted model may briefly remain in the list.
    Temporary solution: Manually refresh the page.
  • Modifying library_name in GitLab by directly editing the README file does not synchronize the model type change on the page.
    Temporary solution: Modify library_name through the UI; avoid editing it directly in GitLab.