
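Elastic Deployment API Documentation

New endpoints: Set scheduling blacklist; Get regional GPU stock

Using the elastic deployment API requires enterprise verification first. For an introduction to elastic deployment, see the elastic deployment documentation.

The API server host is: https://api.autodl.com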

Authentication

Where to get the token: Console -> Settings -> Developer Token

headers = {"Authorization": "token"}
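A minimal sketch of building these headers in Python. The environment variable name AUTODL_TOKEN and the helper name build_headers are illustrative assumptions, not part of the API; the Content-Type header mirrors the Python examples elsewhere in this document.

```python
import os


def build_headers(token=None):
    """Return request headers carrying the developer token.

    AUTODL_TOKEN is a hypothetical environment variable name chosen for
    this sketch; store the token wherever suits your setup.
    """
    token = token or os.environ.get("AUTODL_TOKEN", "")
    if not token:
        raise ValueError("no developer token provided")
    return {
        "Authorization": token,
        "Content-Type": "application/json",
    }
```

Reading the token from the environment keeps it out of source files that may end up in version control.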

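Get images

Images are custom images created and saved in AutoDL; both steps can be done on the autodl.com website. Importing images from outside the platform is not yet supported. To use the public base images provided by the platform, see the appendix at the end of this document.

Request

POST /api/v1/dev/image/private/list

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
page_index Int Page number
page_size Int Number of items per page
offset Int Starting offset of the query

Sample:

{
    "page_index": 1,
    "page_size": 10
}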

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data -> list List<Response object>

Response object parameters:

Parameter Data type Notes
id Int Image ID
image_name String Image name
image_uuid String Image UUID

Sample:

{
    "code": "Success",
    "msg": "",
    "data": {
        "list": [
            {
                "id": 111,
                "created_at": "2022-01-20T18:34:08+08:00",
                "updated_at": "2022-01-20T18:34:08+08:00",
                "image_uuid": "image-db8346e037",
                "name": "image name",
                "status": "finished"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "max_page": 1,
        "offset": 0
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/image/private/list"
body = {
    "page_index": 1,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
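
To walk through every page, one option is to keep requesting until page_index reaches the max_page value returned in data. The sketch below assumes max_page is the number of the last page (as the sample response suggests) and takes the HTTP call as a parameter so it can be exercised without network access; iter_images and post_json are hypothetical names.

```python
def iter_images(post_json, page_size=10):
    """Yield every private image, requesting one page at a time.

    post_json is any callable that takes the request body (a dict) and
    returns the decoded JSON response (a dict).
    """
    page_index = 1
    while True:
        result = post_json({"page_index": page_index, "page_size": page_size})
        if result.get("code") != "Success":
            raise RuntimeError(result.get("msg") or "request failed")
        data = result["data"]
        yield from data["list"]
        # Assumption: max_page is the number of the last available page.
        if page_index >= data.get("max_page", page_index):
            break
        page_index += 1
```

With the requests library, post_json could be a small wrapper such as `lambda body: requests.post(url, json=body, headers=headers).json()`.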

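Create deployment

Request

POST /api/v1/dev/deployment

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
name String Deployment name
deployment_type String Deployment type. Supported values: ReplicaSet, Job, Container
replica_num Int Required for ReplicaSet and Job Number of container replicas to create
parallelism_num Int Required for Job Number of containers running at the same time in a Job deployment
reuse_container Bool Whether to reuse containers that have already stopped; this can significantly speed up container creation
container_template Container Template object

Container Template object:

Parameter Data type Required Notes
region_sign String Region where containers may be scheduled. For the allowed values, see the appendix at the bottom of this document
cuda_v Int Only hosts whose GPU driver supports this CUDA version are scheduled. For the allowed values and further notes, see the bottom of this document
gpu_name_set List<String> GPU models that may be scheduled. Use the GPU model names shown on the web page when creating an elastic deployment
gpu_num Int Number of GPUs required by each container
memory_size_from Int Schedulable container memory size range. Unit: GB
memory_size_to Int Same as above
cpu_num_from Int Schedulable CPU core count range. Unit: 1 vCPU
cpu_num_to Int Same as above
price_from Int Schedulable price range. Unit: yuan * 1000, e.g. enter 100 for 0.1 yuan
price_to Int Same as above
image_uuid String UUID of a private image or of a platform public base image (see the appendix at the end of this document)
cmd String Container start command

Sample (price_from 100 is a base price of 0.1 yuan/hour; price_to 9000 is a base price of 9 yuan/hour):

{
    "name": "created automatically via API",
    "deployment_type": "ReplicaSet",
    "replica_num": 2,
    "reuse_container": true,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": [
            "RTX A5000"
        ],
        "cuda_v": 113,
        "gpu_num": 1,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 100,
        "price_to": 9000,
        "image_uuid": "image-db8346e037"
    }
}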

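Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data Response object

Response object parameters:

Parameter Data type Notes
deployment_uuid String Deployment UUID

Sample: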

{
    "code": "Success",
    "msg": "",
    "data": {
        "deployment_uuid": "833f1cd5a764fa3"
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment"

# Create a ReplicaSet deployment
body = {
    "name": "created automatically via API",
    "deployment_type": "ReplicaSet",
    "replica_num": 2,
    "reuse_container": True,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())

# Additional examples:
# To create a Job deployment, use this body:
job_body = {
    "name": "created automatically via API",
    "deployment_type": "Job",
    "replica_num": 4,
    "parallelism_num": 1,
    "reuse_container": True,
    "container_template": {
        "region_sign": "suqianDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 10",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}

# To create a Container deployment, use this body:
container_body = {
    "name": "created automatically via API",
    "deployment_type": "Container",
    "reuse_container": True,
    "container_template": {
        "region_sign": "neimengDC1",
        "gpu_name_set": ["RTX A5000"],
        "gpu_num": 1,
        "cuda_v": 113,
        "cpu_num_from": 1,
        "cpu_num_to": 100,
        "memory_size_from": 1,
        "memory_size_to": 256,
        "cmd": "sleep 100",
        "price_from": 10,
        "price_to": 9000,
        "image_uuid": "image-db8346e037",
    },
}
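
The request rules above (prices expressed as yuan * 1000, replica_num required for ReplicaSet and Job, parallelism_num required for Job) can be sketched as a small body builder. The function names and the validation behaviour are assumptions for illustration; the API itself may validate differently.

```python
def yuan_to_price(yuan):
    """Convert a price in yuan to the integer unit used by the API (yuan * 1000)."""
    return int(round(yuan * 1000))


def build_deployment_body(name, deployment_type, container_template,
                          replica_num=None, parallelism_num=None,
                          reuse_container=True):
    """Assemble a create-deployment body and check the documented required fields."""
    if deployment_type not in ("ReplicaSet", "Job", "Container"):
        raise ValueError("unsupported deployment_type: %s" % deployment_type)
    body = {
        "name": name,
        "deployment_type": deployment_type,
        "reuse_container": reuse_container,
        "container_template": dict(container_template),
    }
    if deployment_type in ("ReplicaSet", "Job"):
        if replica_num is None:
            raise ValueError("replica_num is required for ReplicaSet and Job")
        body["replica_num"] = replica_num
    if deployment_type == "Job":
        if parallelism_num is None:
            raise ValueError("parallelism_num is required for Job")
        body["parallelism_num"] = parallelism_num
    return body
```

Catching a missing replica_num locally gives a clearer error than waiting for the server to reject the request.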

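Get deployment list

Request

POST /api/v1/dev/deployment/list

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
page_index Int Page number
page_size Int Number of items per page
deployment_uuid String Optional; filters by deployment UUID

Sample:

{
    "page_index": 1,
    "page_size": 10
}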

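Response

The fields have the same meaning as the request parameters of Create deployment. Note that memory_size_from and memory_size_to appear to be expressed in bytes in the sample below.

Sample: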

{
    "code": "Success",
    "data": {
        "list": [
            {
                "id": 214,
                "uid": 58,
                "uuid": "53a677bb3e281b8",
                "name": "xxxx",
                "deployment_type": "Container",
                "status": "stopped",
                "replica_num": 1,
                "parallelism_num": 1,
                "reuse_container": true,
                "starting_num": 0,
                "running_num": 0,
                "finished_num": 2,
                "image_uuid": "image-db8346e037",
                "template": {
                    "region_sign": "xxxxx",
                    "region_sign_list": [
                        "xxxxx",
                        "xxxxx"
                    ],
                    "gpu_name_set": [
                        "Tesla V100-SXM2-32GB"
                    ],
                    "gpu_num": 1,
                    "image_uuid": "image-db8346e037",
                    "image_name": "xxxx",
                    "cmd": "sleep 100",
                    "memory_size_from": 1073741824,
                    "memory_size_to": 274877906944,
                    "cpu_num_from": 1,
                    "cpu_num_to": 100,
                    "price_from": 10,
                    "price_to": 9000,
                    "cuda_v": 118
                },
                "price_estimates": 0,
                "created_at": "2023-01-05T20:34:07+08:00",
                "updated_at": "2023-01-05T20:34:07+08:00",
                "stopped_at": null
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "offset": 0,
        "max_page": 1,
        "result_total": 3,
        "page": 1
    },
    "msg": ""
}
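
A sketch of post-processing a list response: it reports the running container count per deployment and converts the template memory range to GiB, assuming (from the sample above) that the response gives memory sizes in bytes. summarize_deployments is a hypothetical helper.

```python
GIB = 1024 ** 3


def summarize_deployments(response):
    """Return one summary dict per deployment in a deployment-list response."""
    summaries = []
    for item in response["data"]["list"]:
        template = item.get("template", {})
        summaries.append({
            "uuid": item["uuid"],
            "status": item["status"],
            "running_num": item.get("running_num", 0),
            # Assumption: memory sizes in the response are byte counts.
            "memory_gib_from": template.get("memory_size_from", 0) // GIB,
            "memory_gib_to": template.get("memory_size_to", 0) // GIB,
        })
    return summaries
```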

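Query container events

By setting the offset parameter in the request, you can poll this endpoint to fetch the latest container events.

Request

POST /api/v1/dev/deployment/container/event/list

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_uuid String Deployment UUID
deployment_container_uuid String Container UUID; optional
page_index Int Page number
page_size Int Number of items per page
offset Int Starting offset of the query

Sample: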

{
    "deployment_uuid": "da497aea1eb8343", 
    "deployment_container_uuid": "", 
    "page_index": 1, 
    "page_size": 10,
    "offset": 0
}

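Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data -> list List<Response object>

Response object parameters:

Parameter Data type Notes
deployment_container_uuid String Container UUID
status String Container status type
created_at String Time at which the status occurred

Sample: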

{
    "code": "Success",
    "data": {
        "list": [
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "shutdown",
                "created_at": "2022-12-13T16:42:45+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "shutting_down",
                "created_at": "2022-12-13T16:42:40+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "running",
                "created_at": "2022-12-13T16:34:57+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "oss_merged",
                "created_at": "2022-12-13T16:34:55+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "starting",
                "created_at": "2022-12-13T16:34:55+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "created",
                "created_at": "2022-12-13T16:34:54+08:00"
            },
            {
                "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
                "status": "creating",
                "created_at": "2022-12-13T16:34:47+08:00"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "offset": 0,
        "max_page": 1
    },
    "msg": ""
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/event/list"
body = {
    "deployment_uuid": "424446e02893b5f",
    "deployment_container_uuid": "",
    "page_index": 0,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
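
When polling, successive responses may overlap. A minimal sketch of merging polled events into a chronological, duplicate-free history follows. It assumes an event is uniquely identified by its container UUID, status, and created_at, which this document does not guarantee; merge_events is a hypothetical name.

```python
from datetime import datetime


def merge_events(known, polled):
    """Merge newly polled events into the known list, oldest first, without duplicates."""
    seen = set()
    merged = []
    for event in list(known) + list(polled):
        key = (event["deployment_container_uuid"], event["status"], event["created_at"])
        if key not in seen:
            seen.add(key)
            merged.append(event)
    # created_at is an ISO 8601 timestamp with a UTC offset,
    # e.g. 2022-12-13T16:42:45+08:00
    merged.sort(key=lambda e: datetime.fromisoformat(e["created_at"]))
    return merged
```

Parsing the timestamps rather than comparing strings keeps the ordering correct even if events carry different UTC offsets.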

Query containers

If you need the container's UUID from inside the container, read the value of the environment variable AutoDLContainerUUID.
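
For example, a process running inside the container could read the value as follows. This sketch assumes AutoDLContainerUUID is exposed as an environment variable; the function name is illustrative.

```python
import os


def current_container_uuid():
    """Return the container UUID, or None when the variable is not set."""
    return os.environ.get("AutoDLContainerUUID")
```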

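Request

POST /api/v1/dev/deployment/container/list

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_uuid String Deployment UUID
container_uuid String Filter by container UUID
date_from String Filter by container creation time range
date_to String Filter by container creation time range
gpu_name String Filter by GPU model
cpu_num_from Int Filter by container CPU core count range
cpu_num_to Int Filter by container CPU core count range
memory_size_from Int Filter by container memory size range
memory_size_to Int Filter by container memory size range
price_from Float Filter by container base price range
price_to Float Filter by container base price range
released Bool Whether to query instances that have already been released
page_index Int Defaults to 0
page_size Int Defaults to 10
offset Int Starting offset of the query

Sample: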

{
    "deployment_uuid": "da497aea1eb8343", 
    "page_index": 1, 
    "page_size": 10
}

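Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data -> list List<Response object>

Response object parameters:

Parameter Data type Notes
uuid String Container UUID
deployment_uuid String Deployment UUID
machine_id String Host UUID
status String Container status
gpu_name String GPU model
gpu_num Int Number of GPUs
cpu_num Int Number of CPUs
memory_size Int Memory size, in bytes
image_uuid String Image UUID
price Float Base price. Unit: yuan * 1000
info Info object
started_at String Time the container started running
stopped_at String Stop time
created_at String Creation time
updated_at String Update time

Info object:

Parameter Data type Notes
ssh_command String SSH login command
root_password String SSH password
service_url String Custom service URL
proxy_host String (Deprecated; use service_url) Custom service host
custom_port Int (Deprecated; use service_url) Custom service port

Sample: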

{
    "code": "Success",
    "msg": "",
    "data": {
        "list": [
            {
                "id": 195,
                "uuid": "53a677bb3e281b8-f94411a60c-63c24009",
                "machine_id": "f94411a60c",
                "deployment_uuid": "da497aea1eb8343",
                "status": "running",
                "gpu_name": "TITAN Xp",
                "gpu_num": 1,
                "cpu_num": 4,
                "memory_size": 2147483648,
                "image_uuid": "image-db8346e037",
                "price": 1881,
                "info": {
                    "ssh_command": "ssh -p 21305 root@region-1.autodl.com",
                    "root_password": "xxxxxxxxxx",
                    "service_url": "https://region-1.autodl.com:21294",
                    "proxy_host": "region-1.autodl.com",
                    "custom_port": 21294
                },
                "started_at": "2022-12-13T16:43:03+08:00",
                "stopped_at": null,
                "created_at": "2022-12-13T16:42:50+08:00",
                "updated_at": "2022-12-13T16:43:03+08:00"
            }
        ],
        "page_index": 1,
        "page_size": 10,
        "max_page": 1
    }
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/list"
body = {
    "deployment_uuid": "424446e02893b5f",
    "container_uuid": "",
    "date_from": "",
    "date_to": "",
    "gpu_name": "",
    "cpu_num_from": 0,
    "cpu_num_to": 0,
    "memory_size_from": 0,
    "memory_size_to": 0,
    "price_from": 0,
    "price_to": 0,
    "released": False,

    "page_index": 1,
    "page_size": 10,
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
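
The ssh_command string in the Info object can be split into user, host, and port for use with an SSH client library. The sketch below assumes the command always has the form shown in the sample (ssh -p PORT USER@HOST); parse_ssh_command is a hypothetical helper.

```python
import shlex


def parse_ssh_command(ssh_command):
    """Split a command like 'ssh -p 21305 root@host' into its parts."""
    parts = shlex.split(ssh_command)
    if len(parts) != 4 or parts[0] != "ssh" or parts[1] != "-p":
        raise ValueError("unexpected ssh command format: %r" % ssh_command)
    user, _, host = parts[3].partition("@")
    return {"user": user, "host": host, "port": int(parts[2])}
```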

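Stop a container

Besides setting the replica count and letting the system scale automatically and manage container lifecycles, you can use this endpoint to stop a specific container. If you do not want a new container to be started automatically to maintain the replica count after the stop, pass decrease_one_replica_num=true; this stops the container and reduces the replica count (replica num) by 1 at the same time. Note that decrease_one_replica_num only takes effect for ReplicaSet deployments.

Request

PUT /api/v1/dev/deployment/container/stop

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_container_uuid String UUID of the deployment's container
decrease_one_replica_num Boolean For ReplicaSet deployments, whether to also reduce the replica count (replica num) by 1

Sample: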

{
     "deployment_container_uuid": "da497aea1eb8343-f94411a60c-a394fb30",
     "decrease_one_replica_num": false
}

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/container/stop"
body = {
    "deployment_container_uuid": "da497aea1eb8343-f94411a60c-ec630659",
    "decrease_one_replica_num": False
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())
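
Every endpoint in this document returns the same envelope of code, msg, and data, so callers may want one place to check it. A minimal sketch, assuming any code other than Success indicates failure; ApiError and unwrap are hypothetical names.

```python
class ApiError(Exception):
    """Raised when the response code is not Success."""


def unwrap(result):
    """Return the data field of a decoded response, or raise ApiError."""
    if result.get("code") != "Success":
        raise ApiError("%s: %s" % (result.get("code"), result.get("msg")))
    return result.get("data")
```

With the requests library this could be called as `unwrap(response.json())` in place of printing the raw body.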

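Set replica count

Request

PUT /api/v1/dev/deployment/replica_num

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_uuid String Deployment UUID
replica_num Int Replica count. Supported only for ReplicaSet deployments

Sample: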

{
    "deployment_uuid": "xxx",
    "replica_num": 10
}

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/replica_num"
body = {
    "deployment_uuid": "5be3045703152b9",
    "replica_num": 16
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())

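Stop deployment

Request

PUT /api/v1/dev/deployment/operate

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_uuid String Deployment UUID
operate String Operation type. Currently the only allowed value is "stop"

Sample: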

{
    "deployment_uuid": "xxx",
    "operate": "stop"
}

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/operate"
body = {
    "deployment_uuid": "5be3045703152b9",
    "operate": "stop"
}
response = requests.put(url, json=body, headers=headers)
print(response.content.decode())

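Delete deployment

If a deployment has not been stopped when it is deleted, the system stops it and then deletes it.

Request

DELETE /api/v1/dev/deployment

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_uuid String Deployment UUID

Sample: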

{
    "deployment_uuid": "xxx"
}

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment"
body = {
    "deployment_uuid": "5be3045703152b9"
}
response = requests.delete(url, json=body, headers=headers)
print(response.content.decode())

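Set scheduling blacklist

If you notice an unknown fault in a container while scheduling or using containers, you can put the host that runs that container into a no-scheduling state. Once set, none of your deployments will be scheduled onto that host for the next 24 hours; the ban is lifted automatically after 24 hours.

Request

POST /api/v1/dev/deployment/blacklist

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
deployment_container_uuid String Container UUID
comment String Comment

Sample:

{
    "deployment_container_uuid": "xxx",
    "comment": "Slow to boot; do not schedule containers on this host"
}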

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data

Sample:

{
    "code": "Success",
    "msg": "",
    "data": null
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/deployment/blacklist"
body = {
    "deployment_container_uuid": "da497aea1eb8343-f94411a60c-1502e6e2",
    "comment": "Slow to boot; do not schedule containers on this host"
}
# The endpoint is documented as POST, so use requests.post here
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
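
Because the ban lifts automatically after 24 hours and the response shown above carries no expiry time, a client that wants to know when a host becomes schedulable again has to record when it made the call. A small sketch of that local bookkeeping; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

BLACKLIST_DURATION = timedelta(hours=24)


def blacklist_expiry(set_at):
    """Return when a blacklist entry created at set_at is lifted."""
    return set_at + BLACKLIST_DURATION


def is_blacklisted(set_at, now=None):
    """Return True while the 24-hour ban that started at set_at is still active."""
    now = now or datetime.now(timezone.utc)
    return now < blacklist_expiry(set_at)
```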

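Get elastic deployment GPU stock

Request

POST /api/v1/dev/machine/region/gpu_stock

Put the request parameters in the body. Parameter details:

Parameter Data type Required Notes
region_sign String Region identifier; see the region codes in the appendix
cuda_v Int Filters for hosts whose GPU driver supports this CUDA version. For the allowed values and further notes, see the bottom of this document

Sample:

{
    "region_sign": "westDC2",
    "cuda_v": 117
}

Response

Response parameters:

Parameter Data type Notes
code String Response code; Success on success
msg String Error message; empty on success
data -> list List

Response object parameters:

Parameter Data type Notes
GPU model name Stock object

Stock object parameters:

Parameter Data type Notes
idle_gpu_num Int Number of idle GPUs
total_gpu_num Int Total number of GPUs

Sample: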

{
    "code": "Success",
    "msg": "",
    "data": [
        {
            "RTX 4090": {
                "idle_gpu_num": 215,
                "total_gpu_num": 2285
            }
        },
        {
            "RTX 3080 Ti": {
                "idle_gpu_num": 20,
                "total_gpu_num": 392
            }
        },
        {
            "RTX A4000": {
                "idle_gpu_num": 6,
                "total_gpu_num": 24
            }
        }
    ]
}

Python example:

import requests
headers = {
    "Authorization": "your token",
    "Content-Type": "application/json"
}
url = "https://api.autodl.com/api/v1/dev/machine/region/gpu_stock"
body = {
    "region_sign": "westDC2",
    "cuda_v": 117
}
response = requests.post(url, json=body, headers=headers)
print(response.content.decode())
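
The sample response is a list of single-key objects, which is awkward to query directly. A sketch that flattens it into one mapping and picks the models with enough idle GPUs, assuming data has the shape shown in the sample above; the function names are illustrative.

```python
def flatten_stock(data):
    """Turn [{model: {idle_gpu_num, total_gpu_num}}, ...] into {model: stock}."""
    stock = {}
    for entry in data:
        stock.update(entry)
    return stock


def models_with_idle(data, needed):
    """Return GPU model names with at least `needed` idle GPUs, most idle first."""
    stock = flatten_stock(data)
    names = [name for name, s in stock.items() if s["idle_gpu_num"] >= needed]
    return sorted(names, key=lambda name: stock[name]["idle_gpu_num"], reverse=True)
```

The returned names could then be passed as gpu_name_set when creating a deployment.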

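Appendix

  1. region_sign values for creating a deployment
Region region_sign value
Northwest Enterprise Zone (西北企业区, recommended) westDC2
Northwest Zone B (西北B区) westDC3
Beijing Zone A (北京A区) beijingDC1
Beijing Zone B (北京B区) beijingDC2
Beijing Zone C (北京C区) beijingDC4
South China Zone A (华南A区, formerly Beijing Zone C) beijingDC3
Inner Mongolia Zone A (内蒙A区) neimengDC1
Foshan Zone (佛山区) foshanDC1
Chongqing Zone A (重庆A区) chongqingDC1
  2. Public base image UUIDs
Image UUID Framework Image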
base-image-12be412037 PyTorch cuda11.1-cudnn8-devel-ubuntu18.04-py38-torch1.9.0
base-image-u9r24vthlk PyTorch cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.10.0
base-image-l374uiucui PyTorch cuda11.3-cudnn8-devel-ubuntu20.04-py38-torch1.11.0
base-image-l2t43iu6uk PyTorch cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0
base-image-0gxqmciyth TensorFlow cuda11.2-cudnn8-devel-ubuntu18.04-py38-tf2.5.0
base-image-uxeklgirir TensorFlow cuda11.2-cudnn8-devel-ubuntu20.04-py38-tf2.9.0
base-image-4bpg0tt88l TensorFlow cuda11.4-py38-tf1.15.5
base-image-mbr2n4urrc Miniconda cuda11.6-cudnn8-devel-ubuntu20.04-py38
base-image-qkkhitpik5 Miniconda cuda10.2-cudnn7-devel-ubuntu18.04-py38
base-image-h041hn36yt Miniconda cuda11.1-cudnn8-devel-ubuntu18.04-py38
base-image-7bn8iqhkb5 Miniconda cudagl11.3-cudnn8-devel-ubuntu20.04-py38
base-image-k0vep6kyq8 Miniconda cuda9.0-cudnn7-devel-ubuntu16.04-py36
base-image-l2843iu23k TensorRT cuda11.8-cudnn8-devel-ubuntu20.04-py38-trt8.5.1
base-image-l2t43iu6uk TensorRT cuda11.8-cudnn8-devel-ubuntu20.04-py38-torch2.0.0
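  3. CUDA version values
CUDA version cuda_v value (integer) Notes
11.1 111 Hosts whose GPU driver supports a highest CUDA version >= 11.1 can be scheduled
11.3 113 Hosts whose GPU driver supports a highest CUDA version >= 11.3 can be scheduled
11.7 117 Hosts whose GPU driver supports a highest CUDA version >= 11.7 can be scheduled
11.8 118 Hosts whose GPU driver supports a highest CUDA version >= 11.8 can be scheduled
12.0 120 Hosts whose GPU driver supports a highest CUDA version >= 12.0 can be scheduled
12.1 121 Hosts whose GPU driver supports a highest CUDA version >= 12.1 can be scheduled
12.2 122 Hosts whose GPU driver supports a highest CUDA version >= 12.2 can be scheduled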

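Note: if your framework uses CUDA 11.5, which is not among the options above, choose the lowest available option that is compatible with the CUDA version you need, which in the table above is 11.7. Newer drivers are compatible with older CUDA versions, so this works normally. Choosing too high a version, however, narrows the range of schedulable machines and reduces the number of available GPUs.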
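
The selection rule can be sketched in code: take the smallest listed value that is not lower than the CUDA version the framework needs. The list below copies the cuda_v values from the table above; pick_cuda_v is a hypothetical helper, and the list would need updating if the platform adds options.

```python
CUDA_V_OPTIONS = [111, 113, 117, 118, 120, 121, 122]


def pick_cuda_v(required):
    """Return the lowest cuda_v option >= the required version, e.g. "11.2" -> 113."""
    major, minor = required.split(".")
    wanted = int(major) * 10 + int(minor)
    for option in sorted(CUDA_V_OPTIONS):
        if option >= wanted:
            return option
    raise ValueError("no listed cuda_v supports CUDA %s" % required)
```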