How to import data into a dataset and retrain a custom model in Google Cloud AutoML

Date: 2018-10-04 03:10:26

Tags: c# rest machine-learning google-cloud-platform google-cloud-vision

I'm a developer new to GCP and have been learning about Google Cloud AutoML custom models. However, I've run into two problems while using AutoML Vision.

1. I can't import data into a dataset from a CSV file in Cloud Storage. I'm calling the REST API from C#, but it returns a 404 error. Here is my code.

var uri = "https://automl.googleapis.com/v1beta1/projects/{project-id}/locations/us-central1/datasets/{dataset-id}:import";

        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "POST";
        request.ContentType = "application/json";
        request.Headers.Add("Authorization", "Bearer " + _token);

        using (var streamWriter = new StreamWriter(request.GetRequestStream()))
        {
            string json = "{\"inputUris\":\"gs://{bucket-name}/Vehicles/csv/{csv-file-name}.csv\"}";
            Console.WriteLine(json);
            streamWriter.Write(json);
            streamWriter.Flush();
            streamWriter.Close();
        }

        try
        {
            var httpResponse = (HttpWebResponse)request.GetResponse();
            using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
            {
                var result = streamReader.ReadToEnd();
                Console.WriteLine(result);
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }

2. How can I retrain a custom model using C# or the REST API?

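For example: a user uploads new images together with labels for those images. A CSV file is then created and uploaded to Cloud Storage. I want to import that CSV file into the dataset and then retrain the custom model on only the images listed in the CSV file (that is, add the new training images to the old model).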

2 answers:

Answer 0: (score: 1)

In addition to @Awais's answer, the correct API endpoint to call is

https://automl.googleapis.com/v1beta1/projects/{id-project}/locations/us-central1/datasets/{id-dataset}:importData

and the correct JSON payload for this method is

{
    "inputConfig": {
        "gcsSource": {
            "inputUris": [
                "gs://my-bucket-vcm/uploads/app/csv/19_03_2019_18_16_35.csv"
            ]
        }
    }
}
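The endpoint and payload above can be combined into a single request. The following is a minimal Python sketch, not an official client: the helper names `build_import_request` and `import_data` are hypothetical, the access token is assumed to have been obtained separately, and the assumption that the response is a JSON body describing a long-running operation is not confirmed in this thread.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_import_request(project_id, dataset_id, csv_uri, token,
                         location="us-central1"):
    """Build (but do not send) the importData request from this answer.

    All arguments are caller-supplied placeholders; nothing is validated here.
    """
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/datasets/{dataset_id}:importData")
    # Same nesting as the JSON payload above: inputUris is a list.
    payload = {"inputConfig": {"gcsSource": {"inputUris": [csv_uri]}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def import_data(request):
    """Send the request and return the decoded JSON response.

    Assumption: the service answers with a JSON body (expected to describe
    a long-running import operation).
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to print the URL and body and compare them with the values above before any network call is made.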

Source

Answer 1: (score: 0)

Answer to question 1: I suggest re-checking your CSV file; see this link. Example:

gs://my-project-lcm/training-data/file1.txt,Sports,Basketball
gs://my-project-lcm/training-data/ubuntu.zip,Computers,Software,Operating_Systems,Linux,Ubuntu
file://news/documents/file2.txt,Sports,Baseball
"Miles Davis was an American jazz trumpeter, bandleader, and composer.",Arts_Entertainment,Music,Jazz
TRAIN,gs://my-project-lcm/training-data/astros.txt,Sports,Baseball
VALIDATE,gs://my-project-lcm/training-data/mariners.txt,Sports,Baseball
TEST,gs://my-project-lcm/training-data/cubs.txt,Sports,Baseball
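The rows above are text examples; for the image case in the question, a CSV of the same shape could be generated programmatically. This is a rough sketch under the assumption that each row is a Cloud Storage image URI followed by a single label; the helper name `build_import_csv` and the bucket paths in the usage note are hypothetical.

```python
import csv
import io


def build_import_csv(labeled_images):
    """Return CSV text with one "gs://...,label" row per image.

    labeled_images: iterable of (gcs_uri, label) pairs.
    Assumption: one label per image and no TRAIN/VALIDATE/TEST column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for uri, label in labeled_images:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got: {uri}")
        writer.writerow([uri, label])
    return buffer.getvalue()
```

For example, `build_import_csv([("gs://my-bucket/img/a.jpg", "car")])` returns one row; the resulting text would then be uploaded to the bucket before the import call is made.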

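Answer to question 2: I think that when you retrain on the dataset (with the new images), it creates a new model using the whole dataset (including the new images). If you look at the model list, you will see two models and one dataset.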
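If retraining does amount to creating a new model from the updated dataset, the request might look like the sketch below. This is an assumption-laden illustration: the `models` endpoint and the body fields `displayName`, `datasetId`, and `imageClassificationModelMetadata.trainBudget` are taken from my reading of the v1beta1 Model resource and are not confirmed anywhere in this thread; `build_train_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_train_request(project_id, dataset_id, display_name, token,
                        train_budget=1, location="us-central1"):
    """Build (but do not send) a request that trains a new model.

    Assumption: POSTing a Model resource to .../models starts training on
    the whole dataset; field names follow the v1beta1 REST reference.
    """
    url = ("https://automl.googleapis.com/v1beta1"
           f"/projects/{project_id}/locations/{location}/models")
    payload = {
        "displayName": display_name,
        "datasetId": dataset_id,
        # Assumed field: training budget for an image classification model.
        "imageClassificationModelMetadata": {"trainBudget": train_budget},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request object can be sent with `urllib.request.urlopen`; it is worth checking the field names against the current API reference first, since the v1beta1 surface has changed over time.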

This is the curl command to use when importing into a dataset:

curl \
  -X POST \
  -H "Authorization: Bearer here-access-token" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/{id-project}/locations/us-central1/datasets/{id-dataset}:import \
  -d '{
    "inputUris": "gs://name-bucket-vcm/csv/file-csv.csv"
  }'

Here is the Python code:

import requests

url = "https://automl.googleapis.com/v1beta1/projects/{id-project}/locations/us-central1/datasets/{id-dataset}:import"

payload = '{"inputUris": "gs://bucket-vcm/csv/file-csv.csv"}'
headers = {
    'Authorization': "Bearer here-access-token",  # same placeholder token as in the curl command
    'Content-Type': "application/json"
    }

response = requests.request("POST", url, data=payload, headers=headers)

print(response.text)