How do I download data from a public Google Cloud Storage bucket in Python?

Time: 2019-01-30 18:40:58

Tags: python web-scraping google-cloud-storage

I want to access an xml file in a public Google Cloud Storage bucket. I tried the following:

import requests

url = 'https://storage.cloud.google.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml'

response = requests.get(url)

What comes back is the HTML of the Google sign-in page, not the xml file I want. How should I access this data in Python?

1 answer:

Answer 0 (score: 0)

To download the file directly (without signing in to Google), you need to change the host in the url, i.e.:

From

https://storage.cloud.google.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml

To

https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml
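If there are many such links to convert, the host swap above can be sketched as a small helper. This is only an illustration: the name to_direct_url is hypothetical and not part of any library, and it assumes (as far as I understand it) that storage.cloud.google.com is the browser endpoint that asks for a Google sign-in, while storage.googleapis.com serves public objects anonymously.

```python
from urllib.parse import urlsplit, urlunsplit

BROWSER_HOST = "storage.cloud.google.com"  # answers with a Google sign-in page
API_HOST = "storage.googleapis.com"        # serves public objects without sign-in


def to_direct_url(url: str) -> str:
    """Swap the browser host for the API host (hypothetical helper).

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != BROWSER_HOST:
        return url
    # Only the host changes; bucket name, object path and query are kept.
    return urlunsplit(parts._replace(netloc=API_HOST))
```

Calling to_direct_url on the url from the question should give the second url shown above, which can then be passed to requests.get.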

Python example:

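import requests
# storage.googleapis.com serves public objects without a Google sign-in
u = "https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml"
r = requests.get(u, timeout=30)
r.raise_for_status()  # stop here on an HTTP error instead of saving an error page
with open('MTD_MSIL1C.xml', 'wb') as f:  # the with block closes the file reliably
    f.write(r.content)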

MTD_MSIL1C.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Level-1C_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd">
    <n1:General_Info>
        <Product_Info>
            <PRODUCT_START_TIME>2019-01-26T21:09:21.024Z</PRODUCT_START_TIME>
            <PRODUCT_STOP_TIME>2019-01-26T21:09:21.024Z</PRODUCT_STOP_TIME>
            <PRODUCT_URI>S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE</PRODUCT_URI>
            <PROCESSING_LEVEL>Level-1C</PROCESSING_LEVEL>
            <PRODUCT_TYPE>S2MSI1C</PRODUCT_TYPE>
...
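Once the file is downloaded, the fields in the excerpt above can be read with the standard library. The following is a minimal sketch, not a full Sentinel-2 metadata parser: read_product_info is a hypothetical name, and SAMPLE is a shortened copy of the excerpt with closing tags added by hand so that it is well-formed. It assumes Product_Info carries no namespace prefix, as the excerpt shows.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for MTD_MSIL1C.xml; the closing tags are added by hand.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Level-1C_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd">
    <n1:General_Info>
        <Product_Info>
            <PRODUCT_START_TIME>2019-01-26T21:09:21.024Z</PRODUCT_START_TIME>
            <PROCESSING_LEVEL>Level-1C</PROCESSING_LEVEL>
            <PRODUCT_TYPE>S2MSI1C</PRODUCT_TYPE>
        </Product_Info>
    </n1:General_Info>
</n1:Level-1C_User_Product>"""


def read_product_info(xml_bytes: bytes) -> dict:
    """Return the simple (leaf) fields under Product_Info as a dict."""
    root = ET.fromstring(xml_bytes)
    info = root.find(".//Product_Info")  # un-prefixed element, so no namespace map
    if info is None:
        return {}
    # Skip children that contain nested elements; keep only plain text fields.
    return {c.tag: (c.text or "").strip() for c in info if len(c) == 0}


fields = read_product_info(SAMPLE)
print(fields["PRODUCT_TYPE"])
```

For the real file, the same function could be given r.content from the download, or the bytes read back from MTD_MSIL1C.xml.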

Notes:

  1. Accessing Public Data (API link)
  2. I don't know why changing only the url makes this work, but it does.