How to split an XML API response between a dict and a list

Time: 2019-04-19 22:53:05

Tags: xml python-3.x web-scraping beautifulsoup

I have the following XML response:

<?xml version="1.0" encoding="utf-8"?>
<export_response xmlns:xsd="" xmlns:xsi="" xmlns="">
  <success>true</success>
  <row_count>2</row_count>
  <offers>
    <offer>
      <offer_id>336</offer_id>
      <offer_name>Home Page Flagship Product</offer_name>
      <advertiser>
        <advertiser_id xmlns="API:id_name_store">61</advertiser_id>
        <advertiser_name xmlns="API:id_name_store">bradvertiser</advertiser_name>
      </advertiser>
      <vertical>
        <vertical_id xmlns="API:id_name_store">87</vertical_id>
        <vertical_name xmlns="API:id_name_store">DTC Ecom</vertical_name>
      </vertical>
      <offer_type>
        <offer_type_id xmlns="API:id_name_store">3</offer_type_id>
        <offer_type_name xmlns="API:id_name_store">3rd Party</offer_type_name>
      </offer_type>
      <offer_status>
        <offer_status_id xmlns="API:id_name_store">1</offer_status_id>
        <offer_status_name xmlns="API:id_name_store">Public</offer_status_name>
      </offer_status>
      <hidden>false</hidden>
    </offer>
    <offer>
      <offer_id>337</offer_id>
      <offer_name>Complimentary Product</offer_name>
      <advertiser>
        <advertiser_id xmlns="API:id_name_store">61</advertiser_id>
        <advertiser_name xmlns="API:id_name_store">bradvertiser</advertiser_name>
      </advertiser>
      <vertical>
        <vertical_id xmlns="API:id_name_store">87</vertical_id>
        <vertical_name xmlns="API:id_name_store">DTC Ecom</vertical_name>
      </vertical>
      <offer_type>
        <offer_type_id xmlns="API:id_name_store">3</offer_type_id>
        <offer_type_name xmlns="API:id_name_store">3rd Party</offer_type_name>
      </offer_type>
      <offer_status>
        <offer_status_id xmlns="API:id_name_store">1</offer_status_id>
        <offer_status_name xmlns="API:id_name_store">Public</offer_status_name>
      </offer_status>
      <hidden>false</hidden>
    </offer>
  </offers>
</export_response>

Here is my code block:

import requests
import json
import csv
from bs4 import BeautifulSoup



addOfferValues = []
# csv_reader is assumed to be a csv.DictReader opened earlier; its setup was not shown
for data in csv_reader:
        url = ""
        params = {"api_key":"",
                "offer_name":"",
                "offer_id":data['Offer ID'],
                "advertiser_id":data['Advertiser ID'],
                "vertical_id":data['Vertical ID'],
                "offer_type_id":"0",
                "media_type_id":"0",
                "tag_id":"0",
                "start_at_row":"0",
                "row_limit":"0",
                "sort_field":"offer_name",
                "sort_descending":"TRUE",
                "offer_status_id":"0"}

        req = requests.get(url, params=params)
        response = BeautifulSoup(req.text, 'lxml')

        hidden = response.find('hidden').string
        hidden = 'on' if hidden == 'true' else 'off'

        addOfferParams = {"api_key":"",
                    "offer_id":"0",
                    "advertiser_id":response.find('advertiser_id').string,
                    "vertical_id":response.find('vertical_id').string,
                    "offer_name":response.find('offer_name').string,
                    "third_party_name":"",
                    "hidden":hidden,
                    "offer_status_id":response.find('offer_status_id').string,
                    "offer_type_id":response.find('offer_type_id').string}
        addOfferValues.append(addOfferParams)
        addOfferReq = requests.get('https://cs1', params=addOfferParams)

My goal is to take the first offer:

<offer>
      <offer_id>336</offer_id>
      <offer_name>Home Page Flagship Product</offer_name>
      <advertiser>
        <advertiser_id xmlns="API:id_name_store">61</advertiser_id>
        <advertiser_name xmlns="API:id_name_store">bradvertiser</advertiser_name>
      </advertiser>
      <vertical>
        <vertical_id xmlns="API:id_name_store">87</vertical_id>
        <vertical_name xmlns="API:id_name_store">DTC Ecom</vertical_name>
      </vertical>
      <offer_type>
        <offer_type_id xmlns="API:id_name_store">3</offer_type_id>
        <offer_type_name xmlns="API:id_name_store">3rd Party</offer_type_name>
      </offer_type>
      <offer_status>
        <offer_status_id xmlns="API:id_name_store">1</offer_status_id>
        <offer_status_name xmlns="API:id_name_store">Public</offer_status_name>
      </offer_status>
      <hidden>false</hidden>
    </offer>

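and store it in my addOfferParams dict. If row_count is greater than 1, I would like the second offer's data, and any further offers, stored in my addOfferValues list. I will then use those data points to make another request. Any help getting past this beginner's roadblock would be much appreciated. Thanks in advance!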
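A minimal sketch of that split, for illustration only. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, purely so the example has no third-party dependencies. The sample XML is a trimmed copy of the response above with the second offer's hidden value changed to true, and NS and offer_to_params are hypothetical names. Note that BeautifulSoup's find returns only the first match, so calling it on the whole response will only ever see the first offer.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the response above (second <hidden> altered for illustration).
SAMPLE = """<export_response>
  <success>true</success>
  <row_count>2</row_count>
  <offers>
    <offer>
      <offer_id>336</offer_id>
      <offer_name>Home Page Flagship Product</offer_name>
      <advertiser>
        <advertiser_id xmlns="API:id_name_store">61</advertiser_id>
      </advertiser>
      <hidden>false</hidden>
    </offer>
    <offer>
      <offer_id>337</offer_id>
      <offer_name>Complimentary Product</offer_name>
      <advertiser>
        <advertiser_id xmlns="API:id_name_store">61</advertiser_id>
      </advertiser>
      <hidden>true</hidden>
    </offer>
  </offers>
</export_response>"""

# The *_id elements declare their own default namespace, so ElementTree
# needs a prefix mapping to find them (the prefix name is arbitrary).
NS = {"store": "API:id_name_store"}


def offer_to_params(offer):
    """Flatten one <offer> element into a dict of request parameters."""
    return {
        "offer_id": offer.findtext("offer_id"),
        "offer_name": offer.findtext("offer_name"),
        "advertiser_id": offer.findtext("advertiser/store:advertiser_id", namespaces=NS),
        "hidden": "on" if offer.findtext("hidden") == "true" else "off",
    }


root = ET.fromstring(SAMPLE)
row_count = int(root.findtext("row_count"))
offers = root.findall("offers/offer")

# First offer goes into the dict; the remaining ones go into the list.
addOfferParams = offer_to_params(offers[0]) if offers else {}
addOfferValues = [offer_to_params(o) for o in offers[1:]] if row_count > 1 else []
```

Keeping every offer in one list (including the first) is usually simpler than a separate dict plus list, since the follow-up requests can then be made in a single loop.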

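1 answer:

Answer 0: (score: 1)

I am reading from a file here, but it shows the principle of looping over all the offer rows and adding each one to the final list. I use soup to refer to the soup object, rather than response, as I find it less confusing.

You can then loop over the final list to make the requests.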

from bs4 import BeautifulSoup

addOfferValues = []

with open(r'C:\Users\User\Desktop\test.xml', encoding="utf8") as f:
    contents = f.read()
    soup = BeautifulSoup(contents, "lxml")

    for offer in soup.select('offer'):
        # Read <hidden> per offer so each offer keeps its own flag
        hidden = 'on' if offer.select_one('hidden').text == 'true' else 'off'
        addOfferParams = {"api_key":"",
                        "offer_id": offer.select_one('offer_id').text, # I added this. Did you wish to exclude it?
                        "advertiser_id": offer.select_one('advertiser_id').text,
                        "vertical_id": offer.select_one('vertical_id').text,
                        "offer_name": offer.select_one('offer_name').text,
                        "third_party_name":"",
                        "hidden":hidden,
                        "offer_status_id": offer.select_one('offer_status_id').text,   # I added this. Did you wish to exclude it?
                        "offer_type_id":  offer.select_one('offer_type_id').text}
        addOfferValues.append(addOfferParams)

print(addOfferValues)

#loop final list making requests with params

Output:

[Image not preserved: screenshot of the printed addOfferValues list]
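The last step, looping over the final list and making one request per offer, might look roughly like the sketch below. The function name send_offers, the stand-in fake_get, and the URL are all hypothetical; the getter is passed in as an argument so the example runs offline.

```python
from urllib.parse import urlencode


def send_offers(offers, url, get):
    """Call get(url, params=...) once per offer dict and collect the results."""
    results = []
    for params in offers:
        results.append(get(url, params=params))
    return results


def fake_get(url, params):
    """Offline stand-in for an HTTP GET: returns the URL that would be requested."""
    return f"{url}?{urlencode(params)}"


ADD_OFFER_URL = "https://example.invalid/addOffer"  # hypothetical endpoint
sample_offers = [
    {"offer_id": "336", "hidden": "off"},
    {"offer_id": "337", "hidden": "off"},
]

responses = send_offers(sample_offers, ADD_OFFER_URL, fake_get)
for r in responses:
    print(r)
```

In real use you would pass requests.get as the get argument and addOfferValues as the list; each result is then a Response object whose status you can check before moving on.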