Question

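Why do I only get results from the last URL? My intention is to get a list of results from both URLs.

Also, when writing to the CSV file, I get an empty line after every row. How do I remove it?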

import csv
import requests
from lxml import html
import urllib

TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'

for item in TV_category:
    url = url_pattern.format(item)
    page = requests.get(url)
    tree = html.fromstring(page.content)

    outfile = open("./tv_test1.csv", "wb")
    writer = csv.writer(outfile)    

    rows = tree.xpath('//*[@id="category"]/ul[2]/li')


for row in rows:
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
    writer.writerow([product_ref,price])

Answer 1

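As I explained in the comments on the question, you need to put the second for loop inside the first one (at the end of its body). Otherwise, by the time the second loop runs, `rows` holds only the rows of the last URL, so only those are written to the CSV file.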
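To see the effect without any network access, here is a minimal sketch. The `fake_pages` list and its contents are made-up stand-ins for the two scraped pages, not real data.

```python
# Hypothetical stand-in for the scraped pages: (category, rows found on it).
fake_pages = [
    ("category_a", ["tv_1", "tv_2"]),
    ("category_b", ["tv_3"]),
]

# Original structure: the second loop starts only after the first has finished,
# so `rows` holds just the rows of the last category.
written_flat = []
for category, page_rows in fake_pages:
    rows = page_rows
for row in rows:
    written_flat.append(row)

# Fixed structure: the second loop is nested, so it runs once per category.
written_nested = []
for category, page_rows in fake_pages:
    rows = page_rows
    for row in rows:
        written_nested.append(row)

print(written_flat)
print(written_nested)
```

The first list contains only the rows of the last category, while the second contains the rows of both.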

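You also don't need to open the file on every iteration; open it once with a with statement, which closes it automatically. It is important to stress that opening a file with the write flag truncates it, so if the open call sits inside a loop, the file is overwritten on every iteration.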
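The overwrite behaviour can be checked in isolation. This is a small sketch that assumes nothing about the scraper; the scratch path and the two sample lines are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Opening with "w" inside the loop truncates the file on every iteration.
for line in ["first", "second"]:
    with open(path, "w") as f:
        f.write(line + "\n")
with open(path) as f:
    reopened_each_time = f.read()

# Opening once, outside the loop, keeps everything that was written.
with open(path, "w") as f:
    for line in ["first", "second"]:
        f.write(line + "\n")
with open(path) as f:
    opened_once = f.read()

print(repr(reopened_each_time))
print(repr(opened_once))
```

Only the last line survives in the first case; both lines survive in the second.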

I refactored your code as follows:

import csv
import requests
from lxml import html

TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'

# Open the output file once, before the loop, so earlier rows are not overwritten.
# "wb" is the Python 2 idiom for csv; on Python 3 use open(path, "w", newline="").
with open("./tv_test1.csv", "wb") as outfile:
    writer = csv.writer(outfile)

    for item in TV_category:
        url = url_pattern.format(item)
        page = requests.get(url)
        tree = html.fromstring(page.content)
        rows = tree.xpath('//*[@id="category"]/ul[2]/li')

        # Nested inside the first loop so every page's rows are written.
        for row in rows:
            price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
            product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
            writer.writerow([product_ref,price])
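
On the empty-line part of the question: if the script is run on Python 3 (an assumption, since the "wb" mode above suggests Python 2), the csv module expects a text-mode file opened with newline="". Without it, platforms that translate line endings, notably Windows, can end up with an empty line after every row. A minimal sketch with made-up rows and a temporary file, not the real scraped data:

```python
import csv
import os
import tempfile

# Made-up rows standing in for (product_ref, price) pairs.
sample_rows = [["TV A", "499"], ["TV B", "799"]]

# Hypothetical output path in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "tv_test1.csv")

# Python 3: text mode plus newline="" lets csv control the line endings itself.
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(sample_rows)

# Read the raw bytes back to inspect the line endings that were written.
with open(path, "rb") as infile:
    raw = infile.read()

print(raw)
```

Each row ends with a single "\r\n", the csv module's default line terminator, with no doubled carriage return that a spreadsheet would show as an empty row.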
