Looping a Python script with XPath: why do I only get results from the last URL?

Asked: 2016-04-24 20:45:47

Tags: python xpath

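Why do I only get results from the last URL? My intention is to get one list of results covering both URLs.

Also, when writing to the CSV file I get an empty row after every record. How can I get rid of these rows?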

import csv
import requests
from lxml import html
import urllib

TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'

for item in TV_category:
    url = url_pattern.format(item)
    page = requests.get(url)
    tree = html.fromstring(page.content)

    outfile = open("./tv_test1.csv", "wb")
    writer = csv.writer(outfile)    

    rows = tree.xpath('//*[@id="category"]/ul[2]/li')


for row in rows:
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
    writer.writerow([product_ref,price])

1 Answer:

Answer 0 (score: 0)

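As I explained in the comments on the question, you need to put the second for loop inside the first one (at the end of its body). Otherwise `rows` is replaced on every iteration, and only the results from the last URL get written to the CSV file.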
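The effect of the loop placement can be reproduced without any network access. The following is only a minimal sketch: the `pages` list and both function names are hypothetical stand-ins for the scraped pages, not part of the original script.

```python
# Hypothetical stand-ins for the two scraped pages: each "page" is a list of rows.
pages = [["a1", "a2"], ["b1"]]


def collect_unnested(pages):
    """Inner loop placed after the outer loop: only the last page is processed."""
    collected = []
    rows = []
    for page in pages:
        rows = page             # overwritten on every iteration
    for row in rows:            # runs once, after the outer loop has finished
        collected.append(row)
    return collected


def collect_nested(pages):
    """Inner loop placed inside the outer loop: every page is processed."""
    collected = []
    for page in pages:
        rows = page
        for row in rows:        # runs once per page
            collected.append(row)
    return collected


unnested = collect_unnested(pages)
nested = collect_nested(pages)
print(unnested)  # ['b1']
print(nested)    # ['a1', 'a2', 'b1']
```

In the unnested version, `rows` still exists after the first loop ends, but it holds only the value assigned in the final iteration.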

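You don't need to open the file on every iteration; open it once, using a with statement so that it is closed automatically. It is also worth stressing that opening a file in write mode truncates it, so opening it inside a loop wipes out whatever the previous iteration wrote.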
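The truncation behaviour is easy to check in isolation. This is a small sketch under assumed names: the temporary directory, the file names, and the `items` list are made up for illustration.

```python
import os
import tempfile

items = ["first", "second"]
tmp_dir = tempfile.mkdtemp()
reopened_path = os.path.join(tmp_dir, "reopened.txt")
opened_once_path = os.path.join(tmp_dir, "opened_once.txt")

# Reopening in write mode inside the loop truncates the file each time.
for item in items:
    with open(reopened_path, "w") as f:
        f.write(item + "\n")

# Opening once, outside the loop, keeps every line.
with open(opened_once_path, "w") as f:
    for item in items:
        f.write(item + "\n")

with open(reopened_path) as f:
    reopened_lines = f.read().splitlines()
with open(opened_once_path) as f:
    opened_once_lines = f.read().splitlines()

print(reopened_lines)     # ['second']
print(opened_once_lines)  # ['first', 'second']
```

Opening in append mode ("a") inside the loop would also keep every line, but opening the file once is simpler and avoids repeated open/close calls.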

I refactored your code as follows:

import csv
import requests
from lxml import html
import urllib

TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'

with open("./tv_test1.csv", "wb") as outfile:
    writer = csv.writer(outfile)

    for item in TV_category:
        url = url_pattern.format(item)
        page = requests.get(url)
        tree = html.fromstring(page.content)  
        rows = tree.xpath('//*[@id="category"]/ul[2]/li')

        for row in rows:
            price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
            product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
            writer.writerow([product_ref,price])
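The code above does not address the empty rows in the CSV. Their cause is not established in this thread, so the following is only a sketch of two possible fixes, assuming Python 3. First, the csv module documentation recommends opening the file in text mode with `newline=""`; without it, an extra `\r` can be written on platforms that use `\r\n` line endings, which shows up as a blank line between records. Second, any `li` element that is not a product yields two empty strings, which can be skipped. The `write_rows` helper and the sample data are hypothetical.

```python
import csv
import io


def write_rows(outfile, rows):
    """Write rows to an open text file, skipping rows whose fields are all empty."""
    writer = csv.writer(outfile)
    written = 0
    for row in rows:
        if not any(field.strip() for field in row):
            continue  # e.g. an <li> that contained no product
        writer.writerow(row)
        written += 1
    return written


# Hypothetical scraped data, including one empty row.
sample_rows = [["TV A", "499"], ["", ""], ["TV B", "799"]]

# io.StringIO(newline="") stands in for open(path, "w", newline="").
buffer = io.StringIO(newline="")
count = write_rows(buffer, sample_rows)
lines = buffer.getvalue().splitlines()
print(count)
print(lines)
```

With a real file, the same helper would be called as `with open("./tv_test1.csv", "w", newline="") as outfile: write_rows(outfile, rows)`. On Python 2 the `"wb"` mode used above plays the same role as `newline=""`.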