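Why do I only get results from the last URL? My intent is to get a list of results from both URLs.
Also, when writing to the CSV I get an empty line every time. How do I remove that line?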
import csv
import requests
from lxml import html
import urllib
TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'
for item in TV_category:
    url = url_pattern.format(item)
    page = requests.get(url)
    tree = html.fromstring(page.content)
outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)
rows = tree.xpath('//*[@id="category"]/ul[2]/li')
for row in rows:
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
    writer.writerow([product_ref,price])
Answer 0 (score: 0)
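As I explained in the comments on the question, you need to move the second for loop inside the first one (at its end). Otherwise, by the time the rows are written, tree only holds the page fetched for the last URL, so only that page's rows end up in the CSV file.
You don't need to open the file on every iteration (the with statement will close it automatically). It is also worth stressing that opening a file with the write flag overwrites it, so if the open call sits inside a loop, the file is overwritten every time it is opened.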
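To make the overwrite behaviour concrete, here is a small sketch that needs no network access. It assumes Python 3; the helper names and the sample rows are made up for illustration and are not part of the original script.

```python
import csv
import os
import tempfile


def write_reopening_each_time(path, batches):
    # Reopens the file in "w" mode for every batch: each open truncates
    # the file, so only the last batch survives.
    for batch in batches:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(batch)


def write_opening_once(path, batches):
    # Opens the file once and keeps writing to the same handle,
    # so every batch ends up in the file.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for batch in batches:
            writer.writerows(batch)


def count_lines(path):
    with open(path, newline="") as f:
        return len(f.read().splitlines())


# Two hypothetical "pages" of results, standing in for the two URLs.
batches = [[("tv-a", "499"), ("tv-b", "599")], [("tv-c", "699")]]
tmp_dir = tempfile.mkdtemp()
overwritten = os.path.join(tmp_dir, "overwritten.csv")
complete = os.path.join(tmp_dir, "complete.csv")

write_reopening_each_time(overwritten, batches)
write_opening_once(complete, batches)

print(count_lines(overwritten))  # 1: only the last batch is left
print(count_lines(complete))     # 3: all rows from both batches
```

The same reasoning applies to the scraper: open the CSV once, then loop over the URLs inside that block.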
I refactored your code as follows:
import csv
import requests
from lxml import html
TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'
# Open the output file once, before looping over the categories.
# "wb" is the Python 2 mode for csv files; on Python 3 use open(..., "w", newline="") instead.
with open("./tv_test1.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    for item in TV_category:
        url = url_pattern.format(item)
        page = requests.get(url)
        tree = html.fromstring(page.content)
        rows = tree.xpath('//*[@id="category"]/ul[2]/li')
        # Write this page's rows before moving on to the next URL.
        for row in rows:
            price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
            product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
            writer.writerow([product_ref,price])
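As for the empty line after every row: the question does not say which Python version or platform is used, so this is only a likely cause, not a confirmed one. The csv module ends each row with "\r\n" by default. If the file is opened in text mode on Windows without newline="", that becomes "\r\r\n", which many programs display as a blank line. A minimal Python 3 sketch, with made-up product names and a temporary output path:

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" passes the csv module's own "\r\n" row terminator
    # through unchanged instead of letting the platform translate it again.
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(rows)


# Hypothetical rows standing in for the scraped (product_ref, price) pairs.
rows = [["Hypothetical TV 43", "499"], ["Hypothetical TV 55", "799"]]
path = os.path.join(tempfile.mkdtemp(), "tv_test1.csv")
write_rows(path, rows)

# Read the raw bytes back to inspect the actual line endings.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

On Python 2 the equivalent is to open the file with "wb", as the code above already does. If the blank lines still appear, the cause is probably elsewhere, for example in how the file is being viewed.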