I keep running into problems getting clean CSV output.
Here is the program:
import csv
import requests
from lxml import html
page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17')
tree = html.fromstring(page.content)
outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)
rows = tree.xpath('//*[@id="category"]/ul[2]/li')
writer.writerow(["Product Name", "Price"])
for row in rows:
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()')
    product_ref = row.xpath('div/div/h2/a/text()')
    writer.writerow([product_ref,price])
outfile.close()
Current output:
['\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curved\r\n\t\t\t\t'],"['999,-']"
Required output:
TV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curve,999,-
Answer 0 (score: 0)
Found it:
import csv
import requests
from lxml import html
page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17')
tree = html.fromstring(page.content)
outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)
rows = tree.xpath('//*[@id="category"]/ul[2]/li')
writer.writerow(["Product Name", "Price"])
for row in rows:
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
    writer.writerow([product_ref,price])
outfile.close()
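For context on why this helps: an XPath expression ending in text() gives back a list of text nodes, while normalize-space(...) gives back a single string, and csv.writer turns any non-string cell into its str() form. The sketch below shows that difference with hard-coded stand-in values instead of a live page; the names price_as_list, price_as_string and to_csv_line are made up for illustration, and it assumes Python 3.

```python
import csv
import io

# Hypothetical stand-ins for the two shapes an lxml xpath call can return:
# a list of text nodes for '.../text()', a single string for 'normalize-space(...)'.
price_as_list = ["999,-"]
price_as_string = "999,-"


def to_csv_line(cells):
    """Write one row to an in-memory buffer and return the resulting CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(cells)
    return buffer.getvalue()


# The list is stringified, so its brackets and quotes end up in the cell.
list_line = to_csv_line(["TV", price_as_list])
# The plain string is written as the cell value itself.
string_line = to_csv_line(["TV", price_as_string])

print(list_line, end="")
print(string_line, end="")
```

Even in the string case the price is still wrapped in double quotes, because the value 999,- contains the delimiter and the writer has to quote it to keep it in one column.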
Answer 1 (score: 0)
You can strip \n, \r and \t from the data before writing it to the CSV file:
import csv
import requests
from lxml import html
page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17')
tree = html.fromstring(page.content)
outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)
rows = tree.xpath('//*[@id="category"]/ul[2]/li')
writer.writerow(["Product Name", "Price"])
for row in rows:
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()')
    for i in range(len(price)):
        price[i]= price[i].replace("\n","")
        price[i]= price[i].replace("\t","")
        price[i]= price[i].replace("\r","")
    product_ref = row.xpath('div/div/h2/a/text()')
    for i in range(len(product_ref)):
        product_ref[i]= product_ref[i].replace("\n","")
        product_ref[i]= product_ref[i].replace("\t","")
        product_ref[i]= product_ref[i].replace("\r","")
    if len(product_ref) and len(price):
        writer.writerow([product_ref,price])
outfile.close()
You will get:
Note that I also check the lengths of price and product_ref before writing them to the file.
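A more compact variant of the same idea, as a rough sketch rather than a drop-in replacement: join the text nodes into one string, collapse all whitespace with str.split, and skip rows where either value is empty. The helpers clean and write_products and the sample_rows data are hypothetical, the sample stands in for what row.xpath(...) would return, and the code assumes Python 3.

```python
import csv
import io


def clean(text_nodes):
    """Join xpath text nodes and collapse carriage returns, newlines, tabs and spaces."""
    return " ".join(" ".join(text_nodes).split())


def write_products(rows, out):
    """Write (name_nodes, price_nodes) pairs to the text file object `out` as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Product Name", "Price"])
    for name_nodes, price_nodes in rows:
        name, price = clean(name_nodes), clean(price_nodes)
        # Same guard as the length check above: skip list items without a name or price.
        if name and price:
            writer.writerow([name, price])


# Hypothetical sample shaped like the xpath results in the question.
sample_rows = [
    (['\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curved\r\n\t\t\t\t'],
     ["999,-"]),
    ([], []),  # an item with no matching nodes is skipped
]

buffer = io.StringIO()
write_products(sample_rows, buffer)
result = buffer.getvalue()
print(result, end="")
```

To write to a real file under Python 3, the usual form is open("./tv_test1.csv", "w", newline="") instead of "wb"; the binary mode used in the snippets above is the Python 2 convention.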