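I'm building a web scraper in Python.
I want to remove the blank rows from the generated CSV and add a header row with the titles "Car make", "Car Model", "Price". I would also like to remove the [] around all the names in the generated CSV.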
    import csv
    import re

    import requests
    from bs4 import BeautifulSoup as bs

    source = requests.get(' website link goes here...').text
    soup = bs(source, 'html.parser')

    csv_file = open('pyScraper_1.3_Export', 'w')
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['brand_Names', 'Prices'])
    csv_file.close()

    # gives us the make and model of all cars
    Names = []
    Prices_Cars = []
    for var1 in soup.find_all('h3', class_='brandModelTitle'):
        car_Names = var1.text  # var1.span.text
        test_Split = car_Names.split("\n")
        full_Names = test_Split[1:3]
        #make = test_Split[1:2]
        #model = test_Split[2:3]
        Names.append(full_Names)

    # prices
    for Prices in soup.find_all('span', class_='f20 bold fieldPrice'):
        Prices = Prices.span.text
        Prices = re.sub(r"^\s+|\s+$", "", Prices, flags=re.UNICODE)  # strip leading/trailing whitespace from the prices
        Prices_Cars.append(Prices)

    csv_file = open('pyScraper_1.3_Export.csv', 'a')
    csv_writer = csv.writer(csv_file)
    i = 0
    while i < len(Prices_Cars):
        csv_writer.writerow([Names[i], Prices_Cars[i]])
        i = i + 1
    csv_file.close()
Here is a screenshot of the generated CSV:
![][1]
[1]: https://i.stack.imgur.com/m7Xw1.jpg
Answer:
To get rid of the extra blank rows, open the file with `newline=''`:
    csv_file = open('pyScraper_1.3_Export.csv', 'a', newline='')
("If csvfile is a file object, it should be opened with `newline=''`", https://docs.python.org/3/library/csv.html#csv.writer)
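Why this matters: `csv.writer` ends every row with `\r\n` by default. Without `newline=''`, text mode on platforms that use `\r\n` line endings (Windows) translates the `\n` again, so each row ends in `\r\r\n` and spreadsheet programs show an empty row in between. A small sketch to check the bytes actually written, using hypothetical sample rows and a temporary file:

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the scraped data
rows = [["Toyota", "Corolla", "10000"], ["Honda", "Civic", "12000"]]
path = os.path.join(tempfile.mkdtemp(), "newline_demo.csv")

# newline='' turns off text-mode newline translation, so the "\r\n"
# terminator that csv.writer appends is written to disk unchanged
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Inspect the raw bytes: each row ends in b"\r\n", never b"\r\r\n"
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'Toyota,Corolla,10000\r\nHonda,Civic,12000\r\n'

# Reading back (also with newline='') yields no empty rows in between
with open(path, newline="") as f:
    read_back = list(csv.reader(f))
print(read_back)  # [['Toyota', 'Corolla', '10000'], ['Honda', 'Civic', '12000']]
```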
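To add the headers:
You are in fact already writing a header, but to a file named `pyScraper_1.3_Export` (note the missing `.csv` extension), which is a different file from the one your data is appended to. Change the first `open()` block in your code, the one that writes the header, to: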
    csv_file = open('pyScraper_1.3_Export.csv', 'w', newline='')
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Car make", "Car Model", "Price"])
    csv_file.close()
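Having two separate `open()` calls is what let the header and the data end up in different files. A minimal sketch that writes both through a single `with` block, assuming the data is already collected in lists shaped like the question's `Names` (lists of make and model) and `Prices_Cars`; the function name `export_cars` and the `HEADER` constant are hypothetical:

```python
import csv

# Hypothetical constant holding the requested column titles
HEADER = ["Car make", "Car Model", "Price"]


def export_cars(path, names, prices):
    """Write the header and all car rows in a single pass (hypothetical helper).

    names:  list of [make, model] lists, like Names in the question
    prices: list of price strings, like Prices_Cars in the question
    """
    # 'w' overwrites any previous export; newline='' avoids blank rows
    with open(path, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(HEADER)
        # zip() pairs each name with its price and stops at the shorter list,
        # unlike the while loop, which fails if Names is shorter than Prices_Cars
        for name, price in zip(names, prices):
            # *name spreads make and model into separate columns
            csv_writer.writerow([*name, price])
```

With the question's variables, `export_cars('pyScraper_1.3_Export.csv', Names, Prices_Cars)` would take the place of both `open()` sections. The `with` block also closes the file if an exception is raised partway through.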
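To get rid of the `[]`, unpack the nested list `Names[i]` with the `*` operator, so that make and model are written as separate columns instead of as one stringified list: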
    csv_writer.writerow([*Names[i], Prices_Cars[i]])
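The brackets appear because `full_Names` is a list, and `csv.writer` converts any non-string field with `str()`. A small sketch comparing the two forms on hypothetical sample values, using `io.StringIO` in place of a real file (`to_csv_line` is a hypothetical helper):

```python
import csv
import io

# Hypothetical values shaped like one entry of Names and Prices_Cars
name = ["Toyota", "Corolla"]  # a list, as produced by test_Split[1:3]
price = "10000"


def to_csv_line(row):
    """Render a single row exactly as csv.writer would write it."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


# Nested list: the inner list is written as str(name), brackets included,
# and quoted because that string contains a comma
nested = to_csv_line([name, price])
# Unpacked with *: make and model become two separate fields
flat = to_csv_line([*name, price])

print(nested.strip())  # "['Toyota', 'Corolla']",10000
print(flat.strip())    # Toyota,Corolla,10000
```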