Question

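I was able to write and run a program using BeautifulSoup. The idea is to read multiple URLs from a CSV file, scrape details from each page's HTML source, and save the output as a CSV.

The program runs fine, but the CSV keeps overwriting the values in the first row.

The input file contains three URLs to parse.

I want the output stored in 3 separate rows.

Here is my code: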

import csv
import requests
import pandas
from bs4 import BeautifulSoup

with open("input.csv", "r") as f:
    reader = csv.reader(f)

    for row in reader:
        url = row[0]

        print (url)
        r=requests.get(url)

        c=r.content

        soup=BeautifulSoup(c, "html.parser")

        all=soup.find_all("div", {"class":"biz-country-us"})

        for br in soup.find_all("br"):
            br.replace_with("\n")

l=[]
for item in all:
    d={}

    name=item.find("h1",{"class":"biz-page-title embossed-text-white shortenough"})
    d["name"]=name.text.replace("  ","").replace("\n","")

    claim=item.find("div", {"class":"u-nowrap claim-status_teaser js-claim-status-hover"})
    d["claim"]=claim.text.replace("  ","").replace("\n","")

    reviews=item.find("span", {"class":"review-count rating-qualifier"})
    d["reviews"]=reviews.text.replace("  ","").replace("\n","")



    l.append(d)

df=pandas.DataFrame(l)
df.to_csv("output.csv")

Please let me know if anything in my explanation is unclear.

Answer 1

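As suggested in this post, open the output file in append mode, with the following modification so that the header is written only the first time: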

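from os.path import isfile

# isfile() takes only the path; write the header only when the file is new.
if not isfile("output.csv"):
    df.to_csv("output.csv", header=True)
else:
    # Append without repeating the header; newline="" avoids blank lines on Windows.
    with open("output.csv", "a", newline="") as f:
        df.to_csv(f, header=False)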
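
Note that appending only helps if the write happens once per URL, inside the loop over the input rows. In the question's code the parsing loop runs after the URL loop has finished, so only the last page's results are ever processed. As a minimal sketch of the "header once, then append" idea using only the standard library, something like the following could work. The helper name append_rows and the FIELDS list are assumptions made for illustration; they do not come from the original post.

```python
import csv
import os

# Column names taken from the dictionary keys used in the question's code.
FIELDS = ["name", "claim", "reviews"]


def append_rows(path, rows, fieldnames=FIELDS):
    """Append dict rows to a CSV file (hypothetical helper).

    The header is written only when the file does not exist yet
    or is empty, so repeated calls never duplicate it.
    """
    write_header = not os.path.isfile(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, calling append_rows("output.csv", rows_for_this_url) at the end of each iteration of the URL loop would add that page's rows below the ones already saved. Since the file is opened in append mode, delete any old output.csv before a fresh run if stale rows are not wanted.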
