How to output a list of strings to a .csv file with multiple columns

Time: 2018-11-01 17:47:43

Tags: python python-3.x beautifulsoup

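I am trying to build a scraper that puts all members of the Swedish parliament into a .csv file with multiple columns.

I managed to get a list of names, as shown below. I am having trouble splitting each string into last name, first name and party, and then writing them to a .csv file with those three columns. How should I do this?

Code: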

import urllib.request
import bs4 as bs

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

names = soup.find_all("span", {"class": "fellow-name"})

for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    print(cleanednames)

Output:

Acketoft, Tina (L)
Adaktusson, Lars (KD)
Ahlberg, Ann-Christin (S)
Akhondi, Alireza (C)
Ali-Elmi, Leila (MP)
Alm Ericson, Janine (MP)
...

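3 answers:

Answer 0 (score: 0)

Here is a code snippet that uses the pandas library to write the csv. From each fellow-name span we extract the last name, first name and party, and append these three strings as a list to a list. We then convert that list of lists into a pandas DataFrame and write it to csv.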

import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
import bs4 as bs
import pandas as pd

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

list_of_mps = []

for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    # "Last, First (Party)": everything before the first comma is the last name
    split_name = cleanednames.split(',', 1)
    last_name = split_name[0]
    first_name_and_party = split_name[1].strip()
    # the party is the last space-separated token; the rest is the first name
    first_name = ' '.join(first_name_and_party.split(' ')[:-1])
    party = first_name_and_party.split(' ')[-1]
    list_of_mps.append([last_name, first_name, party])

# index=False keeps the file to the three named columns
pd.DataFrame(list_of_mps, columns=['last_name', 'first_name', 'party']).to_csv('names_parties.csv', index=False)
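Note that the party column produced above still contains the parentheses, e.g. "(L)". A minimal sketch of how they could be stripped before building the rows, using only sample lines from the question; `split_mp` is a hypothetical helper name, not part of the original answer:

```python
def split_mp(entry):
    """Split 'Last, First (PARTY)' into [last_name, first_name, party]."""
    # Everything before the first comma is the last name.
    last_name, _, rest = entry.partition(",")
    # The last space-separated token is the party; the rest is the first name.
    first_name, _, party = rest.strip().rpartition(" ")
    # Remove the surrounding parentheses from the party abbreviation.
    return [last_name.strip(), first_name, party.strip("()")]


sample = ["Acketoft, Tina (L)", "Alm Ericson, Janine (MP)"]
rows = [split_mp(entry) for entry in sample]
print(rows)
```

The resulting `rows` list has the same list-of-lists shape that the DataFrame constructor above expects.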

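Answer 1 (score: 0)

With the output shown, you can loop over it and add it to a csv file.

Start with an empty list and append the fields to it instead of printing them. See the example below.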

data = []

for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    data.append(cleanednames)  # fields are appended to the list rather than printed

Now that you have the list, you can extract last_name, first_name and party and write them to a csv file. See the example below for writing to csv.

import csv

# newline="" avoids blank lines between rows on Windows
with open("result.csv", "w", newline="", encoding="utf-8") as stream:
    fieldnames = ["Last_Name", "First_Name", "Party"]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for item in data:
        # split on the comma first: a plain item.split() fails on
        # multi-word names such as "Alm Ericson, Janine (MP)"
        last_name, rest = item.split(",", 1)
        first_name, party = rest.strip().rsplit(" ", 1)  # party is the last token
        party = party.strip("()")  # removing "()" from party
        writer.writerow({"Last_Name": last_name, "First_Name": first_name, "Party": party})  # writing to csv row
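To check the row format without fetching the page or writing a file, a DictWriter like the one above can be pointed at an in-memory buffer. This is only a sketch: the `data` list holds sample lines copied from the question, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Sample lines from the question; the real list would come from the scraper.
data = ["Adaktusson, Lars (KD)", "Alm Ericson, Janine (MP)"]

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.DictWriter(buffer, fieldnames=["Last_Name", "First_Name", "Party"])
writer.writeheader()
for item in data:
    # Last name is everything before the first comma.
    last_name, rest = item.split(",", 1)
    # Party is the last space-separated token; the rest is the first name.
    first_name, party = rest.strip().rsplit(" ", 1)
    writer.writerow({
        "Last_Name": last_name,
        "First_Name": first_name,
        "Party": party.strip("()"),
    })

csv_text = buffer.getvalue()
print(csv_text)
```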

Answer 2 (score: 0)

As mentioned in the earlier comments, pandas is overkill here. Using csv instead, we have:

import urllib.request
import bs4 as bs
import csv

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")

names = soup.find_all("span", {"class": "fellow-name"})
with open("csv-name.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    for span in soup.find_all("span", {"class": "fellow-name"}):
        cleanednames = span.text.strip()
        lname, rest = cleanednames.split(", ")
        rest = rest.split(" ")
        party = rest[-1]
        fname = " ".join(rest[:-1])
        writer.writerow([lname, fname, party])

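What happens in the code: we first split on the comma; everything before the comma is the last name. Then we split on spaces, knowing that the last item will be the party. Finally, whatever remains is the first name.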
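An alternative sketch, not taken from any of the answers, performs the same three-way split with a single regular expression. That also makes it easy to skip lines that do not have the expected "Last, First (PARTY)" shape. The names `MEMBER_RE` and `parse_member` are made up for this illustration.

```python
import re

# last name: everything up to the first comma
# first name: everything between the comma and the final "(...)"
# party: the text inside the trailing parentheses
MEMBER_RE = re.compile(r"^(?P<last>[^,]+),\s*(?P<first>.+?)\s*\((?P<party>[^)]+)\)$")


def parse_member(line):
    """Return (last, first, party), or None if the line does not match."""
    match = MEMBER_RE.match(line.strip())
    if match is None:
        return None
    return match.group("last").strip(), match.group("first"), match.group("party")


lines = ["Ali-Elmi, Leila (MP)", "Alm Ericson, Janine (MP)", "..."]
parsed = [parse_member(line) for line in lines]
print(parsed)
```

Lines that return `None` can be logged or skipped before the rows are passed to `writer.writerow`.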