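I'm trying to extract some Danish election data, and I want only the names in my output, so that I don't get output like this:
<div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Jeppe Kofod</div>
I've tried calling get_text after "navn" at the end, and using select instead of findAll.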
from bs4 import BeautifulSoup as soup # HTML data structure
from urllib.request import urlopen as uReq # Web client
from urllib.request import Request
# URL to scrape from.
page_url =Request("https://www.kmdvalg.dk/ev/2019/e1003A.htm",headers={'User-Agent': 'Mozilla/5.0'})
# opens the connection and downloads html page from url
uClient = uReq(page_url)
# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")
uClient.close()
# finds each personal-votes list container on the page
containers = page_soup.findAll("div",{"class": "kmd-personal-votes-list"})
# name the output file to write to local disk
out_filename = "kmd_valg.csv"
# header of csv file to be written
headers = "navn,personlige_stemmer,parti\n"
# opens file, and writes headers
f = open(out_filename, "w")
f.write(headers)
# finds every div whose class attribute matches the name cells
navn = page_soup.findAll("div", class_="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8")
# prints the dataset to console
print(navn)
I'd like the names to appear as a list, for example:
Jeppe Kofod
Christel Schaldemose
Niels Fuglsang
...
Answer 0 (score: 0)
You can use CSS selectors with bs4, as follows:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.kmdvalg.dk/ev/2019/e1003A.htm')
soup = bs(r.content,'lxml')
names = [item.text for item in soup.select('.table-like-cell.col-xs-7')][1:]
print(names)
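As for why get_text did not work in the question: findAll and select both return a list-like collection of tags, and get_text only exists on a single tag, so it has to be called on each element in turn. Here is a minimal sketch of that idea which runs without a network connection. The SAMPLE_HTML markup and the extract_names helper are made up for illustration, and the header cell that [1:] skips is an assumption about the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that imitates the structure shown in the question.
# The real page may be laid out differently.
SAMPLE_HTML = """
<div class="kmd-personal-votes-list">
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Navn</div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Jeppe Kofod</div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Christel Schaldemose</div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Niels Fuglsang</div>
</div>
"""


def extract_names(html):
    """Return the text of each name cell, skipping the first (assumed header) cell."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list of Tag objects, one per matching div.
    cells = soup.select(".table-like-cell.col-xs-7")
    # get_text() is called per Tag; strip=True trims surrounding whitespace.
    return [cell.get_text(strip=True) for cell in cells][1:]


names = extract_names(SAMPLE_HTML)
for name in names:
    # One name per line, as requested in the question.
    print(name)
```

To try this against the live page, pass the downloaded HTML (for example r.content from the answer above) to extract_names in place of SAMPLE_HTML.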