Question

I'm trying to extract some Danish election data, and I want only the names in my output, so that I don't get output like this:

<div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Jeppe Kofod</div>

I tried calling get_text on "navn" at the end, and using select instead of findAll.

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client
from urllib.request import Request

# URL to scrape, requested with a browser-like User-Agent header
page_url = Request("https://www.kmdvalg.dk/ev/2019/e1003A.htm", headers={'User-Agent': 'Mozilla/5.0'})

# opens the connection and downloads html page from url
uClient = uReq(page_url)

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")
uClient.close()

# finds each personal-votes list on the page
containers = page_soup.findAll("div",{"class": "kmd-personal-votes-list"})

# name the output file to write to local disk
out_filename = "kmd_valg.csv"
# header of csv file to be written
headers = "navn,personlige_stemmer,parti\n"

# opens file, and writes headers
f = open(out_filename, "w")
f.write(headers)
# finds the cells that hold the candidate names

navn = page_soup.findAll("div", class_="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8")

# prints the result to the console
print(navn)

I'd like the names to appear as a list, for example:

Jeppe Kofod
Christel Schaldemose
Niels Fuglsang 
...

Answer 1

You can use CSS selectors with bs4, as follows:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.kmdvalg.dk/ev/2019/e1003A.htm')
soup = bs(r.content,'lxml')
names = [item.text for item in soup.select('.table-like-cell.col-xs-7')][1:]
print(names)
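
To get the names one per line, as in the desired output, here is a minimal offline sketch of the same selector approach. The SAMPLE_HTML markup and the "Navn" header cell are assumptions made up for illustration (the real page may be structured differently), and extract_names is a hypothetical helper name. It uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the page markup; the real page may differ.
SAMPLE_HTML = """
<div class="kmd-personal-votes-list">
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Navn</div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Jeppe Kofod</div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8"> Christel Schaldemose </div>
  <div class="table-like-cell col-xs-7 col-sm-6 col-md-6 col-lg-8">Niels Fuglsang</div>
</div>
"""


def extract_names(html):
    """Return the text of each name cell, skipping the first (header) cell."""
    soup = BeautifulSoup(html, "html.parser")
    # Match elements that carry both classes, regardless of any other classes.
    cells = soup.select(".table-like-cell.col-xs-7")
    # get_text(strip=True) returns only the text, with surrounding whitespace removed.
    # [1:] drops the first match, assumed here to be the column header.
    return [cell.get_text(strip=True) for cell in cells][1:]


names = extract_names(SAMPLE_HTML)
# Print one name per line instead of the list representation.
print("\n".join(names))
```

The key point is that findAll and select both return tag objects, so the text has to be pulled out of each element individually; calling get_text on the result list itself will not work.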
