I have the following code, which gives me this output (each link is separated by a ",").
ids = soup.find_all("tr",{"class":"table-row"})
#ids = soup.findAll("span",{"class": "text-nowrap"})
ids
An example of one of the links looks like this:
<tr _ngcontent-sc371="" class="table-row">
<td _ngcontent-sc371="" class="table-col">
<a _ngcontent-sc371="" data-gtm="address" class="text-primary font-weight-bolder text-left" href="/salg/info/326/25402/A5694110-6CCD-431B-B86D-D9B87CD062F6">
Svalevaanget 20,
<br _ngcontent-sc371="" class="d-block d-sm-none">
4400 Kalundborg
</a>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Koobesom </span>
<span _ngcontent-sc371="" class="text-nowrap">995.000</span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Salgsdato </span>
<span _ngcontent-sc371="" class="text-nowrap">22-09-2020</span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Boligtype </span>
<span _ngcontent-sc371="" class="d-flex justify-content-end">
<app-property-label _ngcontent-sc371="" _nghost-sc241="">
<label _ngcontent-sc241="" class="property-4 hide-text">
<app-tooltip _ngcontent-sc241="" class="md-right flex-shrink-0" _nghost-sc191="">
<p _ngcontent-sc191="" class="app-tooltip">
Fritidshus <!---->
<!---->
<!---->
</p>
<!---->
<span _ngcontent-sc241="" class="icon">F</span>
</app-tooltip>
<span _ngcontent-sc241="" class="text">Fritidshus</span>
</label>
</app-property-label>
</span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Kr. / m² </span>
<span _ngcontent-sc371="" class="text-nowrap"> 12.134 </span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Værelser </span>
<span _ngcontent-sc371=""> 3 </span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> m² </span>
<span _ngcontent-sc371="" class="text-nowrap"> 82 m² </span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Byggear </span>
<span _ngcontent-sc371=""> 1971 </span>
</td>
<td _ngcontent-sc371="" class="table-col text-right">
<span _ngcontent-sc371="" class="mobile-heading"> Prisjustering </span>
<!---->
</td>
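I'm not sure how to proceed from here to get the values out. My goal is to store them in a DataFrame with column names and the corresponding values (one row per link), for example Salgsdato (22-09-2020), Address (Svalevaanget 20), Koobesom (995.000), m² (82).
Can anyone offer some useful guidance?
Appreciate it.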
Answer 0 (score: 0)
To create a DataFrame from the document and save it as CSV, you can use the following example (txt is the HTML snippet from the question):
import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

# remove mobile-headings
for mh in soup.select('.mobile-heading'):
    mh.extract()

all_values = []
for tr in soup.select('tr.table-row'):
    tds = [td.get_text(strip=True, separator=' ') for td in tr.select('td')]
    all_values.append(tds)

df = pd.DataFrame(all_values, columns=['Address', 'Koobesom', 'Salgsdato', 'Boligtype', 'Kr. / m²', 'Værelser', 'm²', 'Byggear', 'Prisjustering'])
df.to_csv('data.csv')
print(df)
Prints:
Address Koobesom ... Byggear Prisjustering
0 Svalevaanget 20, 4400 Kalundborg 995.000 ... 1971
[1 rows x 9 columns]
It also creates data.csv.
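The extracted cell values are still strings such as "995.000" and "82 m²", while the question asks for plain values like 82 and a street address without the postal code. As a rough sketch of one way to clean them, the helpers below use only the standard library. The function names (parse_danish_int, parse_sale_date, street_only) are hypothetical, and the code assumes that "." in these numbers is a thousands separator and that dates are always dd-mm-yyyy, which matches the sample row but is not confirmed for the whole site.

```python
import re
from datetime import datetime


def parse_danish_int(text):
    """Turn '995.000' or ' 82 m² ' into an int; return None if no digits are found."""
    match = re.search(r"\d[\d.]*", text)
    if match is None:
        return None
    # Assumption: '.' is a thousands separator in these values, so drop it.
    return int(match.group().replace(".", ""))


def parse_sale_date(text):
    """Parse a 'dd-mm-yyyy' string such as '22-09-2020' into a date object."""
    return datetime.strptime(text.strip(), "%d-%m-%Y").date()


def street_only(address):
    """Keep the part before the first comma, e.g. 'Svalevaanget 20'."""
    return address.split(",", 1)[0].strip()


# Sample values copied from the HTML snippet in the question.
row = {
    "Address": "Svalevaanget 20, 4400 Kalundborg",
    "Koobesom": "995.000",
    "Salgsdato": "22-09-2020",
    "m²": "82 m²",
}

cleaned = {
    "Address": street_only(row["Address"]),
    "Koobesom": parse_danish_int(row["Koobesom"]),
    "Salgsdato": parse_sale_date(row["Salgsdato"]),
    "m²": parse_danish_int(row["m²"]),
}
print(cleaned)
```

With a DataFrame like the one above, the same helpers could probably be applied per column, for example df['Koobesom'].map(parse_danish_int), and the four wanted columns selected with df[['Salgsdato', 'Address', 'Koobesom', 'm²']].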