Getting values out of Beautiful Soup

Time: 2020-10-21 14:44:16

Tags: python html beautifulsoup

I have the following code, which gives me this output (each link is separated by ",").

ids = soup.find_all("tr",{"class":"table-row"})
#ids = soup.findAll("span",{"class": "text-nowrap"})
ids

Here is an example of one of the links:

                            <tr _ngcontent-sc371="" class="table-row">
                                <td _ngcontent-sc371="" class="table-col">
                                    <a _ngcontent-sc371="" data-gtm="address" class="text-primary font-weight-bolder text-left" href="/salg/info/326/25402/A5694110-6CCD-431B-B86D-D9B87CD062F6">
                                         Svalevaanget 20,
                                        <br _ngcontent-sc371="" class="d-block d-sm-none">
                                         4400 Kalundborg 
                                    </a>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Koobesom </span>
                                    <span _ngcontent-sc371="" class="text-nowrap">995.000</span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Salgsdato </span>
                                    <span _ngcontent-sc371="" class="text-nowrap">22-09-2020</span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Boligtype </span>
                                    <span _ngcontent-sc371="" class="d-flex justify-content-end">
                                        <app-property-label _ngcontent-sc371="" _nghost-sc241="">
                                            <label _ngcontent-sc241="" class="property-4 hide-text">
                                                <app-tooltip _ngcontent-sc241="" class="md-right flex-shrink-0" _nghost-sc191="">
                                                    <p _ngcontent-sc191="" class="app-tooltip">
                                                         Fritidshus <!---->
                                                        <!---->
                                                        <!---->
                                                    </p>
                                                    <!---->
                                                    <span _ngcontent-sc241="" class="icon">F</span>
                                                </app-tooltip>
                                                <span _ngcontent-sc241="" class="text">Fritidshus</span>
                                            </label>
                                        </app-property-label>
                                    </span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Kr. / m² </span>
                                    <span _ngcontent-sc371="" class="text-nowrap"> 12.134 </span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Værelser </span>
                                    <span _ngcontent-sc371=""> 3 </span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> m² </span>
                                    <span _ngcontent-sc371="" class="text-nowrap"> 82 m² </span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Byggear </span>
                                    <span _ngcontent-sc371=""> 1971 </span>
                                </td>
                                <td _ngcontent-sc371="" class="table-col text-right">
                                    <span _ngcontent-sc371="" class="mobile-heading"> Prisjustering </span>
                                    <!---->
                                </td>

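I'm not sure how to proceed from here to extract the values. My goal is to store them in a DataFrame, with column names and the corresponding values for each link, for example Salgsdato (22-09-2020), address (Svalevaanget 20), Koobesom (995.000), m² (82).

Can anyone offer some useful guidance?

Much appreciated.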

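1 answer:

Answer 0: (score: 0)

To create a DataFrame from the document and save it as CSV, you can use the following example (txt is the HTML snippet from the question):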

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

# remove mobile-headings
for mh in soup.select('.mobile-heading'):
    mh.extract()

all_values = []
for tr in soup.select('tr.table-row'):
    tds = [td.get_text(strip=True, separator=' ') for td in tr.select('td')]
    all_values.append(tds)

df = pd.DataFrame(all_values, columns=['Address', 'Koobesom', 'Salgsdato', 'Boligtype', 'Kr. / m²', 'Værelser', 'm²', 'Byggear', 'Prisjustering'])
df.to_csv('data.csv')
print(df)

Prints:

                            Address Koobesom  ... Byggear Prisjustering
0  Svalevaanget 20, 4400 Kalundborg  995.000  ...    1971              

[1 rows x 9 columns]
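
Every value in the frame above is still a string, while the question asks for things like m² (82) and the street address on its own. A minimal sketch of one way to clean a few of the columns follows. It assumes the values use Danish formatting (dots as thousands separators, day-month-year dates); the one-row frame is a stand-in built from the question's sample, and the `Street` column name is an illustrative choice, not something from the original answer.

```python
import pandas as pd

# One-row stand-in for the scraped frame; every value arrives as a string.
df = pd.DataFrame(
    [["Svalevaanget 20, 4400 Kalundborg", "995.000", "22-09-2020", "82 m²"]],
    columns=["Address", "Koobesom", "Salgsdato", "m²"],
)

# Keep only the street part of the address (text before the first comma).
# "Street" is a hypothetical column name chosen for this sketch.
df["Street"] = df["Address"].str.split(",").str[0].str.strip()

# Assumed Danish thousands separator: drop the dots, then convert to int.
df["Koobesom"] = df["Koobesom"].str.replace(".", "", regex=False).astype(int)

# Dates in the sample are day-month-year.
df["Salgsdato"] = pd.to_datetime(df["Salgsdato"], format="%d-%m-%Y")

# Strip the unit and keep the number.
df["m²"] = df["m²"].str.replace("m²", "", regex=False).str.strip().astype(int)

print(df.dtypes)
```

If some rows have empty or malformed cells, `astype(int)` will raise; `pd.to_numeric(..., errors="coerce")` may be a safer choice in that case.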

It also creates data.csv.
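
The column names in the answer are typed by hand. As a possible variant, the names could be read from the `mobile-heading` spans instead, so the columns follow whatever the page provides. This is only a sketch under assumptions: the HTML below is a shortened stand-in for the question's row, and the fallback name `Address` for the one cell without a heading is chosen here for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened stand-in for the row shown in the question.
html = """
<table>
<tr class="table-row">
  <td class="table-col"><a> Svalevaanget 20, <br> 4400 Kalundborg </a></td>
  <td class="table-col"><span class="mobile-heading"> Koobesom </span><span class="text-nowrap">995.000</span></td>
  <td class="table-col"><span class="mobile-heading"> Salgsdato </span><span class="text-nowrap">22-09-2020</span></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("tr.table-row"):
    row = {}
    for td in tr.select("td"):
        heading = td.select_one(".mobile-heading")
        if heading is not None:
            # Use the heading text as the column name, then remove the
            # heading so it does not end up in the cell value.
            key = heading.get_text(strip=True)
            heading.extract()
        else:
            # The address cell has no heading; this name is an assumption.
            key = "Address"
        row[key] = td.get_text(strip=True, separator=" ")
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

Building each row as a dict lets pandas align the columns by name, so a row that is missing a cell ends up with NaN rather than shifted values.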
