Question

I am trying to scrape pages on https://www.formularylookup.com/ about drugs and market assets for certain companies.

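The code below gives me the data I need, such as the number of plans, the drugs covered by the pharmacy, and the status percentage. Here is a sample of my output; the desired output is just "1330 plans":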

Number of plans:

<td class="plan-count" role="gridcell">1330 plans</td>

I tried using .text after each tag.find, but it does not work.

Here is my code for this particular part. There is a lot more above it, but that includes login information I cannot share.

from bs4 import BeautifulSoup
import pandas as pd

total = []

soup = BeautifulSoup(html, "lxml")

for tag in soup.find_all("tbody", {"role":"rowgroup"}):
    #name = tag.find("td", {"class":"payer-name"}) #gives me whole tag
    name = tag.find("tr", {"role":"row"}).find("td").get("payer-name") #gives me None output
    plan = tag.find("td", {"class":"plan-count"})  #gives me whole tag
    stat = tag.find("td", {"class":"icon-status"}) #gives me whole tag

    data = {"Payer": name, "Number of plans": plan, "Status": stat}

    total.append(data)

df = pd.DataFrame(total)
print(df)

Here is the relevant snippet as shown by the browser's Inspect tool.

<tbody role="rowgroup">
    <tr data-uid="a5795205-1518-4a74-b039-abcd1b35b409" role="row">
        <td class="payer-name" role="gridcell">CVS Caremark RX</td>
        <td class="plan-count" role="gridcell">1330 plans</td>
        <td role="gridcell" class="icon-status icon-status-not-covered">98% Not Covered</td>
     </tr>

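Edit: After digging further through SO, I saw that a solution might be to use BS4's contents attribute. I will report back if it works. Update: this does not work: "AttributeError: 'NoneType' object has no attribute 'contents'"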

Answer 1

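I figured it out. Apparently there are other tags that also start with tbody rowgroup, and for those the lookup came back as None, so calling .text failed on them before my code reached the part I wanted.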
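To illustrate the cause, here is a minimal sketch. The HTML is a made-up sample based on the snippet in the question; the empty first rowgroup is an assumption standing in for the extra tbody tags, and "html.parser" is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: an empty rowgroup followed by the data rowgroup.
html = """
<table>
  <tbody role="rowgroup"></tbody>
  <tbody role="rowgroup">
    <tr role="row">
      <td class="payer-name" role="gridcell">CVS Caremark RX</td>
      <td class="plan-count" role="gridcell">1330 plans</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
groups = soup.find_all("tbody", {"role": "rowgroup"})

# find() returns None when nothing matches, so calling .text or .contents
# on that result raises AttributeError for the first rowgroup.
matches = [g.find("td", {"class": "plan-count"}) for g in groups]
print(matches[0])       # no plan-count cell in the empty rowgroup
print(matches[1].text)  # text of the real cell
```

The first print shows None and the second shows 1330 plans, which is why .text appeared to fail: the loop hit the empty rowgroup first.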

I only needed to change this line:

for tag in soup.find_all("tbody", {"role":"rowgroup"}):
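Independently of how that loop is changed, a defensive variant can skip rowgroups that lack the expected cells and return plain text instead of whole tags. The following is only a sketch under assumptions: SAMPLE_HTML, cell_text and extract_rows are hypothetical names, the empty rowgroup is invented to mimic the extra tbody tags, and iterating over the rows inside each rowgroup is this sketch's choice rather than necessarily the change made above. Note also that .get("payer-name") in the question returns None because Tag.get looks up an HTML attribute, not a class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample built from the snippet in the question; the empty
# first rowgroup is an assumption standing in for the extra tbody tags.
SAMPLE_HTML = """
<table>
  <tbody role="rowgroup"></tbody>
  <tbody role="rowgroup">
    <tr role="row">
      <td class="payer-name" role="gridcell">CVS Caremark RX</td>
      <td class="plan-count" role="gridcell">1330 plans</td>
      <td role="gridcell" class="icon-status icon-status-not-covered">98% Not Covered</td>
    </tr>
  </tbody>
</table>
"""


def cell_text(row, css_class):
    """Return the stripped text of the first matching <td>, or None."""
    cell = row.find("td", class_=css_class)
    return cell.get_text(strip=True) if cell is not None else None


def extract_rows(html):
    """Collect one dict of plain strings per payer row."""
    soup = BeautifulSoup(html, "html.parser")
    total = []
    for tbody in soup.find_all("tbody", {"role": "rowgroup"}):
        # A rowgroup without rows contributes nothing, so it is skipped.
        for row in tbody.find_all("tr", {"role": "row"}):
            name = cell_text(row, "payer-name")
            if name is None:
                continue  # not a payer row
            total.append({
                "Payer": name,
                "Number of plans": cell_text(row, "plan-count"),
                "Status": cell_text(row, "icon-status"),
            })
    return total


print(extract_rows(SAMPLE_HTML))
```

The returned list of dicts can be passed to pd.DataFrame exactly as in the question's code.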

Using BS4: how do I get only the text, not the tags?
