How can I use Python to extract the words that start with "icon" from HTML code?

Time: 2019-05-08 18:58:22

Tags: python html nltk

I need Python code that extracts the selected words from the HTML below.

<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-ikj"></span>
<span class="mobilesv icon-dc"></span>
<span class="mobilesv icon-acb"></span>
<span class="mobilesv icon-lk"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-nm"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-yz"></span>
</a>

I need to extract the words that start with "icon".

The output I need is:

icon-hg,icon-rq,icon-ba,icon-rq,icon-ba,icon-ikj,icon-dc,icon-acb,icon-lk,icon-ba,icon-nm,icon-ba,icon-yz

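2 answers:

Answer 0: (score: 0)

For your specific case you can get the result as follows. For the general problem, though, I recommend Beautiful Soup; remember that special cases aren't special enough to break the rules.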


text = """
<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-ikj"></span>
<span class="mobilesv icon-dc"></span>
<span class="mobilesv icon-acb"></span>
<span class="mobilesv icon-lk"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-nm"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-yz"></span>
</a>
"""

result = [word.split('"')[0] for word in text.split() if word.startswith('icon')]

print(result)
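
The answer recommends a real HTML parser for the general case. As a minimal sketch of that idea, the example below uses the standard library's `html.parser` rather than Beautiful Soup, so it runs without extra packages. The class name `IconClassCollector` and its `prefix` parameter are hypothetical names chosen for illustration, and the HTML is a shortened copy of the question's markup.

```python
from html.parser import HTMLParser


class IconClassCollector(HTMLParser):
    """Collect class tokens that start with a prefix (hypothetical helper)."""

    def __init__(self, prefix="icon"):
        super().__init__()
        self.prefix = prefix
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute holds space-separated tokens
                self.found.extend(
                    token for token in value.split()
                    if token.startswith(self.prefix)
                )


html = """
<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
</a>
"""

collector = IconClassCollector()
collector.feed(html)
icon_classes = collector.found
print(",".join(icon_classes))  # icon-hg,icon-rq,icon-ba
```

Unlike splitting the raw text on whitespace, this only looks at `class` attribute values, so a word such as "icon" in ordinary page text would not be picked up.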

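Answer 1: (score: 0)

If you are using BeautifulSoup, the code below searches each span for the string that starts at "icon-" and runs up to the closing quote (").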

from bs4 import BeautifulSoup
import re
s = """<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-ikj"></span>
<span class="mobilesv icon-dc"></span>
<span class="mobilesv icon-acb"></span>
<span class="mobilesv icon-lk"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-nm"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-yz"></span>
</a>"""
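soup = BeautifulSoup(s, "html.parser")
for span in soup.find_all("span"):
    # Match "icon-" and everything after it up to the closing quote
    match = re.search(r'icon-[^"]*', str(span))
    if match:  # skip spans that have no icon- class
        print(match.group())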

Result:

icon-hg
icon-rq
icon-ba
icon-rq
icon-ba
icon-ikj
icon-dc
icon-acb
icon-lk
icon-ba
icon-nm
icon-ba
icon-yz
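
The result above is printed one name per line, while the question asks for a single comma-separated line. One possible way to get that format, sketched here with only the `re` module and a shortened copy of the markup, is to collect all matches with `re.findall` and join them. The pattern is an assumption about what an icon class name looks like (letters, digits, underscores and hyphens after `icon-`).

```python
import re

s = """<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-yz"></span>
</a>"""

# \b requires "icon-" to start at a word boundary,
# so a token such as "lexicon-x" is not matched
matches = re.findall(r"\bicon-[\w-]+", s)
joined = ",".join(matches)
print(joined)  # icon-hg,icon-rq,icon-yz
```

Note that this scans the whole string, so it would also match "icon-..." text that appears outside a `class` attribute; on the markup in the question that makes no difference.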