Searching for a set of negative keywords

Date: 2014-01-18 18:38:08

Tags: python parsing python-2.7 html-parsing beautifulsoup

This is a follow-up to a question I asked this morning - Using Beatiful Soup to get data from non-class section

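I am trying to grab a block of information and search it against a set of keywords. If any keyword is found in the block, I want to bail out. However, my code fails to find a keyword even when it is present in the block.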

negative_keywords = ['basement', 'unfinished', 'hardwood']  #defined at beginning of script


bodyContents = soup.find(attrs={'id' : 'postingbody'})
for validate in negative_keywords:
    if (string.find(str(bodyContents.string).lower(),validate) != -1):
        keyword_found = TRUE
        continue

Here is the sample data:

<section id="postingbody">


        3BR/2BA newly renovated ranch

<p>
    <b></b>
</p>
<hr></hr>
<h2>

    Some address

</h2>
<h2>

    $950.00 / Month

</h2>
<h3 style="color:maroon;">

    - Description:

</h3>
<blockquote>

    3BR/2BA newly renovated ranch. Near all that Towne…

</blockquote>
<h3 style="color:maroon;">

    - Details:

</h3>
<ul>
    <li></li>
    <li></li>
    <li>
        <b></b>

         No

    </li>
    <li></li>
    <li></li>
    <li></li>
    <li></li>

1 answer:

Answer 0 (score: 1)

This is how I would do it:

import BeautifulSoup

negative_keywords = ['basement', 'unfinished', 'hardwood']

html = '''
<section id="postingbody">
    Looking for a corporate rental, this beautiful decorated 5 BR,
    4.5 BA two story house is in a desirable location, 7 minutes off
    I 85. Beautiful solid cherry cabinets in kitchen and laundry room.
    All stainless steel appliances. Hardwood floors in kitchen and foyer,
    Ceramic tile floors in all bathrooms, laundry room, dining room and sunroom.

    <br>
    </br>

</section>
'''

soup = BeautifulSoup.BeautifulSoup(html)
bodyContents = soup.find(attrs={'id' : 'postingbody'})

if any([k in bodyContents.getText().lower() for k in negative_keywords]):
    print "keyword was found"
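
A likely reason the original check never matches: in Beautiful Soup, `.string` is `None` when a tag has more than one child, so `str(bodyContents.string).lower()` becomes the literal text `'none'`. `getText()` instead concatenates all the text nested inside the tag. Once the text is extracted, the keyword check can be pulled into a small helper. The following is only a sketch: the name `find_negative_keywords` and the sample string are made up for illustration, and it matches whole words with a regular expression rather than the plain substring test used above.

```python
import re


def find_negative_keywords(text, keywords):
    """Return the keywords that occur in text as whole words (case-insensitive).

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    lowered = text.lower()
    return [
        k for k in keywords
        # \b anchors the match to word boundaries; re.escape guards against
        # keywords that contain regex metacharacters.
        if re.search(r'\b' + re.escape(k.lower()) + r'\b', lowered)
    ]


negative_keywords = ['basement', 'unfinished', 'hardwood']

# Stand-in for the text returned by bodyContents.getText()
sample_text = "All stainless steel appliances. Hardwood floors in kitchen and foyer."

found = find_negative_keywords(sample_text, negative_keywords)
if found:
    print("keyword was found: " + ", ".join(found))
```

Whole-word matching means "hardwoods" would not count as a hit for "hardwood"; dropping the two `\b` anchors gives the same substring behavior as the `in` test above. Returning the list of matches instead of a boolean also makes it easy to log which keyword caused a listing to be skipped.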