Question

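I'm very new to programming and am currently trying to learn Python. My goal is to use web scraping, or more specifically BeautifulSoup, to get the syllables of a word from dictionary.com, for use as part of a larger program. Here is what I have so far: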

import requests
from bs4 import BeautifulSoup

def count_syllables(keyword):
    url = 'http://dictionary.com/browse/{}'.format(keyword)
    web_object = requests.get(url)
    # web_object.text is already a decoded string, so it can be parsed directly
    soup = BeautifulSoup(web_object.text, 'html.parser')
    # Find the header divs, then take the first span inside each one
    divs = soup.find_all('div', {'class': "waypoint-wrapper header-row header-first-row"})
    span = [div.find(name='span') for div in divs]

    return span

#output: [<span class="me" data-syllable="syl·la·ble">syllable</span>]

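This returns only the span tag from the HTML source, not the syllables themselves. For example, I would like to extract "syl·la·ble" when the word "syllable" is entered into the search bar on dictionary.com. However, my code returns the whole span tag. When I tried other approaches from YouTube videos, I kept getting empty lists. So my question is: how can I grab the syl·la·ble part of the span tag?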

Answer 1

You can index into the returned span list and read the data-syllable attribute:

>>> span[0]['data-syllable']
'syl·la·ble'
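
To go from the attribute value to an actual syllable count, something like the following could work. This is only a sketch: it parses a hard-coded snippet modeled on the span shown in the question's output instead of fetching the live page, and the names SAMPLE_HTML and syllables_from_html are made up for illustration. It also assumes the syllables are always separated by the "·" character, and the real markup on dictionary.com may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the span from the question's output.
# The structure of the live page is an assumption and may not match this.
SAMPLE_HTML = (
    '<div class="waypoint-wrapper header-row header-first-row">'
    '<span class="me" data-syllable="syl·la·ble">syllable</span>'
    '</div>'
)


def syllables_from_html(html):
    """Return the syllables from the first span that has a data-syllable attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    # attrs={'data-syllable': True} matches any span that carries the attribute,
    # whatever its value, so the search does not depend on the div's class names.
    span = soup.find('span', attrs={'data-syllable': True})
    if span is None:
        return []  # no matching span found
    # Assumes the attribute separates syllables with the "·" character.
    return span['data-syllable'].split('·')


syllables = syllables_from_html(SAMPLE_HTML)
print(syllables)       # the individual syllables
print(len(syllables))  # the syllable count
```

Searching by attribute rather than by the full class string may also help with the empty lists mentioned in the question, since a class string with several names only matches when it is exactly the same as the one in the page.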

Finding the number of syllables with BeautifulSoup?
