Question

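I am trying to scrape information from Oxford Dictionaries. The problem is that all of the "form-groups" spans share the same class name.

I only want to scrape the "form-groups" span that sits above entry 1. For the word "acclimatize", my code works.

But for the word "peculiar", it scrapes the "form-groups" span under entry 2, which is not what I want. I only want the "form-groups" span above entry 1.

Basically, it should work like this:

If no "form-groups" span exists above entry 1, print("none"), but do not scrape the "form-groups" spans that belong to other entries.

Here is my code: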

from bs4 import BeautifulSoup
import urllib.error
import urllib.request
import time


words = ["peculiar"]
source = "https://en.oxforddictionaries.com/definition/"
for word in words:
    try:
        with urllib.request.urlopen(source + word) as url:
            s = url.read()
        soup = BeautifulSoup(s, "lxml")
        try:
            # find() returns the first match anywhere on the page
            form_groups = soup.find('span', {'class': 'form-groups'}).text
            y = form_groups
        except AttributeError:
            # find() returned None: there is no such span on the page
            y = "no form_groups"

        print(word + "#" + y)
        time.sleep(2)
    except urllib.error.URLError:
        print("No result for " + word)
        time.sleep(2)

I hope I have made myself clear, since I am not very familiar with all the terminology. Any input is much appreciated. Thank you very much!

Answer 1

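The answer is embedded in your question. You are scanning the whole page for spans of class form-groups, but what you are actually interested in is the hierarchy of the dictionary article: you only want spans of that class when they are direct children of a section of class gramb, not when they sit further down the tree.

Edit: the original answer was pasted from the wrong IDLE session.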

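section_grambs = soup.find_all('section', {'class': 'gramb'})
for section_gramb in section_grambs:
    # Start from the default and overwrite it only when a direct child matches
    y = "no form groups"
    for child in section_gramb.children:
        # Text nodes have name None; .get() avoids a KeyError on spans without a class
        if child.name == "span" and "form-groups" in child.get("class", []):
            y = child.text
            break
    print(y)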
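
To make the direct-child idea easy to try without hitting the site, here is a minimal sketch. The markup in SAMPLE_HTML is hypothetical and much simpler than the real page, and the helper name top_level_form_groups is made up for illustration. It relies on find() with recursive=False, which only looks at a tag's immediate children, and it uses the built-in "html.parser" so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page):
# the first section has a direct-child span plus a nested one,
# the second section only has a nested one.
SAMPLE_HTML = """
<section class="gramb">
  <span class="form-groups">top-level forms</span>
  <ul class="semb"><li><span class="form-groups">nested forms</span></li></ul>
</section>
<section class="gramb">
  <ul class="semb"><li><span class="form-groups">only nested forms</span></li></ul>
</section>
"""


def top_level_form_groups(html):
    """Return one string per section.gramb: the text of its direct-child
    form-groups span, or "none" when the section has no such child."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for section in soup.find_all("section", class_="gramb"):
        # recursive=False restricts the search to immediate children,
        # so spans nested deeper inside the section are ignored.
        span = section.find("span", class_="form-groups", recursive=False)
        results.append(span.get_text(strip=True) if span else "none")
    return results


print(top_level_form_groups(SAMPLE_HTML))
```

If you only care about the first entry, take the first element of the returned list. A CSS child combinator, as in soup.select("section.gramb > span.form-groups"), should express the same restriction, although it will not give you a "none" placeholder for sections without a match.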

Python bs4: if entry A has "form-groups", scrape it; for entries other than A, do not scrape "form-groups"
