Question

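I am trying to scrape certain words from an arbitrary website, but when I try to print the result, the program below shows no error and no output either.

I have checked the code twice and even added an if statement to see whether the program picks up any words at all.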


    import requests
    import operator
    from bs4 import BeautifulSoup


    def word_count(url):

        wordlist = []

        # Download the page and parse the HTML
        source_code = requests.get(url)
        source = BeautifulSoup(source_code.text, features="html.parser")

        # Look for <a class="txt"> elements and collect their words
        for post_text in source.findAll('a', {'class': 'txt'}):
            word_string = post_text.string

            if word_string is not None:
                word = word_string.lower().split()

                for each_word in word:
                    print(each_word)
                    wordlist.append(each_word)

            else:
                print("None")

        return wordlist

    word_count('https://mumbai.craigslist.org/')

I expect all the words under class="txt" to appear in the output.
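A note on the if/else check the question mentions. In Python, an `else:` indented at the same level as a `for` statement belongs to the loop, not to an earlier `if`. That makes it a `for ... else` clause, which runs whenever the loop finishes without `break`. The minimal pure-Python sketch below shows the difference. The function names and sample inputs are hypothetical.

```python
def else_on_loop(words):
    """`else` attached to the for loop: runs after the loop ends without break."""
    out = []
    for w in words:
        out.append(w)
    else:
        out.append("None")  # runs even when words were found
    return out


def else_on_if(word_string):
    """`else` attached to the if: runs only when there is no text."""
    out = []
    if word_string is not None:
        for w in word_string.lower().split():
            out.append(w)
    else:
        out.append("None")
    return out


print(else_on_loop(["a", "b"]))  # ['a', 'b', 'None']
print(else_on_if("Local News"))  # ['local', 'news']
print(else_on_if(None))          # ['None']
```

In either layout, if the outer `findAll` loop gets zero matches, its body never runs. Neither `print` fires in that case, which fits the "no error, no output" symptom.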

Answer 1

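OP: I expect all the words under class txt to be displayed in the output.

The culprit:

    for post_text in source.findAll('a', {'class':'txt'}):

Reason:

The anchor tags do not have the class txt, but the span tags inside them do.

Hence: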

    import requests
    from bs4 import BeautifulSoup

    def word_count(url):
        source_code = requests.get(url)
        source = BeautifulSoup(source_code.text, features="html.parser")

        # Visit every anchor and look for a <span class="txt"> inside it
        for post_text in source.findAll('a'):
            s_text = post_text.find('span', class_="txt")
            if s_text is not None:
                print(s_text.text)

    word_count('https://mumbai.craigslist.org/')

Output:

    community
    activities
    artists
    childcare
    classes
    events
    general
    groups
    local news
    lost+found
    missed connections
    musicians
    pets
    ...
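You can check this diagnosis without hitting the live site by parsing a small hand-written snippet. The markup below is a hypothetical simplification of the structure the answer describes (an anchor wrapping `<span class="txt">`). It is not Craigslist's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the answer's description:
# the class "txt" sits on the <span>, not on the <a>.
html = """
<ul>
  <li><a href="/d/activities"><span class="txt">activities</span></a></li>
  <li><a href="/d/lost-found"><span class="txt">lost+found</span></a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's selector: no <a> carries class="txt", so this is empty.
anchors = soup.find_all("a", class_="txt")

# Selecting the <span> elements directly finds the labels.
spans = soup.find_all("span", class_="txt")
labels = [span.get_text(strip=True) for span in spans]

# The same selection expressed as a CSS selector.
css_labels = [el.get_text(strip=True) for el in soup.select("a > span.txt")]

print(len(anchors))  # 0
print(labels)        # ['activities', 'lost+found']
```

An empty result from `find_all` is not an error, so the loop over it silently does nothing.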

Answer 2

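You are targeting the wrong element.

If you use

    print(source)

everything looks fine. But when you try to locate the elements with findAll, you get back an empty list, which means the selector is wrong.

If you replace

    for post_text in source.findAll('a', {'class':'txt'}):

with

    for post_text in source.find_all('a'):

everything seems to work.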
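This sketch suggests why iterating over every `<a>` can work with the question's unchanged `.string` lookup. In BeautifulSoup, when a tag's only child is another tag, `.string` is taken from that child. An anchor wrapping a single `<span class="txt">` therefore yields the span's text. When a tag has more than one child, `.string` is `None`, so the `if word_string is not None` check would skip that anchor. Also note that `find_all('a')` returns every link on the page, not only the category labels. The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one anchor with a single <span> child,
# one anchor with two children (a <span> plus extra text).
html = (
    '<a href="/a"><span class="txt">community</span></a>'
    '<a href="/b"><span class="txt">events</span> (new)</a>'
)
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

print(first.string)       # community  (taken from the only child)
print(second.string)      # None       (more than one child)
print(second.get_text())  # events (new)
```

`get_text()` concatenates all descendant strings, so it is the safer choice when a tag's children may vary.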

Answer 3

I went to https://mumbai.craigslist.org/ and found that there is no <a class="txt">, only <span class="txt">. So I think you can try the following:

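    import requests
    from bs4 import BeautifulSoup

    def word_count(url):
        wordlist = []
        source_code = requests.get(url)
        source = BeautifulSoup(source_code.text, features="html.parser")
        # Select the <span class="txt"> elements directly
        for post_text in source.findAll('span', {'class': 'txt'}):
            word_string = post_text.text
            # .text always returns a str, so this check is kept only from the question
            if word_string is not None:
                word = word_string.lower().split()
                for each_word in word:
                    print(each_word)
                    wordlist.append(each_word)
            else:
                print("None")
        return wordlist

    word_count('https://mumbai.craigslist.org/')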

It prints the expected output:

    community
    activities
    artists
    childcare
    classes
    events
    general
    ...
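The function is named `word_count` and the question imports `operator`, so the collected `wordlist` was presumably meant to be tallied. Below is a stdlib-only sketch of that last step. It assumes a list of lowercase words like the one the code above builds. The sample words and both helper names are made up for illustration.

```python
import operator
from collections import Counter


def count_words(wordlist):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter(wordlist)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def count_words_dict(wordlist):
    """Same tally with a plain dict and operator.itemgetter."""
    counts = {}
    for word in wordlist:
        counts[word] = counts.get(word, 0) + 1
    # Sort by count only; sorting is stable, so ties keep first-seen order.
    return sorted(counts.items(), key=operator.itemgetter(1), reverse=True)


words = ["local", "news", "events", "local", "groups", "events", "local"]
print(count_words(words))
# [('local', 3), ('events', 2), ('groups', 1), ('news', 1)]
print(count_words_dict(words))
# [('local', 3), ('events', 2), ('news', 1), ('groups', 1)]
```

`Counter.most_common()` is a shorter alternative when tie order does not matter.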

Hope this helps. If you have any other questions, please leave a comment. :)

Python: Can't get any output using BeautifulSoup
