Question

I'm trying to build a simple word/sentence finder.

I tried this:

import urllib
from urllib import request

url = "https://fotka.com/profil/k"
word = "Nie ma profilu"


def search_website(url, word):
    page = urllib.request.urlopen(url)
    phrase_present = False

    for i in page:
        if bytes(word, encoding='utf8') in i:
            phrase_present = True
            print(i)

    return phrase_present

finder = search_website(url, word)
print(finder)

It seems to work, but there is a problem with how the url is handled. If you open this in a browser:

url = "https://fotka.com/profil/k"

the word really is on the page, so the function returns True, but if you open:

url = "https://fotka.com/profil/kkkk"

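there is no such word on the page, yet the function still returns True.

I checked the contents of the variable page, and it is the same in both cases even though the url is different...

Does anyone know why this happens, or have an idea for a workaround?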

Answer 1

Your question is very broad, but I think you are looking for the data between the paragraph tags <p>:

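import re
from urllib import request

url = "some page"
word = "some word"

# Decode the bytes to text (UTF-8 assumed); str() on bytes would give "b'...'" in Python 3
page_data = request.urlopen(url).read().decode("utf-8", errors="replace")
# re.DOTALL lets a paragraph span several lines
paragraph_data = re.findall("<p>(.*?)</p>", page_data, re.DOTALL)
final_paragraph_data = [i for i in paragraph_data if word in i]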

final_paragraph_data now holds a list of all the paragraphs that contain word.
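To try the paragraph filter without touching the network, here is a minimal sketch that runs on an HTML string. The function name paragraphs_containing and the sample markup are made up for illustration, and a regex is only a rough stand-in for a real HTML parser.

```python
import re


def paragraphs_containing(html, word):
    """Return the contents of every <p>...</p> block in html that contains word."""
    # re.DOTALL lets a paragraph span several lines; re.IGNORECASE also matches <P>.
    # [^>]* tolerates attributes such as <p class="note">.
    pattern = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
    return [p for p in pattern.findall(html) if word in p]


# Hypothetical sample page, not the real site's markup.
sample_html = """
<html><body>
<p>Nie ma profilu</p>
<p class="note">Something
else</p>
</body></html>
"""

print(paragraphs_containing(sample_html, "Nie ma profilu"))
```

For the asker's function, a truthiness check on the returned list (for example bool(paragraphs_containing(html, word))) would give the same True/False answer while ignoring matches outside paragraphs.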

Answer 2

If your question is "How do I check whether some visible text is on the page?", then this may be your solution:

[{"Values":123,"_row":"Type1"}, 
 {"Values": 4565,"_row":"Type2"}, 
 {"Values": 7812,"_row":"Type3"}]
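As a rough sketch of checking only the visible text, assuming the page's HTML has already been downloaded and decoded into a string: the standard library's html.parser can collect the text that sits outside tags such as <script> and <style>. The names VisibleTextParser and phrase_is_visible are hypothetical, and the sketch cannot detect text hidden by CSS or text that JavaScript adds after the page loads.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside tags browsers normally do not render."""

    HIDDEN_TAGS = {"script", "style", "head", "title", "noscript"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while we are inside a hidden tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)


def phrase_is_visible(html, phrase):
    """Return True if phrase occurs in the text outside the hidden tags."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so line breaks in the markup do not matter.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase in text


# Hypothetical page: the phrase appears only inside a script, not in the body.
sample = (
    '<html><head><script>var msg = "Nie ma profilu";</script></head>'
    "<body><p>Profil   istnieje</p></body></html>"
)
print(phrase_is_visible(sample, "Nie ma profilu"))
print(phrase_is_visible(sample, "Profil istnieje"))
```

A search over the raw bytes, as in the question, would report the phrase as present for this sample because it matches the string inside the script; the parser-based check does not.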

How do I find a sentence on a website?

2 answers: