How to find variations of a word in source code (Python 3)

Date: 2017-11-06 23:32:49

Tags: python python-3.6

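I have to write a program that takes a website and a keyword as user input, then searches the website's source code for that word. It needs to detect variations of the word (e.g., hello vs. hello, vs. hello!), and I don't know how to do that. So far I have the code below, which detects the exact input, but I don't know how to catch the variations. Any help is greatly appreciated, thanks!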

import requests

def main():
    [n,l]=user()
    print("Okay", n, "from", l, ", let's get started.")

    webname=input("What is the name of the website you wish to browse? ")
    website=requests.get(input("Please enter the URL: "))
    txt = website.text

    # Avoid naming a variable "list": it shadows the built-in type.
    parts=txt.split(",")
    print(type(txt))
    print(type(parts))
    print(parts[0:10])

    while True:
        numkey=input("Would you like to enter a keyword? Please enter yes or no: ")

        if numkey=="yes":
            key=input("Please enter the keyword to find: ")
            # Search inside the loop; a break placed before this call made it unreachable.
            find(webname,txt,key)

        else:
            newurl()
            break

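def find(web,txt,key):
    # Split on any whitespace and count tokens equal to the keyword.
    # This is an exact match only, so "hello," or "hello!" is not counted.
    words=txt.split()

    count = 0
    for item in words:
        if item==key:
            count=count+1
    print("The word", key, "appears", count, "times on", web)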

def newurl():
    while True:
        new=input("Would you like to browse another website? Please enter yes or no: ")

        if new=="yes":
            main()

        else:
            [w,r]=experience()
            return new
        break

def user():
    name=input("Hello, what is your name? ")
    loc=input("Where are you from? ")
    return [name,loc]

def experience():

    wordeval=input("Please enter 3 words to describe the experience, separated by spaces (ex. fun cool interesting): ")
    words=wordeval.split(sep=" ")

    # int() instead of eval(): eval would execute whatever the user types.
    rate=int(input("Please rate your experience from 1-10: "))

    if rate < 6:
        print("We're sorry you had a negative", words[0], "and", words[2], "experience!")

    else:
        print("Okay, thanks for participating. We're glad your experience was", words[1], "!")

    return [wordeval,rate]

main()

1 answer:

Answer 0 (score: 1)

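What you're looking for is the re module. It can give you the indices of matches, the individual match instances, and so on. There are some good tutorials on how to use the module, but it is easy to loop over the HTML source line by line and look for matches, or to find the indices in the string itself (whether you split it on newlines or leave it as one long text string).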
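As a rough sketch of the line-by-line approach with re, something like the following might work. The function name find_variants and the sample text are made up for illustration, and matching case-insensitively on word boundaries is an assumption about what should count as a "variation"; it is not something the question specifies.

```python
import re

def find_variants(text, keyword):
    """Return (line_number, column, matched_text) for each whole-word match."""
    # re.escape keeps regex metacharacters in the user's keyword literal.
    # \b anchors the match at word boundaries, so "hello," and "hello!"
    # are found but the "hello" inside "Othello" is not.
    # Assumption: case is ignored (re.IGNORECASE).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.start(), match.group()))
    return hits

# Hypothetical page source standing in for website.text
sample = "<p>Hello, world.</p>\n<p>hello! Say HELLO to Othello.</p>"
hits = find_variants(sample, "hello")
for line_no, col, word in hits:
    print(line_no, col, word)
print("count:", len(hits))
```

In the question's program, find() could call a function like this with txt and key and print len(hits) instead of counting exact matches. Note that \b only behaves as expected when the keyword starts and ends with a letter, digit, or underscore.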