Searching webtext for each word in a list

Time: 2017-03-22 20:20:18

Tags: python list text-mining

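I have to write code that collects the text from a website chosen by the user and searches that text for three selected words. It must then output each word along with the number of times it appears on the website.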

My attempt at writing this program leaves me with an output that tells me that 0 of the words listed are present on the webtext even when I know they do appear. Does anyone have an idea as to how to make it work?
import requests

def main():

    Asentence="This,is,a,sentence,of,some,kind!"
    print(type(Asentence))
    print(Asentence)
    ListOfWords=Asentence.split(",")
    print(type(ListOfWords))
    print(ListOfWords)
    print(ListOfWords[0])
    print(ListOfWords[-1])
    print(ListOfWords[3])

    SomeOtherList=["Sally", "Fred"]
    print(type(SomeOtherList))
    print(SomeOtherList)
    print(SomeOtherList[0])

    for thing in SomeOtherList:
        print(thing)

    n= eval(input("How many websites would you like to enter? :"))
    while n > 0:
        Word()
        n=n-1  



#------------------------------------------   
def Word():   
    answer=input("please enter the websites to examine in the http format ")

    response=requests.get(answer)
    txt = response.text
    print(txt)
    mywords=Firstpart(list)
    num=FindAWord(txt,mywords)
    print("There are", num, "words called",mywords)

#----------------------------------------    
def FindAWord(TheWebText,word):

    print(TheWebText)
    print(type(TheWebText))
    MyList=TheWebText.split(sep=" ")
    print(MyList[0:100])
    count=0
    for item in MyList:
        if(item==word in Firstpart(list)):
            print(item)
            count=count+1

    return count

#----------------------------------

def Firstpart(list):
 wordchoice=[]
 firstword=input("Please enter the first word you would like to look for")
 wordchoice.append(firstword)  
 secondword=input("Please enter the second word you would like to look for")
 wordchoice.append(secondword) 
 thirdword=input("Please enter the third word you would like to look for")
 wordchoice.append(thirdword)
 return wordchoice


main()

Thank you so much in advance. 

2 answers:

Answer 0 (score: 2)

You can use Counter from the collections module to help you.

import requests
from collections import Counter
def main():
    url = input('Please enter the url to the website you want to search: ')
    # Prepend a scheme only when the URL does not already start with one
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url

    words = []
    for i in range(1,4):
        words.append(input('Please enter word number {}: '.format(i)))

    resp = requests.get(url)
    counter = Counter(resp.text.split())
    for word in words:
        print(word, 'found', counter[word], 'times')


if __name__ == '__main__':
    main()
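One caveat with counting `resp.text.split()` directly: the split is on whitespace only and the comparison is case-sensitive, so `python,` and `Python` are counted as different tokens from `python`. The following is a minimal sketch of one way to normalize the text first. The helper name `count_words` and the sample sentence are assumptions made for illustration, not part of the answer above, and the sketch still treats any HTML markup in the page as ordinary text.

```python
import re
from collections import Counter


def count_words(text, words):
    """Count case-insensitive occurrences of each word in text.

    Hypothetical helper: tokens are runs of letters, digits and
    apostrophes, so surrounding punctuation is ignored.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counter = Counter(tokens)
    # Counter returns 0 for missing keys, so absent words are safe to look up.
    return {word: counter[word.lower()] for word in words}


sample = "Python is fun. I like python, and PYTHON likes me!"
counts = count_words(sample, ["python", "like", "missing"])
print(counts)
```

In the answer's `main()`, the same idea would apply by passing `resp.text` and the list `words` to such a helper instead of building the `Counter` from `resp.text.split()`.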

Answer 1 (score: 0)

Joakim gave an answer that helps make your code easier to read and understand, but I will explain why it did not work in the first place.

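In the Word() function, the variable mywords is the list of words entered by the user. When you pass it to the FindAWord function, you are giving it a list rather than a single word. Then, when you compare if(item == word) (the in Firstpart(list) really should not be on that line), you are checking whether a single word is equal to a list, which is never true.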

You can fix that part by doing the following:

def Word():   
    answer=input("please enter the websites to examine in the http format ")

    response=requests.get(answer)
    txt = response.text
    print(txt)
    mywords=Firstpart(list)
    for word in mywords:
        num=FindAWord(txt,word)
        print("There are", num, "words called",word)

def FindAWord(TheWebText,word):
    print(TheWebText)
    print(type(TheWebText))
    MyList=TheWebText.split(sep=" ")
    print(MyList[0:100])
    count=0
    for item in MyList:
        if(item==word):
            print(item)
            count=count+1
    return count
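The difference between passing the whole list and passing one word at a time can be seen in isolation, without any network request. This is a small sketch only; `count_matches` and the sample sentence are hypothetical stand-ins for FindAWord and the downloaded page text.

```python
def count_matches(tokens, target):
    """Count how many tokens compare equal to target using ==."""
    return sum(1 for token in tokens if token == target)


tokens = "the cat saw the dog".split(" ")

# Passing the whole list: a str never equals a list, so nothing matches.
list_result = count_matches(tokens, ["the", "dog"])

# Passing one word at a time gives a separate count for each word.
word_results = {word: count_matches(tokens, word) for word in ["the", "dog"]}

print(list_result)
print(word_results)
```

The first call mirrors the original code, where the count is always 0; the second mirrors the loop over mywords in the corrected Word() function.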

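You should focus on making your variable names more descriptive, so that you (and others) can read and understand the code more easily. As you can see, in FindAWord you named the parameter word, in the singular, which gives the impression that it is a single word. Instead, it was a list of words. If it had been called users_words or something similar, you would have seen immediately that if(item == users_words) was a problem.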