I am writing code that opens a link, collects the words around a substring j into Res, and then collects all the nouns in Res, as follows:
import re
import requests
import nltk

j = "Green Index"  # phrase to look for
# up to two words on either side of the phrase
sub = r'(\w*)\W*(\w*)\W*(%s)\W*(\w*)\W*(\w*)' % j
allnouns = []
Results = []
link = "http://greenindex.timberland.com/"  # page to search for the phrase
f = requests.get(link)
str1 = f.text
for i in re.findall(sub, str1, re.I):  # each match is a tuple of captured groups
    print(" ".join([x for x in i if x != ""]))
    Res = " ".join([x for x in i if x != ""])  # build one snippet Res per match
    Results.append(Res)  # put all snippets Res in one list Results
    sentences = nltk.sent_tokenize(Res)  # here is where I hit an error
    nouns = []
    for sentence in sentences:
        for word, pos in nltk.pos_tag(nltk.word_tokenize(str(sentence))):
            if pos in ('NN', 'NNP', 'NNS', 'NNPS'):
                nouns.append(word)
    allnouns.append(nouns)
I get an error just before the second loop:
TypeError: Can't convert 'list' object to str implicitly
I checked that type(Res) is class str. I also tried splitting Res, thinking it might help: sentences = nltk.sent_tokenize(Res.split), but I get the same error. How can I get around it?
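
To rule out the regex step, here is a minimal offline sketch of the snippet-building part, run on a hard-coded string instead of the downloaded page. The sample sentence and the helper name context_snippets are made up for illustration, and re.escape is an addition that is not in my code above. It only shows that each joined snippet is a plain str; it does not touch NLTK.

```python
import re

def context_snippets(phrase, text):
    # Same shape as the pattern in the question: two optional words,
    # the phrase, then two more optional words. re.escape is an
    # addition here (assumption) so the phrase is matched literally.
    pattern = r'(\w*)\W*(\w*)\W*(%s)\W*(\w*)\W*(\w*)' % re.escape(phrase)
    snippets = []
    for groups in re.findall(pattern, text, re.I):
        # With several capture groups, findall yields one tuple of
        # strings per match; empty groups are dropped before joining.
        snippets.append(" ".join(g for g in groups if g != ""))
    return snippets

# Hypothetical sample text standing in for the page contents.
sample = "Our new Green Index rating measures impact."
snippets = context_snippets("Green Index", sample)
print(snippets)
print([type(s).__name__ for s in snippets])
```

If every snippet prints as str here, the regex and join steps are probably not where the list in the error message comes from.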