Python error "TypeError: coercing to Unicode: need string or buffer, list found"

Date: 2013-12-21 05:52:38

Tags: python python-2.7 unicode

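The goal of this code is a program that searches for a person's name (specifically on Wikipedia) and uses keywords to pull out the reasons that person is significant. I'm having trouble with this particular line, "if fact_amount < 5 and (terms in sentence.lower()):", because it raises this error: "TypeError: coercing to Unicode: need string or buffer, list found". Any guidance would be appreciated, thanks.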

    import requests
    import nltk
    import re

    # You will need to install requests and nltk.
    # nltk.clean_html is only available in NLTK 2.x (NLTK 3 removed it).
    terms = ['pronounced',
             'was a significant',
             'major/considerable influence',
             'one of the (X) most important',
             'major figure',
             'earliest',
             'known as',
             'father of',
             'best known for',
             'was a major']
    # List of people that you need to get info from
    names = ["Nelson Mandela", "Bill Gates", "Steve Jobs", "Lebron James"]
    for name in names:
        print name
        print '==============='
        # Goes to the Wikipedia page of the person
        r = requests.get('http://en.wikipedia.org/wiki/%s' % name)
        # Parses the raw HTML into text
        raw = nltk.clean_html(r.text)
        # Tries to split each sentence.
        # Sort of buggy though: for example, "St. Mary" will split after "St."
        sentences = re.split(r'[?!.]\s*', raw)
        fact_amount = 0
        for sentence in sentences:
            # I noticed that important things came after 'he was' and 'she was'
            # Seems to work for my sample list
            # Also there may be buggy sentences, so I return 5 instead of 3
            # The next line raises the TypeError
            if fact_amount < 5 and (terms in sentence.lower()):
                # Remove the reference notation that Wikipedia has, e.g. [ 33 ]
                sentence = re.sub(r'\[\s*[0-9]+\s*\]', '', sentence)
                # Remove newlines
                sentence = re.sub('\n', '', sentence)
                # Remove leading and trailing whitespace
                sentence = sentence.strip()
                fact_amount += 1
                # Sentence is formatted; print it out
                print sentence + '.'
        print
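A related pitfall when building a list like terms: Python joins adjacent string literals that have no comma between them into one string, so a list written without commas holds a single long element and a substring check will not match the individual phrases. The short sketch below shows the difference; the variable names and the helper mentions_any are illustrative only.

```python
# Adjacent string literals with no comma between them are concatenated
# by the parser, so this "list" contains a single string.
without_commas = ['pronounced'
                  'known as'
                  'father of']

# With commas, each phrase is its own list element.
with_commas = ['pronounced',
               'known as',
               'father of']


def mentions_any(sentence, terms):
    """Return True if any phrase in terms occurs as a substring of sentence."""
    sentence_lower = sentence.lower()
    return any(t in sentence_lower for t in terms)


print(len(without_commas))  # 1
print(len(with_commas))     # 3
print(mentions_any('He was known as Madiba', without_commas))  # False
print(mentions_any('He was known as Madiba', with_commas))     # True
```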

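2 Answers:

Answer 0 (score: 2)

You should do the check the other way around:

    sentence.lower() in terms

terms is a list and sentence.lower() is a string. You can check whether a particular string is in a list, but you cannot check whether a list is in a string.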
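A small sketch of what the two forms of in actually test (the function names are illustrative, not from the answer). With a list on the right, in is a membership test that compares whole elements for equality, so sentence.lower() in terms is true only when the entire sentence equals one of the phrases. With a list on the left and a string on the right, Python raises TypeError; the exact message differs between Python 2 unicode strings and Python 3.

```python
terms = ['known as', 'father of', 'best known for']


def sentence_is_a_term(sentence):
    # List membership: compares the whole lowercased sentence to each element
    return sentence.lower() in terms


def list_in_string_fails(sentence):
    # Reproduces the question's line: a list on the left of `in`,
    # a string on the right, which raises TypeError
    try:
        terms in sentence.lower()
    except TypeError:
        return True
    return False


print(sentence_is_a_term('Known as'))                  # True
print(sentence_is_a_term('He was known as Madiba'))    # False
print(list_in_string_fails('He was known as Madiba'))  # True
```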

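Answer 1 (score: 2)

You probably want

    if any(t in sentence_lower for t in terms):

(with sentence_lower = sentence.lower()) to check whether any of the words in the terms list occur in the sentence string.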
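Plugged into the question's loop, this check might look like the sketch below. It assumes the page text has already been split into a list of sentences; find_facts and its limit parameter are hypothetical names, and the reference-stripping pattern assumes Wikipedia markers look like [ 33 ] or [33].

```python
import re

TERMS = ['pronounced', 'was a significant', 'major/considerable influence',
         'one of the (X) most important', 'major figure', 'earliest',
         'known as', 'father of', 'best known for', 'was a major']

# Matches reference markers such as "[ 33 ]" or "[33]"
REFERENCE = re.compile(r'\[\s*[0-9]+\s*\]')


def find_facts(sentences, terms=TERMS, limit=5):
    """Return up to `limit` cleaned sentences that contain any of `terms`."""
    facts = []
    for sentence in sentences:
        if len(facts) >= limit:
            break
        sentence_lower = sentence.lower()
        # True if at least one phrase occurs as a substring of the sentence
        if any(t in sentence_lower for t in terms):
            cleaned = REFERENCE.sub('', sentence)
            cleaned = cleaned.replace('\n', '').strip()
            facts.append(cleaned + '.')
    return facts


sample = ['He was known as Madiba [ 12 ]',
          'He liked tea',
          'Father of the nation']
print(find_facts(sample))  # ['He was known as Madiba.', 'Father of the nation.']
```

Counting the collected facts with len(facts) replaces the separate fact_amount counter, and breaking out early avoids scanning the rest of the page once the limit is reached.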