Python: converting output into a sentence

Time: 2014-11-17 21:06:43

Tags: python python-2.7 nlp nltk

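I just started learning Python. I am trying to clean up a sentence by breaking it into words and then joining them back into a sentence. The file big.txt contains words such as youth, caretaker, etc. The problem is in the last procedure, looper, which prints each word on its own line.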

correct is another procedure, defined before this code, that corrects each word.

Here is the code:

import re
import nltk

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))

def looper(a, count):
    words = nltk.word_tokenize(a)
    for i in range(len(words)):
        X = correct(words[i])
        print (X)

final = looper(zebra, count)

The output it produces:

youth
caretaker
walking
car
in
something

How can I take all these separate outputs and turn them into one sentence?

Expected result:

youth caretaker walking car in something

If you need any other details, please let me know.

Thanks in advance.

3 answers:

Answer 0 (score: 1)

Use a list comprehension:

print " ".join([ correct(words[i]) for i in range(len(words)) ])

It should look like this:

import re
import nltk

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))
words = nltk.word_tokenize(zebra)
def looper(a, count):
    print " ".join([ correct(words[i]) for i in range(len(words)) ])

looper(zebra, count)

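words should be computed outside the function; there is no need to tokenize the sentence again every time the function runs.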
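Because looper only prints and has no return statement, anything assigned from it (like final in the question) is None. A minimal sketch of a variant that takes the sentence as its argument and returns the joined string, so the caller can store or print it. It is written for Python 3; fix_sentence, CORRECTIONS and this correct are hypothetical stand-ins (a plain lookup table), since the real correct is not shown:

```python
import re

# Hypothetical stand-in for the asker's correct(): a fixed lookup table.
CORRECTIONS = {"Yout": "youth", "caretak": "caretaker"}

def correct(word):
    # Return the known correction, or the word unchanged.
    return CORRECTIONS.get(word, word)

def fix_sentence(text):
    """Tokenize text, correct each word, and return one space-separated string."""
    words = re.findall(r"\w+", text)  # same pattern the question uses for count
    return " ".join(correct(w) for w in words)

final = fix_sentence('Yout caretak taking care of something')
print(final)
```

Returning the string instead of printing it keeps the function reusable: the caller decides whether to print, write to a file, or process the sentence further.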

You can also use this:

print " ".join([ correct(i) for i in words ])

Here is the proper way to do it:

import nltk

zebra = 'Yout caretak taking care of something'
words = nltk.word_tokenize(zebra)
print " ".join([ correct(i) for i in words ])

You don't need a function here, because words is already a list of words that you can iterate over and join.
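A self-contained sketch of the same no-function approach for Python 3. The lookup-table correct is a hypothetical stand-in for the asker's corrector, and str.split replaces nltk.word_tokenize so the snippet needs only the standard library:

```python
# Hypothetical stand-in for the asker's correct(): a fixed lookup table.
CORRECTIONS = {"Yout": "youth", "caretak": "caretaker"}

def correct(word):
    return CORRECTIONS.get(word, word)

zebra = 'Yout caretak taking care of something'
words = zebra.split()  # simple whitespace split instead of nltk.word_tokenize

# Build the corrected words in a list, then join them with single spaces.
sentence = " ".join([correct(w) for w in words])
print(sentence)
```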

Or, keeping closer to your original code:

import nltk

zebra = 'Yout caretak taking care of something'
words = nltk.word_tokenize(zebra)
for x in words:
    print correct(x),
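The trailing comma in print correct(x), is Python 2 syntax that makes print end with a space instead of a newline. In Python 3, print is a function and the same effect comes from its end parameter. A minimal sketch; the print_inline name and its file argument are illustrative only, there to make the function easy to test:

```python
import sys

def print_inline(words, file=sys.stdout):
    # end=" " keeps every word on the same line, separated by spaces.
    for w in words:
        print(w, end=" ", file=file)
    # Finish the line once all words are printed.
    print(file=file)

print_inline(['youth', 'caretaker', 'taking', 'care', 'of', 'something'])
```

This leaves a trailing space before the newline; " ".join(...) avoids that.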

Demo:

>>> zebra = 'Yout caretak taking care of something'
>>> words = nltk.word_tokenize(zebra)
>>> words
['Yout', 'caretak', 'taking', 'care', 'of', 'something']

As you can see, nltk.word_tokenize gives you a list of words, so you can easily iterate over them.
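If NLTK is not installed, re.findall with the same \w+ pattern the question uses for count also returns a list of words. It is only an approximation of nltk.word_tokenize: punctuation is dropped instead of being returned as separate tokens, so the two differ on real text:

```python
import re

zebra = 'Yout caretak taking care of something'
# \w+ matches runs of letters, digits and underscores; everything else separates words.
words = re.findall(r'\w+', zebra)
print(words)

# Punctuation is discarded, unlike nltk.word_tokenize, which keeps it as tokens.
print(re.findall(r'\w+', 'Hello, world.'))
```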

Answer 1 (score: 1)

>>> import nltk
>>> zebra = 'Yout caretak taking care of something'
>>> for word in nltk.word_tokenize(zebra):
...     print word
... 
Yout
caretak
taking
care
of
something

Then run $ sudo pip install pyenchant (see https://pythonhosted.org/pyenchant/api/enchant.html) and:

>>> import nltk
>>> import enchant
>>> zebra = 'Yout caretak taking care of something'
>>> dictionary = enchant.Dict('en_US')
>>> for word in nltk.word_tokenize(zebra):
...     dictionary.suggest(word)
... 
['Out', 'Yost', 'Rout', 'Tout', 'Lout', 'Gout', 'Pout', 'Bout', 'Y out', 'Your', 'You', 'Youth', 'Yous', 'You t']
['caretaker', 'caret', 'Clareta', 'cabaret', 'curettage', 'critical']
['raking', 'takings', 'tasking', 'staking', 'tanking', 'talking', 'tacking', 'taring', 'toking', 'laking', 'caking', 'taming', 'making', 'taping', 'baking']
['CARE', 'acre', 'acer', 'race', 'Care', 'car', 'are', 'cares', 'scare', 'carer', 'caret', 'carte', 'cared', 'cadre', 'carve']
['if', 'pf', 'o', 'f', 'oaf', 'oft', 'off', 'sf', 'on', 'or', 'cf', 'om', 'op', 'oh', 'hf']
['somethings', 'some thing', 'some-thing', 'something', 'locksmithing', 'smoothness']
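pyenchant is a binding to the Enchant C library, which must be installed along with a dictionary. As a swapped-in alternative (not what this answer uses), the standard library's difflib.get_close_matches ranks candidates from a word list you supply by similarity. The small VOCABULARY below is a hypothetical placeholder for a real word list such as the question's big.txt:

```python
import difflib

# Hypothetical vocabulary; in practice, load it from a word list.
VOCABULARY = ["youth", "caretaker", "taking", "care", "of", "something", "car"]

def suggest(word, n=3, cutoff=0.6):
    # Up to n vocabulary words whose similarity ratio to the lowercased word
    # is at least cutoff, best match first.
    return difflib.get_close_matches(word.lower(), VOCABULARY, n=n, cutoff=cutoff)

for word in 'Yout caretak taking care of something'.split():
    print(word, suggest(word))
```

Unlike enchant, difflib knows nothing about spelling rules; it only compares character sequences, so the quality depends entirely on the word list.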

Then try:

>>> for word in nltk.word_tokenize(zebra):
...     print [i for i in dictionary.suggest(word) if word in i]
... 
['Youth']
['caretaker']
['takings', 'staking']
['cares', 'scare', 'carer', 'caret', 'cared']
['oft', 'off']
['somethings', 'something']
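The filter keeps only the suggestions that contain the original word as a substring. A sketch of that step on its own, using two of the suggestion lists printed above as sample data so it runs without pyenchant (keep_containing is a hypothetical helper name). The test is case-sensitive, which is why 'CARE' and 'Care' are dropped:

```python
def keep_containing(word, suggestions):
    # "word in s" is a case-sensitive substring test.
    return [s for s in suggestions if word in s]

# Suggestion lists copied from the enchant output above.
yout = ['Out', 'Yost', 'Rout', 'Tout', 'Lout', 'Gout', 'Pout', 'Bout',
        'Y out', 'Your', 'You', 'Youth', 'Yous', 'You t']
care = ['CARE', 'acre', 'acer', 'race', 'Care', 'car', 'are', 'cares',
        'scare', 'carer', 'caret', 'carte', 'cared', 'cadre', 'carve']

print(keep_containing('Yout', yout))  # ['Youth']
print(keep_containing('care', care))  # ['cares', 'scare', 'carer', 'caret', 'cared']
```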

So:

>>> " ".join([[word if dictionary.check(word) else i for i in dictionary.suggest(word) if word in i][0] for word in nltk.word_tokenize(zebra)])
'Youth caretaker taking care of something'
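The one-liner raises IndexError whenever the filtered list is empty, which happens when no suggestion contains the word, even for a word that check accepts. A more defensive sketch of the same rule as a plain function. check and suggest are passed in as callables so it runs without pyenchant; the VALID set is hypothetical, and the suggestion lists are shortened copies of the enchant output above:

```python
def choose(word, check, suggest):
    """Keep valid words; otherwise take the first suggestion containing the word.

    check(word) -> bool and suggest(word) -> list of str play the roles of
    enchant.Dict.check and enchant.Dict.suggest.
    """
    if check(word):
        return word
    for candidate in suggest(word):
        if word in candidate:
            return candidate
    return word  # no usable suggestion: leave the word unchanged

# Hypothetical stubs standing in for an enchant dictionary.
VALID = {"taking", "care", "of", "something"}
SUGGESTIONS = {"Yout": ["Out", "Yost", "Youth"], "caretak": ["caretaker", "caret"]}

def check(word):
    return word in VALID

def suggest(word):
    return SUGGESTIONS.get(word, [])

zebra = 'Yout caretak taking care of something'
print(" ".join(choose(w, check, suggest) for w in zebra.split()))
```

With pyenchant installed, dictionary.check and dictionary.suggest can be passed in directly in place of the stubs.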

Answer 2 (score: 0)

import re
import nltk

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))

def looper(a, count):
    words = nltk.word_tokenize(a)
    for i in range(len(words)):
        X = correct(words[i])
        print X,

final = looper(zebra, count)

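Add a comma after X ---> print X,

In Python 2, a trailing comma makes print end with a space instead of a newline, so all the words come out on one line.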