Question

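I am currently using Python with NLTK to extract features from my data. One feature I want to extract is the position of a specific query word within a sentence. To do that, I tried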

String.find(word)

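but it gives me a number larger than the total number of words in the text.

Please suggest a way to find the position of a particular word in a sentence.

For example, in "Today is my birthday" the position of the word "birthday" is 4. How can I do this?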

Answer 1

string = 'Today is my birthday'
string.find('my') #Out: 9
string[9:] #Out: 'my birthday'

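find does not search the string word by word; it searches character by character and returns a character offset. For a simple example, you can do this instead (note that the result is zero-indexed):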

words = string.split()
words.index('my') #Out: 2
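If you already have a character offset from find, here is a minimal sketch of turning it into a word index by counting the whitespace-separated words before it. The helper name char_offset_to_word_index is made up for illustration, and it assumes the offset points at the start of a word; find also matches inside longer words (for example 'is' inside 'This'), so it is not a reliable word search on its own.

```python
def char_offset_to_word_index(text, offset):
    """Return the zero-based index of the word that starts at `offset`.

    Hypothetical helper: assumes words are separated by whitespace and
    that `offset` is the start of a word. Returns None for a negative
    offset, which is what str.find returns when nothing is found.
    """
    if offset < 0:
        return None
    # Every complete word before the offset shifts the index by one.
    return len(text[:offset].split())


string = 'Today is my birthday'
offset = string.find('birthday')
print(offset)                                     # 12 (character offset)
print(char_offset_to_word_index(string, offset))  # 3 (zero-based word index)
```

Add 1 to the result if you want the 1-based position from the question.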

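Edit

If you need a more sophisticated definition of a word than a string delimited by whitespace, you can use a regular expression. Here is a simple example: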

import re
word_re = re.compile(r'\w+')
# Build a real list so that .index() also works on Python 3
words = [match.group(0) for match in word_re.finditer(string)]
words.index('my') #Out: 2

Edit 2

try:
    words.index('earthquake')
except ValueError:
    # index() raises ValueError when the word is not in the list
    print('handle missing word here')
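Putting the pieces above together, a rough sketch could look like the following. The function name word_position and the choice to return None for a missing word are my own assumptions, not anything the question requires; the position is 1-based to match the example in the question.

```python
import re

WORD_RE = re.compile(r'\w+')


def word_position(text, word):
    """Return the 1-based position of `word` in `text`, or None if absent.

    Hypothetical helper: a word is any run of \\w characters, and the
    comparison is case-sensitive.
    """
    words = WORD_RE.findall(text)
    try:
        return words.index(word) + 1
    except ValueError:
        return None


print(word_position('Today is my birthday', 'birthday'))    # 4
print(word_position('Today is my birthday', 'earthquake'))  # None
```

Note that index only reports the first occurrence, so a word that appears twice yields just its first position.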

Answer 2

You can use re or nltk to turn the text into a list of strings and then search for the word:

import re
text = "Today is my birthday"
word = "birthday"
words1 = re.sub(r"[^\w]", " ", text).split()  # using re

import nltk
words2 = nltk.word_tokenize(text) # using nltk

# Positions are counted from 1
for position, token in enumerate(words1, start=1):  # or: enumerate(words2, start=1)
    if token == word:
        print(position)
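The loop above prints every match, so it already copes with a word that occurs more than once. If you would rather collect the positions in a list, a small sketch is below. The name all_positions is hypothetical, and lowercasing both sides is an assumption about how you want to match; drop it if case matters.

```python
def all_positions(words, word):
    """Return the 1-based positions of every occurrence of `word`.

    Hypothetical helper: `words` is an already tokenized list, and the
    comparison ignores case.
    """
    target = word.lower()
    return [i for i, w in enumerate(words, start=1) if w.lower() == target]


words = "Today is my birthday and my party".split()
print(all_positions(words, "my"))          # [3, 6]
print(all_positions(words, "earthquake"))  # []
```

An empty list means the word was not found, which avoids the need for a try/except.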

Position of a query word