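I have been working on a Discord bot that replies to messages by reading their content and checking whether any of the words appear in a list.
My problem is that I need a reliable way to have Python look for certain words in a text, check whether they appear in a given list, and output the words that were detected.
I managed to get it working to some extent with the following code: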
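I would really appreciate any help.
Answer 0: (score: 2)
Here is some code that does what you describe. Realistically, though, it sounds like you need to spend a good amount of time working through some basic Python tutorials before you will be able to implement this.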
import re

key_words = {'foo', 'bar', 'baz'}
typed_str = 'You are such a Foo BAR!'
# Lowercase the input, split it into alphabetic words, and intersect with the keyword set.
print(key_words & set(re.findall('[a-z]+', typed_str.lower())))
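The same idea can be wrapped in a small reusable function. This is only a sketch: the name `find_keywords` and the choice to return a sorted list are assumptions made for illustration, not part of the answer above.

```python
import re

def find_keywords(text, keywords):
    """Return the keywords that occur as whole words in text, ignoring case."""
    wanted = {k.lower() for k in keywords}
    # [a-z']+ keeps contractions such as "don't" together as one token.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # Sorting gives a stable order, since sets are unordered.
    return sorted(wanted & tokens)

print(find_keywords("You are such a Foo BAR!", ["foo", "bar", "baz"]))
```

Because the text is split into whole words before the comparison, a keyword such as `foo` is not reported for a message that only contains `football`.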
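Answer 1: (score: 1)
I'm not sure exactly what is being asked, but if you are building a bot that takes in raw user input, there are a few things you will need to consider (in no particular order).
If your environment allows access to libraries, you might consider checking out TextBlob. The following commands will give you the functionality needed for the example below.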
pip install textblob
python -m textblob.download_corpora
from textblob import TextBlob, Word
import copy


def score_intent(rawstring, keywords, weights=None, threshold=0.01, debug=False):
    """
    rawstring: string of text with words that you want to detect
    keywords: list of words that you are looking for
    weights: (optional) dictionary with relative weights of words you want
    threshold: spellcheck confidence threshold
    debug: boolean for extra print statements to help debug
    """
    # Compare everything in upper case so that matching is case-insensitive.
    keywords = [k.upper() for k in keywords]
    processed_input_as_list = spellcheck_subject_matter_specific(
        rawstring, keywords, threshold=threshold, debug=debug)
    common_words = intersection(processed_input_as_list, keywords)
    intent_score = len(common_words)
    if weights:
        for special_word in weights.keys():
            if special_word.upper() in common_words:
                # the minus one is so we don't double count a word.
                intent_score = intent_score + weights[special_word] - 1
    if debug:
        print("intent score: %s" % intent_score)
        print("words of interest found in text: {}".format(common_words))
    # you could return common_words and score intent based on the list.
    # return common_words, intent_score
    return common_words


def intersection(a, b):
    """
    a and b are lists
    function returns a list that is the intersection of the two
    """
    return list(set(a) & set(b))


def spellcheck_subject_matter_specific(rawinput, subject_matter_vector, threshold=0.01, capitalize=True, debug=False):
    """
    rawinput: all the text that you want to check for spelling
    subject_matter_vector: only the words that are worth spellchecking for (since the function can be sort of sensitive it might correct words that you don't want to correct)
    threshold: the spell check confidence needed to update the word to the correct spelling
    capitalize: boolean determining if you want the returned words to be upper-cased.
    debug: boolean for extra print statements to help debug
    """
    new_input = copy.copy(rawinput)
    for w in TextBlob(rawinput).words:
        # spellcheck() returns a list of (candidate word, confidence) tuples.
        spellchecked_vec = w.spellcheck()
        if debug:
            print("Word: %s" % w)
            print("Spellchecked Guesses & Confidences: %s" % spellchecked_vec)
            print("Only spellchecked confidences greater than {} and in this list {} will be included".format(threshold, subject_matter_vector))
        corrected_words = [z[0].upper() for z in spellchecked_vec if z[1] > threshold]
        important_words = intersection(corrected_words, subject_matter_vector)
        # Append corrected keywords to the input rather than replacing the original words.
        for new_word in important_words:
            new_input = new_input + ' ' + new_word
    inputBlob = TextBlob(new_input)
    processed_input = inputBlob.words
    if capitalize:
        processed_input = [word.upper() for word in processed_input]
    return processed_input

discord_str = "Hi, i want to talk about codee and pYtHon"
words2detect = ["python","code"]
score_intent(rawstring=discord_str,keywords=words2detect,threshold=0.01,debug=True)
intent score: 2
words of interest found in text: ['PYTHON', 'CODE']
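If installing TextBlob is not an option, a rough approximation of the misspelling tolerance can be had with the standard library alone. The sketch below swaps TextBlob's spellchecker for `difflib.get_close_matches`, which is a different technique (string similarity rather than spelling correction). The function name `detect_keywords_fuzzy` and the `cutoff` value of 0.8 are assumptions chosen for illustration and would likely need tuning for real messages.

```python
import difflib
import re

def detect_keywords_fuzzy(text, keywords, cutoff=0.8):
    """Return keywords that match a word in text exactly or approximately."""
    lowered = [k.lower() for k in keywords]
    found = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        # get_close_matches returns up to n keywords whose similarity
        # ratio to the token is at least cutoff (1.0 means identical).
        match = difflib.get_close_matches(token, lowered, n=1, cutoff=cutoff)
        if match:
            found.add(match[0])
    return sorted(found)

discord_str = "Hi, i want to talk about codee and pYtHon"
print(detect_keywords_fuzzy(discord_str, ["python", "code"]))
```

A lower `cutoff` tolerates more typos but also raises the chance of false positives on short keywords, so it is worth testing against real messages before relying on it.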