Getting Python to look for words and output them

Time: 2018-07-18 21:52:27

Tags: python

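I have been working on a Discord bot that replies to messages by reading their contents and checking whether the words in them appear in a list.

My problem is that I need a reliable way to have Python look for certain words in a piece of text, check whether they appear in a given list, and output the words that were detected.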

I managed to get it working to some extent with the following code:

[code block not preserved in this copy of the post]

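I would really appreciate any help.

2 answers:

Answer 0: (score: 2)

Here is some code that does what you describe. Realistically, though, it sounds like you should spend some time working through a few basic Python tutorials before you will be able to implement this.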

import re

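# words the bot should react to (lowercase)
key_words = {'foo', 'bar', 'baz'}

typed_str = 'You are such a Foo BAR!'

# lowercase the message, split it into words, and keep those in key_words
print(key_words & set(re.findall('[a-z]+', typed_str.lower())))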
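
The set intersection above does not preserve the order in which the words were typed. If the bot should report matches in message order, a minimal sketch along these lines might help. The function name `find_keywords` is hypothetical and not part of the answer's code.

```python
import re

def find_keywords(text, keywords):
    """Return the keywords that occur in text, in order of first appearance."""
    wanted = {k.lower() for k in keywords}
    found = []
    # Split the lowercased text into runs of letters so that punctuation
    # is ignored and "food" does not count as a match for "foo".
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in wanted and word not in found:
            found.append(word)
    return found

print(find_keywords("You are such a Foo BAR! foo again", ["foo", "bar", "baz"]))
# ['foo', 'bar']
```

Matching whole words rather than substrings is a design choice here; a substring test such as `'foo' in text` would also fire on "food".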

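Answer 1: (score: 1)

I'm not sure exactly what is being asked, but if you are building a bot that takes in raw user input, you will need to consider the following (in no particular order):

  1. Case sensitivity
  2. Spell checking
  3. Understanding intent in a simple way

If your environment allows you to install libraries, you might consider checking out TextBlob. The following commands install what the examples below need.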
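
If installing a library is not an option, the first two points can be approximated with the standard library alone. The following is only a rough sketch: `fuzzy_find` and its `cutoff` value are assumptions made for illustration, and `difflib` measures string similarity rather than doing true spell checking.

```python
import difflib
import re

def fuzzy_find(text, keywords, cutoff=0.8):
    """Return keywords that match a word in text exactly or approximately."""
    keys = [k.lower() for k in keywords]           # case-insensitive comparison
    found = []
    for word in re.findall(r"[a-z]+", text.lower()):
        # get_close_matches returns the candidates whose similarity ratio
        # to `word` is at least `cutoff`, best match first.
        match = difflib.get_close_matches(word, keys, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found

print(fuzzy_find("Hi, i want to talk about codee and pYtHon", ["python", "code"]))
# ['code', 'python']
```

A lower `cutoff` tolerates more typos but also produces more false matches, so the value is worth tuning against real messages.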

pip install textblob

python -m textblob.download_corpora

Core function

from textblob import TextBlob, Word
import copy

def score_intent(rawstring,keywords,weights=None,threshold=0.01,debug=False):
    """
    rawstring: string of text with words that you want to detect
    keywords: list of words that you are looking for
    weights: (optional) dictionary with relative weights of words you want
    threshold: spellcheck confidence threshold
    debug: boolean for extra print statements to help debug
    """
    # compare everything in upper case so matching is case-insensitive
    keywords = [k.upper() for k in keywords]
    processed_input_as_list = spellcheck_subject_matter_specific(rawstring,keywords,threshold=threshold,debug=debug)
    common_words = intersection(processed_input_as_list,keywords)
    intent_score = len(common_words)
    if weights:
        for special_word in weights.keys():
            if special_word.upper() in common_words:
                # the minus one is so we don't double count a word.
                intent_score = intent_score + weights[special_word] - 1

    if debug:
        print("intent score: %s" % intent_score)
        print("words of interest found in text: {}".format(common_words))
    # you could return common_words and score intent based on the list.
    # return common_words, intent_score
    return common_words

Intersection and spell-check utilities

def intersection(a,b):
    """
    a and b are lists
    function returns a list that is the intersection of the two
    """
    return list(set(a)&set(b))



def spellcheck_subject_matter_specific(rawinput,subject_matter_vector,threshold=0.01,capitalize=True,debug=False):
    """
    rawinput: all the text that you want to check for spelling
    subject_matter_vector: only the words that are worth spellchecking for,
        in upper case (the spellchecker can be sensitive and might
        otherwise correct words that you don't want corrected)
    threshold: the spell check confidence needed to update the word to the correct spelling
    capitalize: boolean determining if you want the returned words to be upper-cased.
    debug: boolean for extra print statements to help debug
    """

    new_input = copy.copy(rawinput)

    for w in TextBlob(rawinput).words:
        spellchecked_vec = w.spellcheck()
        if debug:
            print("Word: %s" % w)
            print("Spellchecked Guesses & Confidences: %s" % spellchecked_vec)
            print("Only spellchecked confidences greater than {} and in this list {} will be included".format(threshold, subject_matter_vector))

        corrected_words = [z[0].upper() for z in spellchecked_vec if z[1] > threshold] 
        important_words = intersection(corrected_words,subject_matter_vector)
        for new_word in important_words:

            new_input = new_input + ' ' + new_word


    inputBlob = TextBlob(new_input)
    processed_input = inputBlob.words
    if capitalize:
        processed_input = [word.upper() for word in processed_input]

    return processed_input

Usage example

discord_str = "Hi, i want to talk about codee and pYtHon"

words2detect = ["python","code"]

score_intent(rawstring=discord_str,keywords=words2detect,threshold=0.01,debug=True)

Output

intent score: 2
words of interest found in text: ['PYTHON', 'CODE']
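
The usage example does not exercise the optional `weights` argument. The arithmetic it applies can be sketched on its own, without TextBlob, as follows. `weighted_score` is a hypothetical helper that mirrors the loop inside `score_intent`; it is not a function from the answer.

```python
def weighted_score(common_words, weights=None):
    """Score a list of detected (upper-cased) words, optionally weighted."""
    score = len(common_words)              # every detected word counts once
    for word, weight in (weights or {}).items():
        if word.upper() in common_words:
            # The word already contributed 1 above, so add only the excess.
            score += weight - 1
    return score

print(weighted_score(["PYTHON", "CODE"], {"python": 3}))
# 4
```

Here "PYTHON" counts as 3 and "CODE" as 1, which is why the total is 4 rather than 2.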