Find and extract strings containing keywords from text in Python

Date: 2015-08-28 07:38:50

Tags: python

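I'm making a bot that goes through a lot of comments, and I want to find any sentence that starts with "I'm" or "I am". Here is an example comment (there are two sentences in it that I want to extract):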

"Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."  

Here is my function so far:

keywords = ["i'm ","im ","i am "]

def get_quote(comments):
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.text.lower() for string in keywords)
        if isMatch:
            pass  # here I need to extract the matching sentence(s) and add them to quotes

How can I find where each sentence starts and ends so that I can add it to the quotes list?

2 answers:

Answer 0 (score: 6)

You can use regular expressions:

>>> import re
>>> text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time." 
>>> re.findall(r"(?i)(?:i'm|i am).*?[.?!]", text)
["I'm sorry.",
 "I'm sure everyone's day will come, it's just a matter of time."]
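To plug this into the function from the question, a minimal sketch might look like the following. It assumes comments is a plain list of strings. In the question each comment is an object with a .text attribute, so you would pass comment.text instead. QUOTE_RE is just an illustrative name.

```python
import re

# Same pattern as above: "I'm" or "I am" (case-insensitive), then as few
# characters as possible up to the first '.', '?' or '!'
QUOTE_RE = re.compile(r"(?i)(?:i'm|i am).*?[.?!]")


def get_quote(comments):
    """Collect every "I'm ..." / "I am ..." sentence from a list of comment strings."""
    quotes = []
    for text in comments:
        # findall returns all non-overlapping matches in this comment (possibly none)
        quotes.extend(QUOTE_RE.findall(text))
    return quotes


comments = [
    "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time.",
    "Nothing to see here.",
]
print(get_quote(comments))
```

Comments with no match simply contribute nothing, so there is no need for the separate any(...) check from the question.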

The pattern I'm using here is r"(?i)(?:i'm|i am).*?[.?!]"

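  • (?i) sets the "ignore case" flag
  • (?:i'm|i am) matches "i'm" or (|) "i am"; ?: makes it a non-capturing group
  • .*? is a non-greedy (?) match of a sequence (*) of any characters (.)...
  • [.?!] ...up to the first literal dot, question mark, or exclamation mark.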
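The breakdown above can also be written into the pattern itself with re.VERBOSE. This is a sketch of the same regex with the flag passed as re.IGNORECASE instead of (?i). The second findall call swaps .*? for a greedy .* to show why the ? matters.

```python
import re

# The same pattern in verbose mode: whitespace is ignored and '#' starts a comment,
# so the space in "i am" has to be escaped to stay literal.
QUOTE_RE = re.compile(
    r"""
    (?:i'm|i\ am)   # "i'm" or "i am"; (?: ) makes it a non-capturing group
    .*?             # any characters, as few as possible (non-greedy)
    [.?!]           # ...up to the first literal dot, question mark or exclamation mark
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
print(QUOTE_RE.findall(text))

# Greedy ".*" runs on to the LAST punctuation mark, so both sentences become one match
print(re.findall(r"(?i)(?:i'm|i am).*[.?!]", text))
```

The first call gives the two separate sentences. The greedy version returns a single match covering everything from the first "I'm" to the final period.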

Note that this only works if there are no "other" dots, e.g. in "Dr." or "Mr.", because those would also be treated as the end of a sentence.
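One way around the abbreviation problem is to split the text into sentences first and then keep only those that begin with "I'm" or "I am". The regex above also matches "I'm" in the middle of a sentence, so this variant is stricter. The sketch below uses a naive splitter with a small, hypothetical ABBREVIATIONS set. The names split_sentences and get_quotes are illustrative. For real-world text, a dedicated sentence tokenizer such as NLTK's sent_tokenize is likely more robust.

```python
import re

# Hypothetical, deliberately small set of abbreviations whose trailing "."
# should NOT end a sentence; extend it for your own data.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "e.g.", "i.e."}


def split_sentences(text):
    """Naive splitter: break after . ? ! followed by whitespace, then glue
    back pieces that were cut right after a known abbreviation."""
    pieces = re.split(r"(?<=[.?!])\s+", text.strip())
    sentences = []
    for piece in pieces:
        # If the previous sentence ends with an abbreviation, this piece continues it
        if sentences and sentences[-1].split()[-1].lower() in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


def get_quotes(text):
    """Return the sentences that START with "I'm" or "I am" (case-insensitive)."""
    return [s for s in split_sentences(text)
            if re.match(r"(?:i'm|i am)\b", s, re.IGNORECASE)]


print(get_quotes("I'm seeing Dr. Smith today. He said I am fine."))
```

With this input the regex from the answer would return "I'm seeing Dr." and "I am fine.". This sketch returns only "I'm seeing Dr. Smith today.", because the second sentence does not start with the keyword.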

Answer 1 (score: 2)

Check whether this code works for you:

def get_quote(comments):
    keywords = ["i'm ","im ","i am "]
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.lower() for string in keywords)
        if isMatch:
            quotes.append(comment)
    # print() with a single argument works in both Python 2 and Python 3
    print("Lines having keywords are")
    for q in quotes:
        print(q)


if __name__ == "__main__":
    a="Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
    # Remove the trailing "." so that split(".") doesn't produce an empty last item
    a = a.rstrip(".")
    list_val = a.split(".")
    get_quote(list_val)

Output:

C:\Users\Administrator\Desktop>python demo.py
Lines having keywords are
 I'm sorry
 I'm sure everyone's day will come, it's just a matter of time

C:\Users\Administrator\Desktop>
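A caveat with the keyword list used here (and in the question): a plain substring test for "im " also fires inside words that end in "im", such as "him" or "claim". The sketch below compares that test with a regex that uses \b word boundaries. The sentences are made-up examples, and substring_match, word_match and KEYWORD_RE are illustrative names.

```python
import re

keywords = ["i'm ", "im ", "i am "]


def substring_match(sentence):
    # Approach from the answer: plain substring test on the lower-cased text
    return any(k in sentence.lower() for k in keywords)


# Word-boundary version: "im" must be a whole word, not the tail of "him"
KEYWORD_RE = re.compile(r"\b(?:i'm|im|i am)\b", re.IGNORECASE)


def word_match(sentence):
    return KEYWORD_RE.search(sentence) is not None


sentences = ["I'm sorry", "Call him back later", "Im late"]
print([substring_match(s) for s in sentences])
print([word_match(s) for s in sentences])
```

The substring test also accepts "Call him back later" because of "him ". The word-boundary version rejects it but still matches the deliberately unpunctuated "Im late".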