Question

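I'm building a bot that goes through a large number of comments, and I want to find every sentence that starts with "I'm" or "I am". Here is a sample comment (I want to extract two sentences from it).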

"Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."

Here is the function I have so far.

keywords = ["i'm ","im ","i am "]

def get_quote(comments):
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.text.lower() for string in keywords)
        if isMatch:

How can I find where a sentence starts and ends, so that I can add it to the list quotes?

Answer 1

You can use regular expressions:

>>> import re
>>> text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time." 
>>> re.findall(r"(?i)(?:i'm|i am).*?[.?!]", text)
["I'm sorry.",
 "I'm sure everyone's day will come, it's just a matter of time."]

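The pattern I'm using here is r"(?i)(?:i'm|i am).*?[.?!]"

(?i) sets the "ignore case" flag
(?:i'm|i am) matches "i'm" or (|) "i am"; the ?: makes it a non-capturing group
.*? non-greedily (?) matches a sequence (*) of any character (.) ...
[.?!] ... until a literal period, question mark, or exclamation mark is found.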
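
As an illustration of the breakdown above, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the name SENTENCE_PATTERN is made up for the example, and because verbose mode ignores plain whitespace, the space in "i am" is written as [ ].

```python
import re

# Same pattern as in the answer, spread over several lines.
# SENTENCE_PATTERN is a hypothetical name chosen for this example.
SENTENCE_PATTERN = re.compile(
    r"""
    (?:i'm|i[ ]am)   # "i'm" or "i am", non-capturing; [ ] is a literal space
    .*?              # as few characters as possible ...
    [.?!]            # ... up to the first period, question mark, or exclamation mark
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
matches = SENTENCE_PATTERN.findall(text)
print(matches)
```

Since the pattern has no capturing groups, findall returns the full matched substrings.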

Note that this only works if there are no "other" dots, e.g. in "Dr." or "Mr.", because those would also be treated as the end of a sentence.
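
One possible way around the abbreviation problem, sketched under assumptions rather than offered as a complete solution: split the text into sentences first, re-join pieces that end in a known abbreviation, and only then test how each sentence starts. The ABBREVIATIONS tuple and both function names are hypothetical and chosen for this example; the list is deliberately short and would need extending for real data. A sentence that genuinely ends in "Dr." would be merged with the following one.

```python
import re

# Hypothetical, deliberately short list of abbreviations (lowercase, with the dot).
ABBREVIATIONS = ("dr.", "mr.", "mrs.", "ms.", "prof.")

# Matches "i'm" or "i am" at the start of a string, followed by a word boundary.
STARTS_WITH_I_AM = re.compile(r"(?:i'm|i am)\b", re.IGNORECASE)


def split_sentences(text):
    """Split after ., ? or ! followed by whitespace, re-joining after abbreviations."""
    sentences = []
    pending = []
    for chunk in re.split(r"(?<=[.?!])\s+", text.strip()):
        if not chunk:
            continue
        pending.append(chunk)
        # If the chunk ends in a known abbreviation, the sentence is not over yet.
        if chunk.split()[-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(" ".join(pending))
        pending = []
    if pending:
        sentences.append(" ".join(pending))
    return sentences


def first_person_sentences(text):
    """Return only the sentences that begin with "I'm" or "I am"."""
    return [s for s in split_sentences(text) if STARTS_WITH_I_AM.match(s)]


sample = "I am seeing Dr. Smith today. Oh, I'm fine! I'm sure Mr. Jones agrees."
print(first_person_sentences(sample))
```

Unlike the findall pattern, this version skips "Oh, I'm fine!" because the match is anchored to the start of each sentence.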

Answer 2

See whether this code works for you:

def get_quote(comments):
    keywords = ["i'm ","im ","i am "]
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.lower() for string in keywords)
        if isMatch:
            quotes.append(comment)
    # print(...) with a single argument behaves the same on Python 2 and Python 3
    print("Lines having keywords are ")
    for q in quotes:
        print(q)


if __name__ == "__main__":
    a="Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
    #Removed last "." from line before splitting on basis of "."
    a = a.rstrip(".")
    list_val = a.split(".")
    get_quote(list_val)

Output:

C:\Users\Administrator\Desktop>python demo.py
Lines having keywords are
 I'm sorry
 I'm sure everyone's day will come, it's just a matter of time

C:\Users\Administrator\Desktop>
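
The code above tests whether a keyword occurs anywhere in a sentence, so "im " would also match a sentence such as "Tell him now". A minimal variation, assuming the goal is still sentences that begin with one of the keywords, strips each piece and uses str.startswith with a tuple of prefixes. The names KEYWORDS and get_quotes are illustrative, and the function returns the list instead of printing it.

```python
# Illustrative names; the keyword list is the one from the question.
KEYWORDS = ("i'm ", "im ", "i am ")


def get_quotes(sentences):
    """Return the sentences that start with one of KEYWORDS, ignoring case."""
    quotes = []
    for sentence in sentences:
        cleaned = sentence.strip()  # drop the space left over from splitting on "."
        # str.startswith accepts a tuple and is true if any prefix matches.
        if cleaned.lower().startswith(KEYWORDS):
            quotes.append(cleaned)
    return quotes


comment = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
sentences = comment.rstrip(".").split(".")
print(get_quotes(sentences))
```

Splitting on "." alone still ignores "?" and "!" and breaks on abbreviations, so this only tightens the keyword check.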

Find and extract strings containing keywords from text in Python
