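I'm making a bot that goes through a lot of comments, and I want to find any sentence that starts with "I'm" or "I am". Here is an example comment (it contains two sentences I want to extract).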
"Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
Here is the function I have so far.
keywords = ["i'm ","im ","i am "]
def get_quote(comments):
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.text.lower() for string in keywords)
        if isMatch:
How can I find where the sentence starts and ends, so that I can add it to the list quotes?
Answer 0 (score: 6)
You can use regular expressions:
>>> import re
>>> text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
>>> re.findall(r"(?i)(?:i'm|i am).*?[.?!]", text)
["I'm sorry.",
"I'm sure everyone's day will come, it's just a matter of time."]
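The pattern I'm using here is r"(?i)(?:i'm|i am).*?[.?!]"
(?i) sets the "ignore case" flag
(?:i'm|i am) matches "i'm" or (|) "i am"; the ?: makes it a non-capturing group
.*? non-greedily (?) matches a sequence (*) of any characters (.) ...
[.?!] ... until a literal dot, question mark, or exclamation mark is found.
Note that this only works if there are no "other" dots, e.g. in "Dr." or "Mr.", since those would also be treated as the end of a sentence.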
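One thing the pattern above does not enforce is that "I'm" actually begins a sentence: for "I think I'm right." it would return "I'm right.". Here is a minimal sketch of one way to tighten that, assuming a sentence starts either at the beginning of the text or after ., ? or ! followed by whitespace. The names SENTENCE_RE and find_i_sentences are made up for this example, and it still does not solve the "Dr." / "Mr." problem mentioned above.

```python
import re

# Hypothetical pattern for illustration, not part of the original answer.
SENTENCE_RE = re.compile(
    r"(?:^|(?<=[.?!])\s+)"              # start of text, or whitespace right after . ? !
    r"((?:i'm|i am)\b[^.?!]*[.?!]?)",   # "I'm"/"I am" as a whole word, up to the next . ? !
    re.IGNORECASE,
)

def find_i_sentences(text):
    """Return the sentences in text that begin with "I'm" or "I am"."""
    # With exactly one capturing group, findall returns that group's contents.
    return SENTENCE_RE.findall(text)

text = ("Oh, in that case. I'm sorry. I'm sure everyone's day will come, "
        "it's just a matter of time.")
print(find_i_sentences(text))
```

The \b after the alternation keeps words such as "I amble" from matching, and the trailing [.?!]? lets a final sentence without closing punctuation still be picked up.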
Answer 1 (score: 2)
Check whether this code works for you
def get_quote(comments):
    keywords = ["i'm ", "im ", "i am "]
    quotes = []
    for comment in comments:
        # Keep every line that contains one of the keywords (case-insensitive)
        isMatch = any(string in comment.lower() for string in keywords)
        if isMatch:
            quotes.append(comment)
    print("Lines having keywords are ")
    for q in quotes:
        print(q)

if __name__ == "__main__":
    a = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
    # Remove the last "." from the line before splitting on "."
    a = a.rstrip(".")
    list_val = a.split(".")
    get_quote(list_val)
Output:
C:\Users\Administrator\Desktop>python demo.py
Lines having keywords are
I'm sorry
I'm sure everyone's day will come, it's just a matter of time
C:\Users\Administrator\Desktop>
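If you like the splitting approach but also want to handle "?" and "!" and keep the punctuation, something along these lines might work. This is only a sketch: it assumes the comments are plain strings (with the bot's comment objects you would pass comment.text instead), and split_sentences, get_quotes and PREFIXES are names invented for this example. It also checks that the sentence starts with a keyword rather than merely containing it.

```python
import re

# Same keywords as in the question, used here as sentence prefixes.
PREFIXES = ("i'm ", "im ", "i am ")

def split_sentences(text):
    """Split after ., ? or ! followed by whitespace; punctuation stays attached."""
    parts = re.split(r"(?<=[.?!])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def get_quotes(comments):
    """Collect every sentence that starts with one of PREFIXES."""
    quotes = []
    for comment in comments:
        for sentence in split_sentences(comment):
            # str.startswith accepts a tuple of candidate prefixes
            if sentence.lower().startswith(PREFIXES):
                quotes.append(sentence)
    return quotes

comment = ("Oh, in that case. I'm sorry. I'm sure everyone's day will come, "
           "it's just a matter of time.")
for quote in get_quotes([comment]):
    print(quote)
```

Like the regex answer, this treats the dot in "Dr." or "Mr." as a sentence end, so abbreviations would need extra handling.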