Extract a specific part of a text - Python

Posted: 2016-06-18 15:58:23

Tags: python text split detect

I want to extract the part of a text that starts with, for example, "Hello" and ends with "goodbye".

Example:

Extract the sentence Hello i'm Gabi :D goodbye from:
asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija

2 answers:

Answer 0 (score: 1)

You can use a very basic regular expression:

(demo and explanation of how it works: https://regex101.com/r/bO0rL7/2)

import re

text = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"

# Case-insensitive: "hello", a space, one or more characters, a space, "goodbye"
match = re.findall(r'hello .+ goodbye', text, flags=re.IGNORECASE)
if match:
    print(match[0])
# Output: Hello i'm Gabi :D goodbye
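One caveat: `.+` is greedy, so if the text contains several "goodbye"s, a single match runs all the way to the last one. A minimal sketch, assuming you instead want each shortest Hello...goodbye span, uses the non-greedy quantifier `.+?` with `re.finditer`; the sample text below is made up for illustration:

```python
import re

# Hypothetical sample containing two Hello...goodbye spans
text = "x Hello i'm Gabi :D goodbye y Hello again goodbye z"

# Greedy: .+ extends to the LAST " goodbye" in the text
greedy = re.findall(r"hello .+ goodbye", text, flags=re.IGNORECASE)

# Non-greedy: .+? stops at the FIRST " goodbye" after each "hello"
lazy = [m.group(0) for m in re.finditer(r"hello .+? goodbye", text, flags=re.IGNORECASE)]

print(greedy)  # ["Hello i'm Gabi :D goodbye y Hello again goodbye"]
print(lazy)    # ["Hello i'm Gabi :D goodbye", 'Hello again goodbye']
```

Note that `.` does not match a newline by default; if a span can cross line breaks, add `re.DOTALL` to the flags (e.g. `flags=re.IGNORECASE | re.DOTALL`).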

Answer 1 (score: 0)

If you don't need NLP and aren't familiar with regular expressions, a simple approach is:

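import sys

s = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"
hello = s.find("Hello")
# Look for "goodbye" only after "Hello", so an earlier "goodbye" can't produce a backwards slice
goodbye = s.find("goodbye", hello) if hello != -1 else -1
if hello == -1 or goodbye == -1:
    print("Not found")
    sys.exit(0)
goodbye += len("goodbye")  # move past "goodbye" so it is included in the slice
print(s[hello:goodbye])
# Output: Hello i'm Gabi :D goodbye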
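The same `str.find` idea can be wrapped in a reusable function that returns `None` instead of exiting the program. This is a sketch; `extract_between` is a hypothetical helper name, and matching stays case-sensitive like `str.find`:

```python
def extract_between(text, start_word, end_word):
    """Return text from start_word through end_word (inclusive), or None.

    Hypothetical helper; case-sensitive, returns only the first span found.
    """
    start = text.find(start_word)
    if start == -1:
        return None
    # Search for end_word only after start_word, so the two can't overlap
    end = text.find(end_word, start + len(start_word))
    if end == -1:
        return None
    # Extend the slice past end_word so it is included in the result
    return text[start:end + len(end_word)]


s = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"
print(extract_between(s, "Hello", "goodbye"))  # Hello i'm Gabi :D goodbye
print(extract_between(s, "goodbye", "Hello"))  # None
```

Returning `None` lets the caller decide how to handle a missing marker, which is more convenient than `sys.exit` when the code runs inside a larger program.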