I have a string text = "Fix me a meeting in 2 days".
I have a list of words, meetingStrings.
"meet" is in meetingStrings, so I need to split the text at the word that contains it ("meeting").
Desired output:
in 2 days
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower():
        txt = text.split(x, 1)[1]
        print(txt)
This gives the output:
ing in 2 days
Answer 0 (score: 6)
Use re.split():
import re
meetingStrings = [
"appointment",
"meet",
"interview"
]
text = "Fix me a meeting in 2 days"
print(re.split('|'.join(r'(?:\b\w*'+re.escape(w)+r'\w*\b)' for w in meetingStrings), text, 1)[-1])
Prints:
in 2 days
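The same idea can be wrapped in a small helper. This is only a sketch: the function name text_after_keyword, the case-insensitive flag, the final strip() and the None return for "no keyword found" are assumptions added for illustration, not part of the answer above.

```python
import re

def text_after_keyword(text, keywords):
    """Return the text after the first word containing any keyword, or None."""
    # One alternation branch per keyword; each branch matches the whole
    # word that contains the keyword (e.g. "meeting" for "meet").
    pattern = "|".join(r"\b\w*" + re.escape(w) + r"\w*\b" for w in keywords)
    parts = re.split(pattern, text, maxsplit=1, flags=re.IGNORECASE)
    # Without a match, re.split returns the input as a single element.
    if len(parts) < 2:
        return None
    return parts[1].strip()

print(text_after_keyword("Fix me a Meeting in 2 days",
                         ["appointment", "meet", "interview"]))
```

Stripping the result removes the leading space that re.split leaves in front of the remainder.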
Answer 1 (score: 1)
A slight change to your code:
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower():
        txt = text.split(x, 1)[1]
        print(txt.split(" ", 1)[1]) #<--- Here
Just take the final output and split it at the first space.
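One caveat with this loop: it tests x against text.lower() but splits the original text, so a capitalized keyword such as "Meeting" passes the test and then raises IndexError. A possible case-tolerant variant is sketched below. The helper name rest_after_keyword is hypothetical, and the sketch assumes that lowercasing does not change the string length, which holds for ASCII text.

```python
def rest_after_keyword(text, keywords):
    """Return what follows the first word containing a keyword, or None."""
    lowered = text.lower()  # assumption: same length as text (true for ASCII)
    for word in keywords:
        idx = lowered.find(word.lower())
        if idx == -1:
            continue
        # Slice the original text so the casing of the result is preserved.
        tail = text[idx + len(word):]
        # Drop the rest of the matched word (e.g. "ing") up to the first space.
        _, _, rest = tail.partition(" ")
        return rest
    return None

print(rest_after_keyword("Fix me a Meeting in 2 days",
                         ["appointment", "meet", "interview"]))
```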
Answer 2 (score: 1)
This expression can also be used with the i flag:
(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))
Using alternation (logical OR), we can list all the words we want inside non-capturing groups, for example:
(?:in|after|on|from)
(?:days?|months?|weeks?|years?|hours?)
(?:meet|interview|appointment|session|schedule)
import re
regex = r"(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))"
test_str = "Fix me a meeting in 2 days meetings in 2 months meet in 1 week nomeeting in 2 days meet after 2 days"
print(re.findall(regex, test_str, re.IGNORECASE))
['in 2 days', 'in 2 months', 'in 1 week', 'in 2 days', 'after 2 days']
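If you want to explore, simplify, or modify the expression, it is explained in the top-right panel of this demo.
jex.im can be used to visualize regular expressions.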
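Answer 3 (score: 1)
For searching.
All you need to do is put the literals in the middle of a word,
then match the word.
The result is in capture group 1.
No whitespace trim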
\b\w*(?:appointment|meet|interview)\w*\b(.*)
https://regex101.com/r/lK4zRz/1
Readable version
 \b
 \w*
 (?:
      appointment
   |  meet
   |  interview
 )
 \w*
 \b
 ( .* )                        # (1)
With whitespace trim
(?m)\b\w*(?:appointment|meet|interview)\w*\b[^\S\r\n]*(.*?)[^\S\r\n]*$
https://regex101.com/r/v2qAOQ/1
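In Python, the whitespace-trimming expression can be written in its readable form with re.VERBOSE. The sketch below makes a few assumptions: the name TRIMMED is hypothetical, re.MULTILINE stands in for the inline (?m), and re.IGNORECASE is added so that "Meeting" also matches.

```python
import re

# Readable (free-spacing) form of the whitespace-trimming expression.
TRIMMED = re.compile(
    r"""
    \b \w*
    (?: appointment | meet | interview )
    \w* \b
    [^\S\r\n]*      # skip horizontal whitespace after the keyword word
    ( .*? )         # (1) the remainder of the line, trimmed
    [^\S\r\n]* $    # ignore trailing horizontal whitespace
    """,
    re.VERBOSE | re.MULTILINE | re.IGNORECASE,
)

match = TRIMMED.search("Fix me a Meeting   in 2 days   ")
print(match.group(1))
```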
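Also, if you add .* to the beginning of either regex, it will always match the last keyword, because the greedy .* consumes as much of the text as possible before the keyword word.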
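A short illustration of the effect of the leading .* in Python. The sample sentence and the variable names are made up for this sketch.

```python
import re

base = r"\b\w*(?:appointment|meet|interview)\w*\b(.*)"
sample = "meet in 2 days or interview in 3 weeks"

# Without the prefix, the search stops at the first keyword word.
first = re.search(base, sample).group(1)
# With a greedy ".*" prefix, the engine backtracks from the end of the
# string, so the last keyword word is the one that gets matched.
last = re.search(".*" + base, sample).group(1)

print(repr(first))
print(repr(last))
```

Here first holds everything after "meet", while last holds only the text after "interview".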
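Answer 4 (score: 0)
Try this:
import re
meetingStrings = ["appointment", "meet", "interview"]
text = "Fix me a meeting in 2 days"
# Split on the keyword plus the rest of its word; the last piece is the text after it.
print(re.split("({})\\w*".format("|".join(meetingStrings)), text)[-1].strip())
Output: in 2 days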
Answer 5 (score: 0)
Without regex, using str.partition:
for x in meetingStrings:
    pre, _, post = text.lower().partition(x)
    if post:
        pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip()
        post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip()
        print([pre, post])
Example:
In [35]: meetingStrings = [
    ...:     "appointment",
    ...:     "meet",
    ...:     "interview"
    ...: ]
    ...: text = "Fix me a meeting in 2 days"
    ...: for x in meetingStrings:
    ...:     pre, _, post = text.lower().partition(x)
    ...:     if post:
    ...:         pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip()
    ...:         post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip()
    ...:         print([pre, post])
    ...:
['fix me a', 'in 2 days']
Answer 6 (score: 0)
Try something like this:
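import re
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
def split_string(text, strings):
    # Matches any word that contains one of the keywords.
    search = re.compile('|'.join(map(re.escape, strings)))
    start = None  # index of the first word of the current chunk
    words = text.split()
    for e, x in enumerate(words):
        if search.search(x):
            # A keyword word ends the current chunk, if there is one.
            # (Comparing None with an int raises TypeError on Python 3.)
            if start is not None:
                yield ' '.join(words[start:e])
            start = None
        elif start is None:
            start = e
    # Emit the trailing chunk after the last keyword word.
    if start is not None:
        yield ' '.join(words[start:])
print(' '.join(split_string(text, meetingStrings)))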
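This may be longer than the other answers, but it seems to do exactly what you want: it splits the string at every word that contains one of the passed-in strings as a substring.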
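Answer 7 (score: 0)
I have another, simpler approach: first split the sentence into words, then cut the sentence at the position where an entry of meetingStrings appears: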
l = text.split()
for i in meetingStrings:
    for idx, j in enumerate(l):
        if i in j:
            # Keep only the words after the first word containing the keyword.
            l = l[idx+1:]
            break
print(' '.join(l))
Gives:
in 2 days
Answer 8 (score: 0)
You can just use find() and string slicing:
text = "Fix me a meeting in 2 days"
meetingStrings = [
"appointment",
"meet",
"interview"
]
sep = [i for i in meetingStrings if i in text]
idx = text.find(sep[0])
idx_ = text[idx:].find(' ')
print (text[idx+idx_:])
Output:
in 2 days