Question

I have a bunch of lines of data that I need to capture:

Level production data TD Index
Total Agriculture\Production data TS Index

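I need to capture everything before the last two words. For example, for the first line, my regex output should be Level production data. How can I do this when the number of words before TD Index can vary? Thanks!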

Answer 1

Try this regex:

^.*(?=(?:\s+\S+){2}$)

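Explanation

^ - asserts the start of the string
.* - matches 0 or more occurrences of any character except a newline
(?=(?:\s+\S+){2}$) - positive lookahead asserting that the current position is followed by exactly 2 words (1 or more whitespace characters followed by 1 or more non-whitespace characters, repeated twice) immediately before the end of the string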
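A minimal Python sketch of the pattern above applied to the sample lines. It assumes the data is processed one line at a time; the variable names are illustrative only.

```python
import re

# The pattern from this answer; \S+ treats any run of non-whitespace
# (including the "\" in Agriculture\Production) as a word
PATTERN = re.compile(r"^.*(?=(?:\s+\S+){2}$)")

data = "Level production data TD Index\nTotal Agriculture\\Production data TS Index"

# Match each line separately: on the whole block with re.MULTILINE, the \s+
# inside the lookahead could also match a newline and reach into the next line
results = []
for line in data.splitlines():
    m = PATTERN.search(line)
    results.append(m.group(0) if m else None)

print(results)  # ['Level production data', 'Total Agriculture\\Production data']
```

Lines with fewer than three words produce no match, so they come back as None.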

Answer 2

You can try this:

import re

s = ["Level production data TD Index", "Total Agriculture\\Production data TS Index"]
# findall returns all matches; [0] takes the first (IndexError if a line has none)
new_s = [re.findall(r'[\w\s\W]{1,}(?=\s\w+\s\w+$)', i)[0] for i in s]
print(new_s)

Output:

['Level production data', 'Total Agriculture\\Production data']
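Because findall(...)[0] raises IndexError on a line that has no match, a variant using re.search may be safer when some lines have fewer than three words. This is a sketch; prefix_or_none is a hypothetical helper name, not part of the answer.

```python
import re

# Same pattern as above, as a raw string. [\w\s\W] matches any character
# (newlines included), since every character is either \w or \W.
PATTERN = re.compile(r"[\w\s\W]+(?=\s\w+\s\w+$)")

def prefix_or_none(line):
    """Hypothetical helper: text before the last two words, or None if no match."""
    m = PATTERN.search(line)  # first match only, same as findall(...)[0]
    return m.group(0) if m else None

lines = ["Level production data TD Index", "TD Index"]
results = [prefix_or_none(s) for s in lines]
print(results)  # ['Level production data', None]
```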

Answer 3

Code

.*(?= \S+ \S+)

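Or: .*(?= [\w\/]+ [\w\/]+), replacing \S with whatever set of characters you define as valid for a word.

If there may be more than one space between words, you can also add + after each space: .*(?= +\S+ +\S+)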
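A quick check of the single-space and + variants on a made-up line with doubled spaces (the input is hypothetical, chosen only for illustration). The single-space version backtracks all the way to the first spot where exactly one space separates two words, so it returns only Level. The + version finds the right boundary, but the greedy .* can swallow the first of the doubled spaces, so stripping the result is a reasonable follow-up.

```python
import re

# Hypothetical input: two spaces before each of the last two words
line = "Level production data  TD  Index"

single = re.match(r".*(?= \S+ \S+)", line).group(0)
plus = re.match(r".*(?= +\S+ +\S+)", line).group(0)
cleaned = plus.rstrip()  # drop the trailing space the greedy .* kept

print(repr(single))   # 'Level'
print(repr(plus))     # 'Level production data '
print(repr(cleaned))  # 'Level production data'
```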

Usage

import re

r = r".*(?= \S+ \S+)"

l = [
    "Level production data TD Index",
    "Total Agriculture\\Production data TS Index"
]

for s in l:
    # re.match anchors at the start; the greedy .* backtracks until the lookahead succeeds
    m = re.match(r, s)
    if m:
        print(m.group(0))

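Explanation

.* matches any character (except a newline) any number of times
(?= \S+ \S+) positive lookahead ensuring that what follows matches:
- (a space) matches a literal space character
- \S+ matches any non-whitespace character one or more times
- (a space) matches a literal space character
- \S+ matches any non-whitespace character one or more times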
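For comparison, the same prefix can be obtained without a regex using Python's built-in str.rsplit. This is a swapped-in technique, not part of any answer above, and before_last_two is a hypothetical helper name.

```python
def before_last_two(line):
    """Hypothetical helper: everything before the last two words, or None."""
    # rsplit(None, 2) splits on runs of whitespace, at most twice, from the right
    parts = line.rsplit(None, 2)
    if len(parts) < 3:
        return None  # fewer than three words: nothing precedes the last two
    return parts[0]

lines = [
    "Level production data TD Index",
    "Total Agriculture\\Production data TS Index",
    "TD Index",
]
results = [before_last_two(s) for s in lines]
print(results)  # ['Level production data', 'Total Agriculture\\Production data', None]
```

Like the \S+ regexes, this treats any run of non-whitespace as a word, and it also tolerates repeated spaces between the last words.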
