Question

Here is my regex

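What I want to do is capture the table and the page number. Sample output of what I want is below. The table part is self-explanatory. The page number is the 10 in 10 4 Text Core statistics aggregated by the Statistics (the first number) and the 12 in 4 Text Core statistics aggregated by the Statistics 12 (the last number).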

In Notepad++ I can use Table \d+ to get all the tables, but I also want the page number at the bottom of the same page.

What I have:

Table 1: bifrost

<lots of randon text >

10 4 Text Core statistics aggregated by the Statistics 

<lots of randon text >

4 Text Core statistics aggregated by the Statistics 11

Table 2: homestead

<lots of randon text >

4 Text Core statistics aggregated by the Statistics 12

<lots of randon text >

12 4 Text Core statistics aggregated by the Statistics 


Table 3: homestead

<lots of randon text >

12 4 Text Core statistics aggregated by the Statistics

What I want:

Table 1: bifrost
10 4 Text Core statistics aggregated by the Statistics 
Table 2: homestead
4 Text Core statistics aggregated by the Statistics 12
Table 3: homestead
12 4 Text Core statistics aggregated by the Statistics

EDIT1

Regarding the possible answer below, in case this helps:

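(Table \d*).*?(?=\d+\s(\d+\s)?Text Core)([^\n]+)(.*?(?=^Table \d+|\z)) - finds nothing
(Table \d*).* - works, finds the Table line
(Table \d*) - works, finds the Table and number part of the line (e.g. Table 1)

.*?(?=\d+\s(\d+\s)?Text Core) - works, finds the number at the start of a line that begins with a number (^ zero-length match)
(?=\d+\s(\d+\s)?Text Core) - works, finds the number at the start of a line that begins with a number (^ zero-length match)
([^\n]+) - works, finds the lines that contain text (i.e. highlights all of the text)
(.*?(?=^Table \d+|\z)) - works, finds the start of a line that begins with Table.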
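One possible reason the pieces work on their own while the full pattern finds nothing is that "." does not cross line breaks unless the ". matches newline" option is on. The Python sketch below only illustrates that idea and is not a Notepad++ test: it assumes re.DOTALL behaves like that option, writes \z as \Z for Python's re module, and uses a shortened sample.

```python
import re

# The full pattern from the list above; \z is written \Z for Python's re module.
FULL = r"(Table \d*).*?(?=\d+\s(\d+\s)?Text Core)([^\n]+)(.*?(?=^Table \d+|\Z))"

# Shortened sample: one Table line, filler, then one page line.
sample = "\n".join([
    "Table 1: bifrost",
    "",
    "<lots of random text >",
    "",
    "10 4 Text Core statistics aggregated by the Statistics",
])

# "." stops at line breaks, so .*? cannot travel from the Table line
# down to the page line and the whole pattern fails.
without_dotall = re.search(FULL, sample, re.MULTILINE)

# "." also matches line breaks (assumed equivalent of ". matches newline").
with_dotall = re.search(FULL, sample, re.MULTILINE | re.DOTALL)

print(without_dotall)
print(with_dotall.group(1))
print(with_dotall.group(3))
```

If the same holds in Notepad++, ticking ". matches newline" in the Find dialog would be the first thing to try.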

Answer 1

I can offer at least a partial solution. Search for the following pattern:

^(?!Table)(?!\d+ (?:\d+ )?Text Core).*$

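and replace it with an empty string. This should remove all of the random text, leaving only the lines that start with Table or that start with one or two numbers followed by Text Core. Here is a working demo: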

Demo
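For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same replacement. The pattern is the one from this answer; the sample text, the function name strip_random_text, and the extra step that drops the blank lines left behind are my own assumptions and not part of the answer.

```python
import re

# Pattern from the answer: any line that neither starts with "Table" nor
# starts with one or two numbers followed by "Text Core".
UNWANTED_LINE = re.compile(
    r"^(?!Table)(?!\d+ (?:\d+ )?Text Core).*$",
    re.MULTILINE,  # let ^ and $ match at every line boundary
)

sample = "\n".join([
    "Table 1: bifrost",
    "",
    "<lots of random text >",
    "",
    "10 4 Text Core statistics aggregated by the Statistics",
    "",
    "<lots of random text >",
    "",
    "4 Text Core statistics aggregated by the Statistics 11",
    "",
    "Table 2: homestead",
    "",
    "<lots of random text >",
    "",
    "4 Text Core statistics aggregated by the Statistics 12",
])

def strip_random_text(text: str) -> str:
    """Hypothetical helper: blank out unwanted lines, then drop empty lines."""
    blanked = UNWANTED_LINE.sub("", text)
    # Assumption: the replacement leaves empty lines behind, so remove them.
    return "\n".join(line for line in blanked.splitlines() if line.strip())

kept = strip_random_text(sample)
print(kept)
```

In this sketch both Text Core lines under Table 1 survive, which is consistent with the answer calling itself a partial solution: it removes the filler but does not pick a single page line per table.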

Answer 2

Edit: I actually downloaded Notepad++ and tested the regex.

This will work:

(^Table \d+).*?(?=\d+\s(\d+\s)?Text Core)([^\n]+)(.*?(?=^Table \d+|\z))

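It uses a positive lookahead to find the first page number after the table number, then grabs everything from there to the end of the line. After that it grabs everything up to the next 'Table'. Note that you need to check the ". matches newline" box.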

If you want to do a replacement, replace it with \1\n\3\n. Demo on regex101.com
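The same pattern and replacement can be tried in Python as a rough cross-check. This is a sketch under assumptions, not a Notepad++ test: re.DOTALL stands in for the ". matches newline" box, re.MULTILINE makes ^ match at each line start, and \z is written \Z for Python's re module. The second pattern, FULL_TITLE, is a hypothetical variant of my own that lets group 1 take the rest of the Table line, since the question's desired output shows the full "Table 1: bifrost" line.

```python
import re

FLAGS = re.MULTILINE | re.DOTALL  # "^" per line, "." crosses line breaks

# The answer's pattern, with \z written as \Z.
ANSWER = re.compile(
    r"(^Table \d+).*?(?=\d+\s(\d+\s)?Text Core)([^\n]+)(.*?(?=^Table \d+|\Z))",
    FLAGS,
)

# Hypothetical variant: group 1 also captures the rest of the Table line.
FULL_TITLE = re.compile(
    r"(^Table \d+[^\n]*).*?(?=\d+\s(\d+\s)?Text Core)([^\n]+)(.*?(?=^Table \d+|\Z))",
    FLAGS,
)

def summarize(pattern, text):
    # Same replacement as in the answer: group 1, newline, group 3, newline.
    return pattern.sub(r"\1\n\3\n", text)

filler = ["", "<lots of random text >", ""]
sample = "\n".join(
    ["Table 1: bifrost"] + filler
    + ["10 4 Text Core statistics aggregated by the Statistics"] + filler
    + ["4 Text Core statistics aggregated by the Statistics 11", ""]
    + ["Table 2: homestead"] + filler
    + ["4 Text Core statistics aggregated by the Statistics 12"] + filler
    + ["12 4 Text Core statistics aggregated by the Statistics", ""]
    + ["Table 3: homestead"] + filler
    + ["12 4 Text Core statistics aggregated by the Statistics"]
)

answer_out = summarize(ANSWER, sample)
variant_out = summarize(FULL_TITLE, sample)
print(variant_out)
```

With the answer's pattern, group 1 holds only "Table 1", "Table 2" and so on, so the title text after the number is dropped from the result; the variant keeps it.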

regex + np++ + capture a string at the top of the page and a string from the bottom of the page
