Question

I am trying to parse a .md file in Python using a regular expression with a specific pattern. The file is written like this:

## title
## title 2

### first paragraph
[lines]
...

### second
[lines]
...

## third 
[lines]
...

## last
[lines]
...

So I used this regular expression to match it:

##(.*)\n+##(.*)\n+###((\n|.)*)###((\n|.)*)##((\n|.)*)##((\n|.)*)

When I try it online, the regex matches: https://regex101.com/r/8iYBrp/1

But when I use it in Python it does not work, and I cannot figure out why.

Here is my code:

import re

str = (
    r'##(.*)\n+##(.*)\n+###((\n|.)*)###((\n|.)*)##((\n|.)*)##((\n|.)*)')
file_regexp = re.compile(str)

## Retrieve the content of the file (I am sure this part 
## returns what I want)

m = file_regexp.match(fileContent)

# m is always None

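I have already tried adding flags such as re.DOTALL, re.I, re.M and re.S. When I do, though, the script becomes extremely slow and the computer starts making strange noises.

Does anyone know what I am doing wrong? Any help is appreciated.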

Answer 1

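First, you assign the regex pattern to the variable str (shadowing the built-in str), but then use featureStr afterwards. The resulting match object comes back empty because you told it to ignore what it matched. You can give regex groups a name with ?P<name> and access them by that name later. Here is a working example: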

import re

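# Lazy quantifiers (.*?) stop each named group at the next heading marker;
# re.S lets "." match newlines so a group can span several lines.
featureStr = (
    r'##(?P<title>.*?)\n+##(?P<title_2>.*?)\n+###(?P<first>.*?)###(?P<second>.*?)##(?P<third>.*?)##(.*)')
file_regexp = re.compile(featureStr, re.S)

# Read the whole file into a single string
with open("markdown.md") as f:
    fileContent = f.read()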

m = file_regexp.match(fileContent)

print(m.groupdict())

Which prints:

{'title': ' title', 'title_2': ' title 2', 'first': ' first paragraph\n[lines]\n...\n\n', 'second': ' second\n[lines]\n...\n\n', 'third': ' third \n[lines]\n...\n\n'}
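To try the named-group idea without a file on disk, here is a minimal sketch with the sample document from the question inlined as a string. The names `SAMPLE`, `PATTERN` and the extra group name `last` are made up for illustration, and the lazy `.*?` quantifiers are an assumption about the intended behaviour (each group stops at the next heading marker).

```python
import re

# Sample document from the question, inlined so the sketch is self-contained.
SAMPLE = (
    "## title\n"
    "## title 2\n"
    "\n"
    "### first paragraph\n"
    "[lines]\n"
    "...\n"
    "\n"
    "### second\n"
    "[lines]\n"
    "...\n"
    "\n"
    "## third \n"
    "[lines]\n"
    "...\n"
    "\n"
    "## last\n"
    "[lines]\n"
    "...\n"
)

# One named group per heading; re.S makes "." match newlines as well.
# The group name "last" is hypothetical, added here for illustration.
PATTERN = re.compile(
    r"##(?P<title>.*?)\n+"
    r"##(?P<title_2>.*?)\n+"
    r"###(?P<first>.*?)"
    r"###(?P<second>.*?)"
    r"##(?P<third>.*?)"
    r"##(?P<last>.*)",
    re.S,
)

m = PATTERN.match(SAMPLE)
# groupdict() returns only the named groups, keyed by name.
sections = m.groupdict() if m else {}
for name, text in sections.items():
    print(name, repr(text))
```

Note that `match` only works here because `SAMPLE` begins with `##` at position 0; any leading whitespace or blank line would make it return None.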

Hope this helps. Let me know if you have any further questions. Have a nice day!

Answer 2

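Correct me if I'm wrong, but if you are only interested in the lines, you can simply skip the lines that start with #. That can be done with something like

# Sketch: keep every line that does not start with '#'
# (the file name is a placeholder)
with open("markdown.md") as f:
    lines = [line for line in f if not line.startswith("#")]

Why do you need a regex at all?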
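If the headings themselves matter too, the same regex-free idea can be taken one step further. The following is only a sketch, assuming you want the body lines grouped under the heading that precedes them; `split_sections` is a hypothetical helper name, not something from the question.

```python
def split_sections(text):
    """Group non-empty body lines under the most recent heading."""
    sections = []  # list of (heading, [lines]) pairs, in document order
    for line in text.splitlines():
        if line.startswith("#"):
            # A heading starts a new section; drop the leading '#' marks.
            sections.append((line.lstrip("#").strip(), []))
        elif sections and line.strip():
            # Non-empty body line: attach it to the current section.
            sections[-1][1].append(line)
    return sections


sample = (
    "## title\n"
    "## title 2\n"
    "\n"
    "### first paragraph\n"
    "[lines]\n"
    "...\n"
    "\n"
    "## last\n"
    "[lines]\n"
    "...\n"
)

result = split_sections(sample)
for heading, body in result:
    print(heading, body)
```

This walks the file once, line by line, so there is no backtracking to worry about, and it does not care how many headings the file contains.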

Answer 3

Use re.search instead of re.match:

import re

# Named "pattern" rather than "str" to avoid shadowing the built-in
pattern = (r'##(.*?)\n##(.*?)\n+###(.*?)\n+###(.*?)\n+##(.*?)\n+##(.*?)')
file_regexp = re.compile(pattern, re.S)

fileContent = '''
## title
## title 2

### first paragraph
[lines]
...

### second
[lines]
...

## third 
[lines]
...

## last
[lines]
...
'''

m = file_regexp.search(fileContent)
print(m.groups())
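The reason this helps is anchoring: re.match only tries the pattern at position 0 of the string (even with re.M), while re.search scans forward for the first position where it matches. The fileContent above starts with a newline, so match has nothing to anchor on. A small sketch of the difference, using the same pattern and sample text; the variable names are illustrative only:

```python
import re

# Same sample as above, built line by line so the leading newline and the
# trailing space after "## third" are explicit.
lines = [
    "", "## title", "## title 2", "",
    "### first paragraph", "[lines]", "...", "",
    "### second", "[lines]", "...", "",
    "## third ", "[lines]", "...", "",
    "## last", "[lines]", "...", "",
]
text = "\n".join(lines)

file_regexp = re.compile(
    r"##(.*?)\n##(.*?)\n+###(.*?)\n+###(.*?)\n+##(.*?)\n+##(.*?)", re.S
)

# match() is anchored at position 0, which is "\n" here, so it fails.
anchored = file_regexp.match(text)

# search() scans forward and finds the match starting at "## title".
found = file_regexp.search(text)

print(anchored)
print(found.groups())
```

The last group comes back empty because a trailing lazy `(.*?)` is satisfied by matching nothing; make it greedy, or anchor the pattern at the end, if the final section's text is needed.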

Regex matches, but re.match() returns nothing