Question

I'm having trouble writing a regular expression to match YAML front matter.

This is the front matter I want to match:

    ---
    name: me
    title: test
    cpu: 1
    ---

This is what I thought would work:

re.search( r'^(---)(.*)(---)$', content, re.MULTILINE)

Any help is much appreciated.

Answer 1

Let's unpack what this regex currently does:

r'^(---)(.*)(---)$'：


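r: treat this as a raw string literal in Python

^: start matching at the beginning of a line

(---): match --- as an anonymous capture group

(.*): match any character except a newline (.), zero or more times and greedily (*), up to the next part of the expression

(---): as above

$: stop matching at the end of a line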

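The trouble is that this fails when whitespace is present, and it also fails on multi-line input because . does not match a newline unless re.DOTALL is set. What the pattern really says is: find dashes at the start of a line and match until dashes appear at the end of a line. In addition, the parentheses () around the dashes that delimit the YAML front matter create groups that I don't think are needed to evaluate the regex usefully.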
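To make the failure concrete, here is a small sketch that runs the question's pattern against the sample front matter. The variable names (`content`, `no_dotall`, `with_dotall`) are only illustrative; the flags are standard `re` module flags.

```python
import re

# The sample front matter from the question, as a single string
content = "---\nname: me\ntitle: test\ncpu: 1\n---\n"

pattern = r'^(---)(.*)(---)$'

# Without re.DOTALL, '.' stops at every newline, so no single line can
# satisfy "--- ... ---" and the search finds nothing.
no_dotall = re.search(pattern, content, re.MULTILINE)

# With re.DOTALL added, '.' also matches newlines and the block is found.
with_dotall = re.search(pattern, content, re.MULTILINE | re.DOTALL)

print(no_dotall)
print(repr(with_dotall.group(2)))
```

The second group of the successful match holds the text between the two delimiters, including the surrounding newlines.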

A better expression would be:

r'^\s*---(.*)---\s*$'

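This adds \s* to absorb any whitespace between the start of the line and the first dashes, adds it again between the second set of dashes and the end of the line, and captures everything in between in a single anonymous capture group that you can use for further processing. If you don't need to extract the front matter content, simply replace (.*) with .*.

Consider re.findall if a single file may contain several matches of this regex, and pass re.DOTALL so that the dot also matches newlines. With re.DOTALL the greedy (.*) runs to the last --- that ends a line, so use (.*?) if the match should stop at the first closing delimiter.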
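As a rough illustration of the flags discussed here, the sketch below applies the pattern with re.MULTILINE and re.DOTALL to a document that also contains a later `---` line (a Markdown horizontal rule, which is an assumption made for the example). It compares the greedy `(.*)` with a non-greedy `(.*?)` variant; the sample text and variable names are invented.

```python
import re

# Front matter followed by body text that contains its own '---' line
text = "---\nname: me\n---\nIntro\n\n---\nMore\n"

flags = re.MULTILINE | re.DOTALL

# Greedy: (.*) extends to the last '---' that ends a line
greedy = re.findall(r'^\s*---(.*)---\s*$', text, flags)

# Non-greedy: (.*?) stops at the first closing '---'
lazy = re.findall(r'^\s*---(.*?)---\s*$', text, flags)

print(greedy)
print(lazy)
```

With input like this, only the non-greedy form isolates the front matter; the greedy form also swallows the body up to the horizontal rule.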

Answer 2

I have used a regex like this, re.findall(r'^---[\s\S]+?---', text):

import re
import yaml

def extractFrontMatter(markdown):
    # Read the whole Markdown file; the context manager closes it for us
    with open(markdown, 'r') as md:
        text = md.read()
    # Returns first yaml content, `--- yaml frontmatter ---` from the .md file
    # http://regexr.com/3f5la
    # https://stackoverflow.com/questions/2503413/regular-expression-to-stop-at-first-match
    match = re.findall(r'^---[\s\S]+?---', text)
    if match:
        # Strips `---` to create a valid yaml object
        ymd = match[0].replace('---', '')
        try:
            # safe_load avoids constructing arbitrary Python objects
            return yaml.safe_load(ymd)
        except yaml.YAMLError as exc:
            print(exc)
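If you would rather not strip the dashes with replace(), one possible variation is to capture the YAML source with a group and return the rest of the document as well. This is only a sketch under stated assumptions: `split_front_matter` and `_FRONT_MATTER` are hypothetical names, the block is assumed to sit at the very start of the text, and an empty front matter block is not handled. It uses only the standard library, so parsing the captured string is left to whichever YAML loader you prefer.

```python
import re

# Anchored with \A (no re.MULTILINE), so only a block at the very start
# of the text counts as front matter. The delimiters must sit on their
# own lines, optionally followed by spaces or tabs.
_FRONT_MATTER = re.compile(
    r'\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)',
    re.DOTALL,
)

def split_front_matter(text):
    """Return (yaml_source, body); yaml_source is None without front matter."""
    match = _FRONT_MATTER.match(text)
    if match is None:
        return None, text
    # group(1) is the text between the delimiters; the body follows the match
    return match.group(1), text[match.end():]

raw, body = split_front_matter("---\nname: me\ntitle: test\ncpu: 1\n---\n# Heading\n")
missing = split_front_matter("plain text")
print(repr(raw))
print(repr(body))
```

Capturing the content directly avoids accidentally removing a `---` sequence that appears inside a YAML value.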

I also came across python-frontmatter, which has some extra helper functions:

import frontmatter
post = frontmatter.load('/path/to-markdown.md')

print(post.metadata, 'meta')
print(post.keys(), 'keys')
