Question

I am working with regular expressions and Python's re module.

I am trying to capture the date from the following HTML:

        <div class="row">
            <div class="small-12 columns">
                <strong>
                    Date:
                </strong>
            December 18th 2015 
            </div>
        </div>
        </div>

I have this regex:

(((?!Date:)(?!\n)(.+)(<\/strong\>)(\n)(.+))(\S))

But it still returns all of this:

                </strong>
            December 20th 2016

I want to discard the whitespace and get just "December 20th 2016".

So I need to do something with the regex after ((?!Date:)(?!\n), i.e. this part needs to change:

(.+)(<\/strong\>)(\n)(.+))(\S))

But I'm not sure what I can do: according to regexr.com, I can't combine a negative lookahead (?!) with .+

Any ideas on how to grab just "December 20th 2016"?

Answer 1

Putting ?: at the start of a group makes it a non-capturing group. That is what you need here to avoid capturing the parts you don't want.
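To make the difference concrete, here is a minimal sketch comparing a capturing and a non-capturing group with re.findall. The sample string and the variable names are made up for illustration and are not from the question:

```python
import re

# Hypothetical, simplified input: a label followed by a date on the next line.
text = "Date:\n  December 18th 2015"

# Two capturing groups: findall returns one tuple per match.
with_capture = re.findall(r"(Date:)\s+(.+)", text)

# The label group is non-capturing, so only the date is returned.
without_capture = re.findall(r"(?:Date:)\s+(.+)", text)

print(with_capture)     # [('Date:', 'December 18th 2015')]
print(without_capture)  # ['December 18th 2015']
```

The non-capturing group still has to match; it just does not show up in the result.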

However, as Daniel Roseman said, you should use an HTML parser.

Edit:
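For what the parser route could look like, here is a rough sketch using only the standard library's html.parser. The DateExtractor class and its attributes are hypothetical names chosen for this example, and it assumes the date is the first non-empty text after a strong element containing "Date:":

```python
from html.parser import HTMLParser


class DateExtractor(HTMLParser):
    """Hypothetical helper: grabs the first text after <strong>Date:</strong>."""

    def __init__(self):
        super().__init__()
        self.in_strong = False   # currently inside a <strong> element
        self.saw_label = False   # already seen the "Date:" label
        self.date = None         # extracted date text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace-only text between tags
        if self.in_strong and text == "Date:":
            self.saw_label = True
        elif self.saw_label and not self.in_strong and self.date is None:
            self.date = text


html = (
    '<div class="row">\n'
    '    <div class="small-12 columns">\n'
    "        <strong>\n"
    "            Date:\n"
    "        </strong>\n"
    "    December 18th 2015 \n"
    "    </div>\n"
    "</div>\n"
)

parser = DateExtractor()
parser.feed(html)
parser.close()
print(parser.date)  # December 18th 2015
```

Because the text is stripped in handle_data, the surrounding whitespace never reaches the result, and the code does not depend on how the markup is broken across lines.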

from re import findall
s = """        <div class="row">
            <div class="small-12 columns">
                <strong>
                    Date:
                </strong>
            December 18th 2015 
            </div>
        </div>
        </div>"""
res = findall(r'(?:Date:)(?:\n)(?:.+)(?:\n)(?:\s+)(.+)', s)
print(res)

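This prints ['December 18th 2015 '] (Python 3.5.2). Note the trailing space, which comes from the end of the date line in the input.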

This gets the date without spelling out the exact words and tags in the regex.
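If the surrounding whitespace should be dropped by the regex itself, one possible variant is sketched below. It is not the pattern from the answer: it assumes the date sits on the line after the closing strong tag, names that tag explicitly, and uses a lazy group with re.MULTILINE so that trailing spaces stay outside the capture. The input is a shortened stand-in for the HTML in the question:

```python
import re

# Shortened stand-in for the question's HTML; note the space after "2015".
s = (
    "<strong>\n"
    "    Date:\n"
    "</strong>\n"
    "    December 18th 2015 \n"
    "</div>\n"
)

# Pattern from the answer above: the greedy (.+) keeps the trailing space.
raw = re.findall(r"(?:Date:)(?:\n)(?:.+)(?:\n)(?:\s+)(.+)", s)

# Variant (an assumption, not the answer's pattern): the lazy (.+?) followed
# by \s*$ stops the capture before any whitespace at the end of the line.
match = re.search(r"Date:\s*</strong>\s*(.+?)\s*$", s, flags=re.MULTILINE)
date = match.group(1) if match else None

print(raw)   # ['December 18th 2015 ']
print(date)  # December 18th 2015
```

Calling .strip() on each result of the original pattern would work just as well; the variant only moves the trimming into the regex.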
