Question

I want to extract "1381912680" from the following code:

[<abbr class="timestamp" data-utime="1381912680"></abbr>]

I'm using Python 2.7, and this is what I have in my code so far to get to this stage:

s = soup.find_all("abbr", { "class" : "timestamp" })
print s

Should I use a regular expression, or can BS do this on its own?

Edit

I tried using a regular expression, but had no luck:

import re

regex = 'data-utime=\"(\d+)\"'
x = re.compile(regex)
x2 = re.findall(x, s)
print x2

I got: TypeError: expected string or buffer

Answer 1

class is a reserved word in Python, so use the following form instead:

s= soup.find("abbr", class_="timestamp")

But... the <abbr> element is empty, so use the answer above :)
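
In the question's markup the value sits in the data-utime attribute rather than in the tag's text, so BeautifulSoup can read it without a regex. Here is a minimal sketch, assuming bs4 is installed and the built-in html.parser is used; the html, tag, utime and utimes names are only for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = '<abbr class="timestamp" data-utime="1381912680"></abbr>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag (or None if nothing matches)
tag = soup.find("abbr", class_="timestamp")

# Attributes are read with dictionary-style access on the Tag
utime = tag["data-utime"]
print(utime)

# find_all() returns a list-like ResultSet, so read the attribute per element
utimes = [t["data-utime"] for t in soup.find_all("abbr", class_="timestamp")]
print(utimes)
```

This also suggests where the TypeError in the question comes from: re.findall expects a string, while find_all returns a ResultSet of Tag objects.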

Answer 2

You can use the following regular expression to extract the number inside the double quotes:

(?<=data-utime=\")[^\"]*

The Python code would be:

>>> import re
>>> html = '[<abbr class="timestamp" data-utime="1381912680"></abbr>]'
>>> m = re.findall(r'(?<=data-utime=\")[^\"]*', html)
>>> m
['1381912680']

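Explanation

(?<=data-utime=\") is a positive lookbehind: the regex engine starts the match just after the string data-utime="
[^\"]* matches any character other than a double quote, zero or more times, so the match stops at the next literal "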
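
To make the explanation concrete, here is a small sketch that compares the lookbehind pattern with a capture-group pattern like the one attempted in the question. It assumes the markup is already available as a plain string (the html variable is illustrative); if you start from a BeautifulSoup result, it would first need converting, for example with str(s).

```python
import re

# Sample markup copied from the question, as a plain string
html = '[<abbr class="timestamp" data-utime="1381912680"></abbr>]'

# Lookbehind version: the match is only the text that follows data-utime="
lookbehind = re.findall(r'(?<=data-utime=")[^"]*', html)

# Capture-group version: match the whole attribute, return only the digits
captured = re.findall(r'data-utime="(\d+)"', html)

print(lookbehind)
print(captured)
```

Both patterns return the same one-element list for this input. The capture-group form is stricter because \d+ only accepts digits.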
