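I'm using Python requests to fetch the source of a file and then parse a string out of that source. The string I'm trying to parse is magic: 8susjdhdyrhsisj3864jsud (it is not always the same string). If I inspect the source by printing it to the screen, the string is there. But when I parse it, sometimes I get a result and sometimes I get nothing. See these screenshots: http://i.imgur.com/NW1zFZK.png, http://i.imgur.com/cb9e2cb.png. The string I want is always present in the source, so it must be a regex problem? I have tried both findall and search, and both behave the same way: sometimes a result, sometimes nothing. What is my problem?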
import re
import requests

class Solvemedia():
    def __init__(self, key):
        self.key = key

    def timestamp(self, source):
        # Print every chalstamp value found in the page source.
        timestamp_regex = re.compile(ur'chalstamp:\s+(\d+),')
        print re.findall(timestamp_regex, source)

    def magic(self, source):
        # Print every magic token found in the page source.
        magic_regex = re.compile(ur'magic:\s+\'(\w+)\',')
        print re.findall(magic_regex, source)

    def source(self):
        # Fetch the challenge script with a browser-like user agent.
        solvemedia = requests.Session()
        solvemedia.headers.update({
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
        })
        source = solvemedia.get('http://api.solvemedia.com/papi/challenge.script?k={}'.format(self.key)).text
        return source

    def test(self):
        js_source = self.source()
        print js_source
        self.magic(js_source)
        self.timestamp(js_source)

solvemedia = Solvemedia('HUaZ-6d2wtQT3-LkLVDPJB5C.E99j9ZK')
solvemedia.test()
Answer 0 (score: 1)
One of the values contains a dot (.), and \w does not match a dot. Compare:
magic: 'AZJEXYx.ZsExcTHvjH9mwQ',
//             ^
with:
magic: 'xfF9i4YBAQP1EgoNhgEBAw',
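The difference can be reproduced in isolation. The sketch below assumes Python 3, so the ur prefix from the question is dropped (an adaptation, not part of the original code), and it runs the question's pattern against the two sample lines above. The name original_regex is only illustrative.

```python
import re

# The pattern from the question, written as a Python 3 raw string.
original_regex = re.compile(r"magic:\s+'(\w+)',")

samples = [
    "magic: 'xfF9i4YBAQP1EgoNhgEBAw',",  # letters and digits only
    "magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",  # contains a dot
]

# findall returns the captured groups, or an empty list when nothing matches.
results = [original_regex.findall(sample) for sample in samples]
for sample, found in zip(samples, results):
    print(sample, "->", found)
```

The first sample yields its token, while the second yields an empty list, because \w+ stops at the dot and the closing quote required by the pattern is not the next character.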
A better option is to allow every character except the quote:
magic_regex = re.compile(ur"magic:\s+'([^']+)',")
Demo:
>>> import re
>>> samples = [
...     u"magic: 'xfF9i4YBAQP1EgoNhgEBAw',",
...     u"magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",
... ]
>>> magic_regex = re.compile(ur"magic:\s+'([^']+)',")
>>> for sample in samples:
...     print magic_regex.search(sample).group(1)
...
xfF9i4YBAQP1EgoNhgEBAw
AZJEXYx.ZsExcTHvjH9mwQ
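One caveat when moving from findall to search: search returns None when the pattern is absent, so calling .group(1) directly would raise AttributeError on such a page. A minimal sketch of a helper that returns the token or None instead of printing it is shown here. It assumes Python 3, and extract_magic and MAGIC_REGEX are hypothetical names, not part of the original class.

```python
import re

# Same pattern as in the answer: capture everything up to the closing quote.
MAGIC_REGEX = re.compile(r"magic:\s+'([^']+)',")


def extract_magic(source):
    """Return the magic token found in source, or None if it is absent."""
    match = MAGIC_REGEX.search(source)
    return match.group(1) if match else None


print(extract_magic("magic: 'AZJEXYx.ZsExcTHvjH9mwQ',"))
print(extract_magic("no token here"))
```

Returning the value rather than printing it lets the caller decide how to handle a page that has no token.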