Parsing a string from requests data

Time: 2014-12-21 00:56:50

Tags: python python-requests

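I'm using Python requests to fetch the source of a file, and then I parse a string out of that source. The string I'm trying to parse is magic: 8susjdhdyrhsisj3864jsud (the value is not always the same). If I print the source to the screen, the string is always there. But when I parse it, sometimes I get a result and sometimes I get nothing. See these screenshots: http://i.imgur.com/NW1zFZK.png and http://i.imgur.com/cb9e2cb.png. Since the string I want always appears in the source, it must be a problem with my regular expression? I have tried both findall and search, and both give the same behavior: sometimes I get a match, sometimes I get nothing. What is my problem?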

import re

import requests


class Solvemedia():
    def __init__(self, key):
        self.key = key


    def timestamp(self, source):
        timestamp_regex = re.compile(ur'chalstamp:\s+(\d+),')

        print re.findall(timestamp_regex, source)


    def magic(self, source):
        magic_regex = re.compile(ur'magic:\s+\'(\w+)\',')

        print re.findall(magic_regex, source)


    def source(self):
        solvemedia = requests.Session()
        solvemedia.headers.update({
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
        })
        source = solvemedia.get('http://api.solvemedia.com/papi/challenge.script?k={}'.format(self.key)).text
        return source


    def test(self):
        js_source = self.source()

        print js_source
        self.magic(js_source)
        self.timestamp(js_source)


solvemedia = Solvemedia('HUaZ-6d2wtQT3-LkLVDPJB5C.E99j9ZK')
solvemedia.test()

1 Answer

Answer 0 (score: 1)

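One of the values contains a . (dot), but \w does not match a dot, so your pattern fails whenever the magic value contains one. Compare: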

magic: 'AZJEXYx.ZsExcTHvjH9mwQ',
//             ^

with:

magic: 'xfF9i4YBAQP1EgoNhgEBAw',
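To see the difference without hitting the network, here is a small Python 3 sketch (so the ur prefix becomes r) that runs the question's original pattern against the two sample lines above. The variable names are only for illustration.

```python
import re

# The question's pattern: \w+ matches only letters, digits and underscore.
original = re.compile(r"magic:\s+'(\w+)',")

samples = [
    "magic: 'xfF9i4YBAQP1EgoNhgEBAw',",  # no dot: the pattern matches
    "magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",  # dot: \w+ stops at '.', and "'," never follows
]

for sample in samples:
    match = original.search(sample)
    # search() returns None when the pattern cannot match anywhere
    print(match.group(1) if match else None)
# prints:
# xfF9i4YBAQP1EgoNhgEBAw
# None
```

This is exactly the "sometimes a result, sometimes nothing" behavior: it depends only on whether the server happened to generate a value containing a dot.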

A better option is to allow every character except the quote:

magic_regex = re.compile(ur"magic:\s+'([^']+)',")

Demo:

>>> import re
>>> samples = [
...     u"magic: 'xfF9i4YBAQP1EgoNhgEBAw',",
...     u"magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",
... ]
>>> magic_regex = re.compile(ur"magic:\s+'([^']+)',")
>>> for sample in samples:
...     print magic_regex.search(sample).group(1)
... 
xfF9i4YBAQP1EgoNhgEBAw
AZJEXYx.ZsExcTHvjH9mwQ
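The code in this thread is Python 2. In Python 3 the ur'' prefix is a SyntaxError and print is a function. Below is a minimal Python 3 sketch of the two parsing methods, assuming the source text has already been fetched (for example via .text on a requests response, as in the question). The names parse_magic, parse_timestamp and sample_source are hypothetical, and the sample is a made-up excerpt in the format the question's regexes expect. Both helpers return None when nothing matches instead of raising AttributeError, which is what calling .group(1) on a failed search would do.

```python
import re

# [^']+ accepts any character except the closing quote, so dots are fine.
MAGIC_RE = re.compile(r"magic:\s+'([^']+)',")
# Same timestamp pattern as in the question, minus the Python 2 ur prefix.
TIMESTAMP_RE = re.compile(r"chalstamp:\s+(\d+),")


def parse_magic(source):
    """Return the magic value from the script source, or None if absent."""
    match = MAGIC_RE.search(source)
    return match.group(1) if match else None


def parse_timestamp(source):
    """Return the chalstamp value as an int, or None if absent."""
    match = TIMESTAMP_RE.search(source)
    return int(match.group(1)) if match else None


# Hypothetical excerpt; the timestamp value is invented for illustration.
sample_source = """
    chalstamp: 1419123410,
    magic: 'AZJEXYx.ZsExcTHvjH9mwQ',
"""

print(parse_magic(sample_source))      # AZJEXYx.ZsExcTHvjH9mwQ
print(parse_timestamp(sample_source))  # 1419123410
```

Returning None keeps the caller in control: it can retry the request or log the raw source when the expected field is missing, rather than crashing inside the parser.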