Regular expression in Python when the pattern contains quotes

Time: 2014-02-09 04:21:20

Tags: python regex

I have the following line, which I pulled directly from an HTML page. I want to process it to extract information from it:

var quoteDataObj = [{"symbol":".DJIA","symbolType":"symbol","code":0,"name":"Dow Jones Industrial Average","shortName":"DJIA","last":"15794.08","exchange":"Dow Jones Global Indexes","source":"Exchange","open":"15630.64","high":"15798.51","low":"15625.53","change":"165.55","currencyCode":"USD","timeZone":"EST","provider":"CNBC Quote Cache","altSymbol":".DJIA","curmktstatus":"REG_MKT","realTime":"true","assetType":"INDEX","noStreaming":"true","encodedSymbol":".DJIA"}]

I am using Python to process the string, which I have saved in a variable named "line". I am trying to write a regular expression that matches:

"low":"15625.53"

However, I don't know in advance what the number will be, so I can't just search for it literally. I tried the following with no luck:

last = re.search(r".*last\":.*\,", line)

Thanks!

3 answers:

Answer 0: (score: 2)

r'"low":".*?"' should work for you. The non-greedy .*? stops at the first closing quote after the value.

>>> re.search(r'"low":".*?"', text).group()
'"low":"15625.53"'
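
If only the number is needed rather than the whole key/value pair, a capture group can isolate it. The following is a minimal sketch, assuming Python 3 and a shortened copy of the line from the question; the variable names are illustrative only:

```python
import re

# Shortened sample of the line from the question (assumption: same key/value layout)
line = 'var quoteDataObj = [{"last":"15794.08","high":"15798.51","low":"15625.53","change":"165.55"}]'

# [^"]* matches everything up to the next double quote;
# the parentheses capture only the value, not the key
match = re.search(r'"low":"([^"]*)"', line)
if match:
    low = match.group(1)       # just the value
    print(low)
    print(match.group(0))      # the whole matched key/value pair
```

Using a single-quoted raw string for the pattern avoids having to escape the double quotes at all.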

Answer 1: (score: 2)

from ast import literal_eval # or 100% equivalent for this purpose, json.loads

# line is the string from the question; everything after ' = ' is the payload
cruft, sep, payload = line.partition(' = ')

d_in_list = literal_eval(payload)

Then you just have a regular dict inside a list:

d_in_list[0]
Out[15]: 
{'altSymbol': '.DJIA',
 'assetType': 'INDEX',
 'change': '165.55',
 'code': 0,
 'curmktstatus': 'REG_MKT',
 'currencyCode': 'USD',
 'encodedSymbol': '.DJIA',
 'exchange': 'Dow Jones Global Indexes',
 'high': '15798.51',
 'last': '15794.08',
 'low': '15625.53',
 'name': 'Dow Jones Industrial Average',
 'noStreaming': 'true',
 'open': '15630.64',
 'provider': 'CNBC Quote Cache',
 'realTime': 'true',
 'shortName': 'DJIA',
 'source': 'Exchange',
 'symbol': '.DJIA',
 'symbolType': 'symbol',
 'timeZone': 'EST'}

d_in_list[0]['low']
Out[16]: '15625.53'
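
The same idea can be sketched with json.loads instead of literal_eval. This may be the safer choice in general, because literal_eval raises ValueError on the bare JSON literals true, false, and null (the sample line only contains the quoted string "true", so both work here). The shortened sample line and the handling of a trailing semicolon are assumptions, not part of the original page:

```python
import json

# Shortened sample of the line from the question
line = 'var quoteDataObj = [{"symbol":".DJIA","code":0,"low":"15625.53","realTime":"true"}]'

prefix, sep, payload = line.partition(' = ')
if not sep:
    # partition returns an empty separator when ' = ' is not found
    raise ValueError("no ' = ' separator found in line")

# Assumption: the JavaScript statement might end with ';', which is not valid JSON
quotes = json.loads(payload.rstrip().rstrip(';'))

low = quotes[0]['low']
print(low)
```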

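That said, there is a 99% chance that an actual API exists where you can submit a query and get the JSON response above directly, without scraping the web page and doing crude parsing.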

Answer 2: (score: 0)

Another approach is to strip the var ... prefix and handle the rest of the string as JSON:

>>> import json
>>> data = 'var quoteDataObj = [{"symbol":".DJIA","symbolType":"symbol","code":0,"name":"Dow Jones Industrial Average","shortName":"DJIA","last":"15794.08","exchange":"Dow Jones Global Indexes","source":"Exchange","open":"15630.64","high":"15798.51","low":"15625.53","change":"165.55","currencyCode":"USD","timeZone":"EST","provider":"CNBC Quote Cache","altSymbol":".DJIA","curmktstatus":"REG_MKT","realTime":"true","assetType":"INDEX","noStreaming":"true","encodedSymbol":".DJIA"}]'
>>> json.loads(data[data.find('['):])
[{u'altSymbol': u'.DJIA', u'code': 0, u'last': u'15794.08', u'name': u'Dow Jones Industrial Average', u'noStreaming': u'true', u'exchange': u'Dow Jones Global Indexes', u'assetType': u'INDEX', u'symbol': u'.DJIA', u'realTime': u'true', u'symbolType': u'symbol', u'high': u'15798.51', u'source': u'Exchange', u'encodedSymbol': u'.DJIA', u'low': u'15625.53', u'provider': u'CNBC Quote Cache', u'curmktstatus': u'REG_MKT', u'timeZone': u'EST', u'shortName': u'DJIA', u'open': u'15630.64', u'currencyCode': u'USD', u'change': u'165.55'}]
>>> json_str = data[data.find('['):]  # Take everything from the first [ in the string
>>> json.loads(json_str)[0]['low']
u'15625.53'

You can retrieve each attribute from the first element of the array in the same way.
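
As a rough illustration of reading several attributes at once, the sketch below loops over the parsed dict and converts a few price fields to floats. It assumes Python 3 and a shortened copy of the data; the prices name and the choice of fields are illustrative only:

```python
import json

# Shortened sample of the data from the question
data = 'var quoteDataObj = [{"symbol":".DJIA","last":"15794.08","high":"15798.51","low":"15625.53"}]'

# Parse everything from the first '[' and take the first (only) element
quote = json.loads(data[data.find('['):])[0]

# Print every attribute in a stable order
for key, value in sorted(quote.items()):
    print(key, value)

# The prices arrive as strings, so convert them before doing arithmetic
prices = {key: float(quote[key]) for key in ('last', 'high', 'low')}
print(prices['high'] - prices['low'])
```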