I have the following line, which I pulled directly off an HTML page. I want to process it to extract information from it:
var quoteDataObj = [{"symbol":".DJIA","symbolType":"symbol","code":0,"name":"Dow Jones Industrial Average","shortName":"DJIA","last":"15794.08","exchange":"Dow Jones Global Indexes","source":"Exchange","open":"15630.64","high":"15798.51","low":"15625.53","change":"165.55","currencyCode":"USD","timeZone":"EST","provider":"CNBC Quote Cache","altSymbol":".DJIA","curmktstatus":"REG_MKT","realTime":"true","assetType":"INDEX","noStreaming":"true","encodedSymbol":".DJIA"}]
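I am using Python to process the string, which I have saved in the variable "line". I am trying to write a regular expression that gets
"low":"15625.53"
However, I don't know what the number will be, so I can't just search for it. I tried the following with no luck: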
last = re.search(r".*last\":.*\,", line)
Thanks!
Answer 0 (score: 2)
r'"low":".*?"'
should work for you.
>>> re.search(r'"low":".*?"', line).group()
'"low":"15625.53"'
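If only the number is wanted rather than the whole "low":"..." pair, a capture group can pull it out. The following is a minimal sketch, assuming the string is stored in "line" as in the question (shortened here for readability); extract_field is a hypothetical helper name and not part of the answer above.

```python
import re

# Shortened sample of the string from the question.
line = 'var quoteDataObj = [{"symbol":".DJIA","last":"15794.08","low":"15625.53","change":"165.55"}]'

def extract_field(text, key):
    """Return the quoted value that follows "key": in text, or None if absent."""
    # re.escape guards against keys that contain regex metacharacters.
    # [^"]* stops at the closing quote, so the match cannot run past the value.
    match = re.search(r'"' + re.escape(key) + r'":"([^"]*)"', text)
    return match.group(1) if match else None

low = extract_field(line, "low")
last = extract_field(line, "last")
print(low, last)
```

Checking for None before calling group() avoids the AttributeError that re.search(...).group() raises when the key is missing.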
Answer 1 (score: 2)
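from ast import literal_eval  # json.loads works equally well for this payload
# Split on the first ' = ': the "var quoteDataObj" prefix is discarded, the array literal is kept.
cruft, sep, payload = line.partition(' = ')
d_in_list = literal_eval(payload)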
Then you simply have a regular dict inside a list:
d_in_list[0]
Out[15]:
{'altSymbol': '.DJIA',
'assetType': 'INDEX',
'change': '165.55',
'code': 0,
'curmktstatus': 'REG_MKT',
'currencyCode': 'USD',
'encodedSymbol': '.DJIA',
'exchange': 'Dow Jones Global Indexes',
'high': '15798.51',
'last': '15794.08',
'low': '15625.53',
'name': 'Dow Jones Industrial Average',
'noStreaming': 'true',
'open': '15630.64',
'provider': 'CNBC Quote Cache',
'realTime': 'true',
'shortName': 'DJIA',
'source': 'Exchange',
'symbol': '.DJIA',
'symbolType': 'symbol',
'timeZone': 'EST'}
d_in_list[0]['low']
Out[16]: '15625.53'
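literal_eval and json.loads agree on this payload because every value in it is a quoted string or a number. The sketch below shows where they would diverge, using a made-up payload (not taken from the page in question) that contains the bare JSON literals true and null.

```python
import json
from ast import literal_eval

# Hypothetical payload: same shape as the quote data, but with bare JSON literals.
payload = '[{"symbol": ".DJIA", "realTime": true, "volume": null}]'

# json.loads understands JSON's true/false/null and maps them to True/False/None.
parsed = json.loads(payload)
print(parsed[0]["realTime"], parsed[0]["volume"])

# literal_eval only accepts Python literals, so the bare names true and null are rejected.
try:
    literal_eval(payload)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
print(literal_eval_ok)
```

If the page ever starts emitting unquoted booleans, json.loads is the safer choice of the two.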
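That said, there is almost certainly (99% likely) an actual API that you can query to get the above JSON
response directly, without scraping a web page and parsing it by hand.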
Answer 2 (score: 0)
Another approach is to strip the var ... prefix
and parse the rest of the string as JSON:
>>> import json
>>> data = 'var quoteDataObj = [{"symbol":".DJIA","symbolType":"symbol","code":0,"name":"Dow Jones Industrial Average","shortName":"DJIA","last":"15794.08","exchange":"Dow Jones Global Indexes","source":"Exchange","open":"15630.64","high":"15798.51","low":"15625.53","change":"165.55","currencyCode":"USD","timeZone":"EST","provider":"CNBC Quote Cache","altSymbol":".DJIA","curmktstatus":"REG_MKT","realTime":"true","assetType":"INDEX","noStreaming":"true","encodedSymbol":".DJIA"}]'
>>> json.loads(data[data.find('['):])
[{u'altSymbol': u'.DJIA', u'code': 0, u'last': u'15794.08', u'name': u'Dow Jones Industrial Average', u'noStreaming': u'true', u'exchange': u'Dow Jones Global Indexes', u'assetType': u'INDEX', u'symbol': u'.DJIA', u'realTime': u'true', u'symbolType': u'symbol', u'high': u'15798.51', u'source': u'Exchange', u'encodedSymbol': u'.DJIA', u'low': u'15625.53', u'provider': u'CNBC Quote Cache', u'curmktstatus': u'REG_MKT', u'timeZone': u'EST', u'shortName': u'DJIA', u'open': u'15630.64', u'currencyCode': u'USD', u'change': u'165.55'}]
>>> json_str = data[data.find('['):] # Take everything from the first [ in the string
>>> json.loads(json_str)[0]['low']
u'15625.53'
You can retrieve each property from the first element of the array in the same way.
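The same idea can be wrapped in a small function. This is only a sketch for Python 3: parse_js_assignment is a hypothetical name, the input is shortened, and the trailing semicolon is an assumption about how the line might appear in a real page.

```python
import json

def parse_js_assignment(source):
    """Parse the JSON array on the right-hand side of `var name = [...]`."""
    start = source.find('[')
    if start == -1:
        raise ValueError("no '[' found in input")
    # Drop trailing whitespace and a trailing semicolon, which json.loads would reject.
    return json.loads(source[start:].rstrip().rstrip(';'))

# Shortened sample; the trailing semicolon is assumed for illustration.
data = 'var quoteDataObj = [{"symbol":".DJIA","last":"15794.08","low":"15625.53"}];'
quote = parse_js_assignment(data)[0]

for key in ("last", "low"):
    print(key, quote[key])
```

Raising ValueError when no bracket is found avoids silently slicing from index -1, which is what data[data.find('['):] would do on unexpected input.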