I want to convert the following list of tuples:
[('1599324732926-0',
{'data': '{"timestamp":1599324732.767,
"receipt_timestamp":1599324732.9256856,
"delta":true,
"bid":{"338.9":0.06482,"338.67":3.95535},
"ask":{"339.12":2.47578,"339.13":6.43172}
}'
}
),
('1599324732926-1',
{'data': '{"timestamp":1599324832.767,
"receipt_timestamp":1599324832.9256856,
"delta":true,
"bid":{"338.8":0.06482,"338.57":3.95535},
"ask":{"340.12":2.47578,"340.13":6.43172}
}'
}
)
]
into a list of dicts, or into a DataFrame (either is fine, since going from one to the other is not complicated):
[{
'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.9,
'size': 0.06482},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.67,
'size': 3.95535},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.66,
'size': 16.78636},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.63,
'size': 2.5},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.45,
'size': 6.06071},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.38,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.95,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 338.96,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.11,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 339.12,
'size': 2.47578},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 339.13,
'size': 6.43172},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 339.36,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 339.52,
'size': 6.5},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 341.18,
'size': 0.0},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 341.19,
'size': 0.0},
...
]
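Basically, the value under data is a JSON object with nested dictionaries. I am able to process each JSON element of the list individually, but the list can contain up to 600,000 elements. Is there some pandas or numpy function that can process the list as a whole, so that it runs faster?
I looked at pandas json_normalize(), but in the examples given the dictionary keys systematically become columns, whereas in this case the price keys need to become values of a 'price' column.
Do you know how I could do this? Is there a way to first preprocess the JSON list so that it can then be processed further with json_normalize()?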
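To make the json_normalize() concern concrete, here is a minimal sketch (not taken from either answer) that parses the strings first, lets pd.json_normalize() flatten them, and then uses melt() to turn the per-price columns back into rows. The names raw, wide and tidy are illustrative. The intermediate wide frame gets one column per distinct side/price pair, so this is an assumption-laden illustration rather than a benchmarked solution, and it may use a lot of memory on 600,000 messages.

```python
import json
import pandas as pd

# Two sample messages copied from the question (strings joined onto one line).
raw = [
    ('1599324732926-0', {'data': '{"timestamp":1599324732.767,"receipt_timestamp":1599324732.9256856,"delta":true,"bid":{"338.9":0.06482,"338.67":3.95535},"ask":{"339.12":2.47578,"339.13":6.43172}}'}),
    ('1599324732926-1', {'data': '{"timestamp":1599324832.767,"receipt_timestamp":1599324832.9256856,"delta":true,"bid":{"338.8":0.06482,"338.57":3.95535},"ask":{"340.12":2.47578,"340.13":6.43172}}'}),
]

# Step 1: the JSON strings still have to be parsed one by one.
parsed = [json.loads(payload['data']) for _, payload in raw]

# Step 2: json_normalize turns every nested key into a column,
# e.g. 'bid.338.9', 'ask.339.12', which is the problem described above.
wide = pd.json_normalize(parsed)

# Step 3: melt the side/price columns into rows and drop the NaN cells
# that appear wherever a message did not contain that price level.
id_cols = ['timestamp', 'receipt_timestamp', 'delta']
tidy = wide.melt(id_vars=id_cols, var_name='level', value_name='size')
tidy = tidy.dropna(subset=['size'])

# Step 4: split 'bid.338.9' on the first dot only into side and price.
parts = tidy['level'].str.split('.', n=1, expand=True)
tidy = tidy.assign(side=parts[0], price=parts[1].astype(float))
tidy = tidy[id_cols + ['side', 'price', 'size']].reset_index(drop=True)

print(tidy)
```

Note that json.loads() is still called once per element, so this only moves the reshaping step into pandas; it does not avoid the per-element parsing.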
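For reference, here is the code I can write to process a single element of the list, but I don't think it is the right direction. The next step would be to wrap it in a for loop, which I expect to be much slower than a solution that handles the whole list at once.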
import json
import pandas as pd
data_light = ('1599324732926-0',
{'data': '{"timestamp":1599324732.767, \
"receipt_timestamp":1599324732.9256856,\
"delta":true, \
"bid":{"338.9":0.06482,"338.67":3.95535}, \
"ask":{"339.12":2.47578,"339.13":6.43172} \
}'
}
)
var=json.loads(data_light[1]['data'])
var_bid=var['bid']
var_ask=var['ask']
mylist=list(var_bid.items())+list(var_ask.items())
# side labels must follow the same order as mylist (bids first, then asks)
it = ['bid'] * len(var_bid) + ['ask'] * len(var_ask)
timestamp=var['timestamp']
receipt_timestamp=var['receipt_timestamp']
delta=var['delta']
midx = pd.MultiIndex.from_product([[timestamp], [receipt_timestamp], [delta],it], names=['timestamp', 'receipt_timestamp', 'delta', 'side'])
df=pd.DataFrame(mylist, index=midx, columns=['price', 'size'], dtype=float)
my_dict=df.reset_index().to_dict('records')
Answer 0 (score: 1)
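This is not an exact answer to your question, since it is not a pandas or numpy implementation, but I think it should meet your needs.
Try taking a look at multiprocessing.pool.Pool.map
Suppose you have a function that receives a tuple from the original list and returns the desired dict of data. Let's say its signature is as follows: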
def tuple_to_dict(input):
    # conversion code goes here
    return result_dict
Then you can use multiprocessing.Pool() like this:
import multiprocessing

if __name__ == '__main__':
    input_list = [...]  # your input list
    with multiprocessing.Pool() as pool:
        result_list = pool.map(tuple_to_dict, input_list)
    print(result_list)
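Notes:
The Pool() object should be created inside the if __name__ == "__main__" block, or in a function called (directly or indirectly) from there; otherwise you can get a RuntimeError.
The with ... as ... statement is there so that the Pool object is cleaned up when you are done with it or when something fails. If you do not use the with / as syntax, use try / finally instead and call pool.close() in the finally block to make sure the pool is closed.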
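As a rough sketch of how the placeholder function could be filled in, the example below assumes a hypothetical converter named tuple_to_records that returns a list of row dicts per tuple (each message holds several price levels, so one dict per tuple is not enough), and a hypothetical helper convert_parallel that flattens the per-tuple lists. The processes and chunksize values are arbitrary illustrative choices, not tuned settings.

```python
import json
import multiprocessing
from itertools import chain

# One sample tuple copied from the question (string joined onto one line).
SAMPLE = [
    ('1599324732926-0', {'data': '{"timestamp":1599324732.767,"receipt_timestamp":1599324732.9256856,"delta":true,"bid":{"338.9":0.06482,"338.67":3.95535},"ask":{"339.12":2.47578,"339.13":6.43172}}'}),
]


def tuple_to_records(item):
    """Hypothetical converter: one (id, {'data': json_str}) tuple -> list of row dicts."""
    payload = json.loads(item[1]['data'])
    common = {key: payload[key] for key in ('timestamp', 'receipt_timestamp', 'delta')}
    return [
        {**common, 'side': side, 'price': float(price), 'size': size}
        for side in ('bid', 'ask')
        for price, size in payload[side].items()
    ]


def convert_parallel(input_list, processes=2, chunksize=1000):
    """Map the converter over the list in worker processes and flatten the result."""
    with multiprocessing.Pool(processes=processes) as pool:
        # chunksize batches the work so each task sent to a worker is not tiny
        nested = pool.map(tuple_to_records, input_list, chunksize=chunksize)
    return list(chain.from_iterable(nested))


if __name__ == '__main__':
    print(convert_parallel(SAMPLE))
```

Whether this beats a plain loop depends on how the cost of parsing compares with the cost of sending the strings to the workers and the results back, so it is worth timing both on a realistic sample.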
Answer 1 (score: 1)
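It is easier to extract the information iteratively than to use pandas.json_normalize.
The values of data are of type str and must be converted to dict.
Extract each key / value pair from 'bid' and 'ask' to create separate records.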
import json
import pandas as pd
# list of tuples, where the value of data, is a string
transaction_data = [('1599324732926-0', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
('1599324732926-1', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
('1599324732926-2', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'})]
# create a list of lists for each transaction data
# split each side, key value pair into a separate list
data_key_list = [['timestamp', 'receipt_timestamp', 'delta', 'side', 'price', 'size']]
for v in transaction_data:  # iterate through each transaction
    data = json.loads(v[1]['data'])  # convert the string to a dict
    for side in ['bid', 'ask']:  # extract each key, value pair as a separate record
        data_key_list += [[data['timestamp'], data['receipt_timestamp'], data['delta'], side, float(price), size] for price, size in data[side].items()]
# create a dataframe
df = pd.DataFrame(data_key_list[1:], columns=data_key_list[0])
# display(df.head())
timestamp receipt_timestamp delta side price size
0 1.59932e+09 1.59932e+09 True bid 338.9 0.06482
1 1.59932e+09 1.59932e+09 True bid 338.67 3.95535
2 1.59932e+09 1.59932e+09 True ask 339.12 2.47578
3 1.59932e+09 1.59932e+09 True ask 339.13 6.43172
4 1.59932e+09 1.59932e+09 True bid 338.9 0.06482
df.to_dict(orient='records')
[out]:
[{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.9,
'size': 0.06482},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.67,
'size': 3.95535},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.12,
'size': 2.47578},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.13,
'size': 6.43172},
...]
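If only the list of dicts is needed, the DataFrame round trip is optional. The following is a minimal variation on the loop above, not the answerer's code: a hypothetical generator iter_rows yields one tuple per price level, and the same rows can then feed either a list of dicts or a DataFrame. COLUMNS and transactions are illustrative names.

```python
import json
import pandas as pd

COLUMNS = ['timestamp', 'receipt_timestamp', 'delta', 'side', 'price', 'size']

# One sample tuple copied from the question (string joined onto one line).
transactions = [
    ('1599324732926-0', {'data': '{"timestamp":1599324732.767,"receipt_timestamp":1599324732.9256856,"delta":true,"bid":{"338.9":0.06482,"338.67":3.95535},"ask":{"339.12":2.47578,"339.13":6.43172}}'}),
]


def iter_rows(items):
    """Yield one (timestamp, receipt_timestamp, delta, side, price, size) tuple per price level."""
    for _, payload in items:
        data = json.loads(payload['data'])  # the value of 'data' is a JSON string
        for side in ('bid', 'ask'):
            for price, size in data[side].items():
                yield (data['timestamp'], data['receipt_timestamp'],
                       data['delta'], side, float(price), size)


# Route 1: plain list of dicts, no pandas involved.
records = [dict(zip(COLUMNS, row)) for row in iter_rows(transactions)]

# Route 2: DataFrame built from the same rows.
df = pd.DataFrame(list(iter_rows(transactions)), columns=COLUMNS)

print(records[0])
print(df)
```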