I have multiple data sets that look like this:
reactstrap
Almost everything is organized as a "data_type" or a "name".
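Many of these data sets are embedded in a text file. Each set starts with ConnectedRouter
and ends with connected-react-router
. I am trying to organize this data into a data frame, or to flatten/normalize it in some way, so that it is easier for a person to read. How should I go about it?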
I'm guessing there is a simple, straightforward way to do all of this, but after Googling for a while I still couldn't find a solution, so I'm back here. TIA.
Answer 0 (score: 2)
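This recursively extracts the values into a flattened dict. Each level of nesting contributes one part to the final key string, and the parts are joined with underscores. So if the nesting level is 0 (the value sits directly in the top-level dict), the entry looks just as you would expect, like class: pipesteps.validate.Validate. If the value is deeply nested, you will see what happens: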
import pandas as pd  # pandas >= 1.0 exposes json_normalize at the top level

a = {'class': 'pipesteps.validate.Validate', 'conf': {'schema_def': {'fields': [{'data_type': 'STRING', 'name': 'Operation'}, {'data_type': 'STRING', 'name': 'SNL_Institution_Key'}, {'data_type': 'INTEGER', 'name': 'SNL_Funding_Key'}, {'data_type': 'STRING', 'name': 'CUSIP'}, {'data_type': 'STRING', 'name': 'SEDOL_NULL'}, {'data_type': 'STRING', 'name': 'Ticker'}, {'data_type': 'DATETIME', 'name': 'Date_of_Closing_Price'}, {'data_type': 'FLOAT', 'name': 'Total_Return_MTD'}, {'data_type': 'FLOAT', 'name': 'TR_SNL_Peer_Index_Change'}, {'data_type': 'FLOAT', 'name': 'TR_SNL_Broad_Index_Change'}, {'data_type': 'FLOAT', 'name': 'TR_SandP_500'}, {'data_type': 'DATETIME', 'name': 'Beginning_Pricing_Date'}]}}, 'id': 'validate'}

def flatten_json(y):
    """Flatten nested dicts/lists into one dict whose keys join the path with '_'."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Descend into each key, appending it to the path.
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # List items are addressed by their index.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' and store it.
            out[name[:-1]] = x

    flatten(y)
    return out

flat_json = flatten_json(a)
df = pd.json_normalize(flat_json).T  # .T because it makes a DF of 1 row and 26 columns and I didn't like that
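If you would rather not transpose it and just want the 26 columns, since that might make the data easier to access, simply remove the .T at the end of the df line.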
Output (rows shown sorted by key):
>>> df
0
class pipesteps.validate.Validate
conf_schema_def_fields_0_data_type STRING
conf_schema_def_fields_0_name Operation
conf_schema_def_fields_10_data_type FLOAT
conf_schema_def_fields_10_name TR_SandP_500
conf_schema_def_fields_11_data_type DATETIME
conf_schema_def_fields_11_name Beginning_Pricing_Date
conf_schema_def_fields_1_data_type STRING
conf_schema_def_fields_1_name SNL_Institution_Key
conf_schema_def_fields_2_data_type INTEGER
conf_schema_def_fields_2_name SNL_Funding_Key
conf_schema_def_fields_3_data_type STRING
conf_schema_def_fields_3_name CUSIP
conf_schema_def_fields_4_data_type STRING
conf_schema_def_fields_4_name SEDOL_NULL
conf_schema_def_fields_5_data_type STRING
conf_schema_def_fields_5_name Ticker
conf_schema_def_fields_6_data_type DATETIME
conf_schema_def_fields_6_name Date_of_Closing_Price
conf_schema_def_fields_7_data_type FLOAT
conf_schema_def_fields_7_name Total_Return_MTD
conf_schema_def_fields_8_data_type FLOAT
conf_schema_def_fields_8_name TR_SNL_Peer_Index_Change
conf_schema_def_fields_9_data_type FLOAT
conf_schema_def_fields_9_name TR_SNL_Broad_Index_Change
id validate
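
Since almost everything in the question's data is a data_type/name pair, a long table with one row per field may be easier to read than the flattened keys above. The following is only a sketch of that alternative, not part of the original answer: it assumes pandas 1.0 or later (for pd.json_normalize), the helper name fields_frame is made up for illustration, and the sample dict is a shortened, hypothetical copy of the data above.

```python
import pandas as pd

# Hypothetical, shortened sample with the same shape as the dict `a` above.
dataset = {
    'class': 'pipesteps.validate.Validate',
    'conf': {'schema_def': {'fields': [
        {'data_type': 'STRING', 'name': 'Operation'},
        {'data_type': 'INTEGER', 'name': 'SNL_Funding_Key'},
        {'data_type': 'FLOAT', 'name': 'Total_Return_MTD'},
    ]}},
    'id': 'validate',
}

def fields_frame(d):
    """Return one row per entry of conf -> schema_def -> fields.

    The top-level 'class' and 'id' values are repeated on every row.
    """
    return pd.json_normalize(
        d,
        record_path=['conf', 'schema_def', 'fields'],  # path to the list of records
        meta=['class', 'id'],                          # top-level keys to carry along
    )

fields_df = fields_frame(dataset)

# Several already-parsed data sets can be stacked into a single frame.
all_fields = pd.concat(
    [fields_frame(d) for d in [dataset, dataset]],
    ignore_index=True,
)
print(fields_df)
```

This keeps data_type and name as ordinary columns, so filtering (for example, all FLOAT fields) becomes a normal DataFrame selection. It does assume every data set has the conf -> schema_def -> fields layout. For irregular nesting, the recursive flatten_json approach is the more forgiving choice.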