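I am consuming data from a legacy system that holds stock earnings data. The data is exported as JSON, one module at a time; this is the earnings module.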
earnings_dict = {
"earningsChart": {
"quarterly": [
{
"date": "1Q2018",
"actual": {
"raw": 0.12,
"fmt": "0.12"
},
"estimate": {
"raw": 0.05,
"fmt": "0.05"
}
},
{
"date": "2Q2018",
"actual": {
"raw": 0.21,
"fmt": "0.21"
},
"estimate": {
"raw": 0.19,
"fmt": "0.19"
}
},
{
"date": "3Q2018",
"actual": {
"raw": 0.16,
"fmt": "0.16"
},
"estimate": {
"raw": 0.21,
"fmt": "0.21"
}
},
{
"date": "4Q2018",
"actual": {
"raw": 0.07,
"fmt": "0.07"
},
"estimate": {
"raw": 0.14,
"fmt": "0.14"
}
}
],
"currentQuarterEstimate": {
"raw": 0.15,
"fmt": "0.15"
},
"currentQuarterEstimateDate": "1Q",
"currentQuarterEstimateYear": 2019,
"earningsDate": [
{
"raw": 1556496000,
"fmt": "2019-04-29"
},
{
"raw": 1556841600,
"fmt": "2019-05-03"
}
]
},
"financialsChart": {
"yearly": [
{
"date": 2015,
"revenue": {
"raw": 74977000,
"fmt": "74.98M",
"longFmt": "74,977,000"
},
"earnings": {
"raw": -15668000,
"fmt": "-15.67M",
"longFmt": "-15,668,000"
}
},
{
"date": 2016,
"revenue": {
"raw": 105586000,
"fmt": "105.59M",
"longFmt": "105,586,000"
},
"earnings": {
"raw": -8281000,
"fmt": "-8.28M",
"longFmt": "-8,281,000"
}
},
{
"date": 2017,
"revenue": {
"raw": 143803000,
"fmt": "143.8M",
"longFmt": "143,803,000"
},
"earnings": {
"raw": 9716000,
"fmt": "9.72M",
"longFmt": "9,716,000"
}
},
{
"date": 2018,
"revenue": {
"raw": 190071000,
"fmt": "190.07M",
"longFmt": "190,071,000"
},
"earnings": {
"raw": 19967000,
"fmt": "19.97M",
"longFmt": "19,967,000"
}
}
],
"quarterly": [
{
"date": "1Q2018",
"revenue": {
"raw": 42340000,
"fmt": "42.34M",
"longFmt": "42,340,000"
},
"earnings": {
"raw": 4320000,
"fmt": "4.32M",
"longFmt": "4,320,000"
}
},
{
"date": "2Q2018",
"revenue": {
"raw": 47240000,
"fmt": "47.24M",
"longFmt": "47,240,000"
},
"earnings": {
"raw": 7474000,
"fmt": "7.47M",
"longFmt": "7,474,000"
}
},
{
"date": "3Q2018",
"revenue": {
"raw": 50126000,
"fmt": "50.13M",
"longFmt": "50,126,000"
},
"earnings": {
"raw": 5524000,
"fmt": "5.52M",
"longFmt": "5,524,000"
}
},
{
"date": "4Q2018",
"revenue": {
"raw": 50365000,
"fmt": "50.37M",
"longFmt": "50,365,000"
},
"earnings": {
"raw": 2649000,
"fmt": "2.65M",
"longFmt": "2,649,000"
}
}
]
},
"financialCurrency": "USD"}
As you can see, the JSON has some metadata nested at the top level of the dictionary, which is easy to read with something like pandas.io.json.json_normalize.
import pandas as pd
# Flatten the top-level dictionary into a single-row DataFrame
df = pd.io.json.json_normalize(earnings_dict)
df
Out[13]:
earningsChart.currentQuarterEstimate.fmt ... financialsChart.yearly
0 0.15 ... [{'date': 2015, 'revenue': {'raw': 74977000, '...
[1 rows x 9 columns]
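However, it misses the nested lists of dictionaries that hold the yearly and quarterly earnings data. For example, the quarterly and yearly lists are simply added to the DataFrame as lists of dictionaries.
My guess is that this was originally a few SQL tables with foreign keys.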
I have read the json_normalize documentation, but I can't work out how to use the record_path and meta parameters to parse this dictionary.
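For reference, here is a minimal sketch of how record_path and meta might apply to this structure. It assumes pandas 1.0 or later, where the function is exposed as pd.json_normalize, and it uses a trimmed, hypothetical `sample` with the same shape as earnings_dict (two quarters only). record_path is the list of keys leading to the nested list, and each meta entry is a key (or a path of keys) whose value is repeated on every row.

```python
import pandas as pd

# Trimmed sample with the same shape as earnings_dict (assumption: two quarters only).
sample = {
    "earningsChart": {
        "quarterly": [
            {"date": "1Q2018",
             "actual": {"raw": 0.12, "fmt": "0.12"},
             "estimate": {"raw": 0.05, "fmt": "0.05"}},
            {"date": "2Q2018",
             "actual": {"raw": 0.21, "fmt": "0.21"},
             "estimate": {"raw": 0.19, "fmt": "0.19"}},
        ],
        "currentQuarterEstimateYear": 2019,
    },
    "financialCurrency": "USD",
}

# One row per element of earningsChart -> quarterly; nested dicts inside each
# record are flattened into dotted column names such as "actual.raw".
quarterly_eps = pd.json_normalize(
    sample,
    record_path=["earningsChart", "quarterly"],
    meta=[
        "financialCurrency",                              # top-level key
        ["earningsChart", "currentQuarterEstimateYear"],  # nested key, given as a path
    ],
)
print(quarterly_eps)
```

The meta columns are named after their path, so the nested one comes out as earningsChart.currentQuarterEstimateYear.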
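I think I could use json_normalize to create DataFrames even from dictionaries nested several levels deep. It looks like I would need at least 5: one for the metadata and at least 4 for the yearly and quarterly tables.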
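The five-table split could be sketched roughly as follows, again assuming pandas 1.0 or later. The `sample` dictionary, the `record_paths` mapping, and the table names are all made up for illustration; only the keys inside `sample` come from the export. Each nested list gets its own json_normalize call, and the metadata table is whatever remains after dropping the columns that still hold lists.

```python
import pandas as pd

# Trimmed sample with the same keys as earnings_dict (assumption: fewer rows).
sample = {
    "earningsChart": {
        "quarterly": [
            {"date": "1Q2018",
             "actual": {"raw": 0.12, "fmt": "0.12"},
             "estimate": {"raw": 0.05, "fmt": "0.05"}},
        ],
        "currentQuarterEstimate": {"raw": 0.15, "fmt": "0.15"},
        "currentQuarterEstimateDate": "1Q",
        "currentQuarterEstimateYear": 2019,
        "earningsDate": [
            {"raw": 1556496000, "fmt": "2019-04-29"},
            {"raw": 1556841600, "fmt": "2019-05-03"},
        ],
    },
    "financialsChart": {
        "yearly": [
            {"date": 2017,
             "revenue": {"raw": 143803000, "fmt": "143.8M"},
             "earnings": {"raw": 9716000, "fmt": "9.72M"}},
            {"date": 2018,
             "revenue": {"raw": 190071000, "fmt": "190.07M"},
             "earnings": {"raw": 19967000, "fmt": "19.97M"}},
        ],
        "quarterly": [
            {"date": "1Q2018",
             "revenue": {"raw": 42340000, "fmt": "42.34M"},
             "earnings": {"raw": 4320000, "fmt": "4.32M"}},
        ],
    },
    "financialCurrency": "USD",
}

# Hypothetical table names mapped to the key path of each nested list.
record_paths = {
    "eps_quarterly": ["earningsChart", "quarterly"],
    "earnings_dates": ["earningsChart", "earningsDate"],
    "financials_yearly": ["financialsChart", "yearly"],
    "financials_quarterly": ["financialsChart", "quarterly"],
}

# One DataFrame per nested list.
tables = {
    name: pd.json_normalize(sample, record_path=path)
    for name, path in record_paths.items()
}

# Metadata: flatten everything into one row, then drop the columns that hold lists.
flat = pd.json_normalize(sample)
list_columns = [col for col in flat.columns if isinstance(flat.at[0, col], list)]
tables["metadata"] = flat.drop(columns=list_columns)

for name, frame in tables.items():
    print(name, frame.shape)
```

With a single export like this there is no key to join on; if several tickers were loaded, a meta column identifying the ticker would play the role of the foreign key.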
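How would you store this? Would you put it in a NoSQL database or keep it in SQL? My requirement is low-load, lightweight analysis that needs a few views and some plotting with pandas and matplotlib.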
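To make the SQL option concrete, here is one possible sketch, not a recommendation: flatten a nested list with pandas, write it to SQLite through the standard-library sqlite3 module, and query it back. The table name `financials_yearly`, the in-memory database, and the trimmed `sample` (raw values only) are assumptions for illustration. Passing sep="_" keeps the column names free of dots so they are easy to use in SQL.

```python
import sqlite3

import pandas as pd

# Trimmed sample (assumption: raw values only, two years).
sample = {
    "financialsChart": {
        "yearly": [
            {"date": 2017, "revenue": {"raw": 143803000}, "earnings": {"raw": 9716000}},
            {"date": 2018, "revenue": {"raw": 190071000}, "earnings": {"raw": 19967000}},
        ]
    },
    "financialCurrency": "USD",
}

# Flatten the yearly list; columns become date, revenue_raw, earnings_raw, financialCurrency.
yearly = pd.json_normalize(
    sample,
    record_path=["financialsChart", "yearly"],
    meta=["financialCurrency"],
    sep="_",
)

# An in-memory database for the sketch; a file path would persist the data.
conn = sqlite3.connect(":memory:")
yearly.to_sql("financials_yearly", conn, index=False, if_exists="replace")

# A simple "view": net margin per year, computed in SQL and read back into pandas.
margins = pd.read_sql_query(
    "SELECT date, earnings_raw * 1.0 / revenue_raw AS net_margin "
    "FROM financials_yearly ORDER BY date",
    conn,
)
conn.close()
print(margins)
```

The resulting DataFrame can go straight to matplotlib, for example through `margins.plot(x="date", y="net_margin")`.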
Thanks for your help!