在带有数组

时间:2017-07-31 14:14:51

标签: python json pandas normalize

问题是使用嵌套的json对象数组规范化json。我看过类似的问题,试图用他们的解决方案无济于事。

这是我的json对象的样子。

{
  "results": [
    {
      "_id": "25",
      "Product": {
        "Description": "3 YEAR",
        "TypeLevel1": "INTEREST",
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "SCSP": "96"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        }
      ]
    },
    {
      "_id": "26",
      "Product": {
        "Description": "10 YEAR",
        "TypeLevel1": "INTEREST",
        "Currency": "USD",
        "Operational": true,
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "BBT": "CITITYM9",
        "TCK": "ZN"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        },
        {
          "SMCP2": "02"
        }
      ]
    }
  ]
}

这是我的规范化json对象的代码。

data = json.load(j)
data = data['results']
print pd.io.json.json_normalize(data)

想要的结果应该是这样的

id   Description    TypeLevel1   TypeLevel2  Currency  \
25   3 YEAR US      INTEREST     LONG        NAN
26   10 YEAR US     INTEREST     NAN         USD

BBT   TCT  SMCP  SMCP2  SCSP   
NAN   NAN  521   NAN    01
M9    ZN   01    02     NAN

然而,我得到的结果是:

  Product.Currency Product.Description Product.Operational Product.TypeLevel1  \
0              NaN              3 YEAR                 NaN           INTEREST
1              USD             10 YEAR                True           INTEREST

  Product.TypeLevel2                        ProductSMCP  Xref.BBT Xref.SCSP  \
0               LONG                   [{'SMCP': '01'}]       NaN        96
1               LONG  [{'SMCP': '01'}, {'SMCP2': '02'}]  CITITYM9       NaN

  Xref.TCK _id
0      NaN  25
1       ZN  26

正如您所看到的,问题出在 ProductSCMP 它并未完全展平数组。

1 个答案:

答案 0 :(得分:2)

一旦我们通过第一次规范化,我就会申请lambda来完成这项工作。

from cytoolz.dicttoolz import merge

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
)

  Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2  Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0              NaN              3 YEAR                 NaN           INTEREST               LONG       NaN        96      NaN  25   01   NaN
1              USD             10 YEAR                True           INTEREST               LONG  CITITYM9       NaN       ZN  26   01    02

修剪列名称

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns=lambda x: re.sub('(Product|Xref)\.', '', x))

  Currency Description Operational TypeLevel1 TypeLevel2       BBT SCSP  TCK _id SMCP SMCP2
0      NaN      3 YEAR         NaN   INTEREST       LONG       NaN   96  NaN  25   01   NaN
1      USD     10 YEAR        True   INTEREST       LONG  CITITYM9  NaN   ZN  26   01    02