Question

I'm trying to normalize JSON that contains nested arrays of JSON objects. I've looked at similar questions and tried their solutions, to no avail.

This is what my JSON object looks like.

{
  "results": [
    {
      "_id": "25",
      "Product": {
        "Description": "3 YEAR",
        "TypeLevel1": "INTEREST",
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "SCSP": "96"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        }
      ]
    },
    {
      "_id": "26",
      "Product": {
        "Description": "10 YEAR",
        "TypeLevel1": "INTEREST",
        "Currency": "USD",
        "Operational": true,
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "BBT": "CITITYM9",
        "TCK": "ZN"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        },
        {
          "SMCP2": "02"
        }
      ]
    }
  ]
}

This is my code for normalizing the JSON object.

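import json
import pandas as pd

data = json.load(j)  # j is an open file handle for the JSON document
data = data['results']
print(pd.json_normalize(data))  # pandas >= 1.0; older: pd.io.json.json_normalize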

The result I want should look like this:

id   Description    TypeLevel1   TypeLevel2  Currency  \
25   3 YEAR US      INTEREST     LONG        NAN
26   10 YEAR US     INTEREST     NAN         USD

BBT   TCT  SMCP  SMCP2  SCSP   
NAN   NAN  521   NAN    01
M9    ZN   01    02     NAN

However, the result I get is:

  Product.Currency Product.Description Product.Operational Product.TypeLevel1  \
0              NaN              3 YEAR                 NaN           INTEREST
1              USD             10 YEAR                True           INTEREST

  Product.TypeLevel2                        ProductSMCP  Xref.BBT Xref.SCSP  \
0               LONG                   [{'SMCP': '01'}]       NaN        96
1               LONG  [{'SMCP': '01'}, {'SMCP2': '02'}]  CITITYM9       NaN

  Xref.TCK _id
0      NaN  25
1       ZN  26

As you can see, the problem is with ProductSMCP: the array is left as a list of dicts instead of being flattened into columns.

Answer 1

Once we're past the first normalization, I'd apply a lambda to finish the job.

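import pandas as pd
from cytoolz.dicttoolz import merge

# Normalize first, then merge each row's list of dicts into one Series
# and join the resulting columns back in place of ProductSMCP.
pd.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', axis=1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
)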

  Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2  Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0              NaN              3 YEAR                 NaN           INTEREST               LONG       NaN        96      NaN  25   01   NaN
1              USD             10 YEAR                True           INTEREST               LONG  CITITYM9       NaN       ZN  26   01    02
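
For readers who don't have cytoolz installed, the same merge step can be sketched with the standard library alone, before the data ever reaches pandas. The helper name `flatten_record` is hypothetical and not part of the original answer. It assumes every element of the nested array is a dict and that nesting is only one level deep, as in the sample data; on a key collision the later dict wins.

```python
def flatten_record(record, list_key="ProductSMCP"):
    """Flatten one record: nested dicts get dotted keys, and the list of
    dicts stored under `list_key` is merged into top-level keys."""
    flat = {}
    for key, value in record.items():
        if key == list_key:
            # Merge every dict in the array; later keys win on collision.
            for item in value:
                flat.update(item)
        elif isinstance(value, dict):
            # One level of nesting becomes "Parent.Child"; an empty dict
            # (like "Settlement") contributes no keys.
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Trimmed copy of the second record from the question.
record = {
    "_id": "26",
    "Product": {"Description": "10 YEAR", "Currency": "USD"},
    "Settlement": {},
    "Xref": {"BBT": "CITITYM9", "TCK": "ZN"},
    "ProductSMCP": [{"SMCP": "01"}, {"SMCP2": "02"}],
}
flat = flatten_record(record)
print(flat)
```

A list of such flat dicts can be passed straight to `pd.DataFrame(...)`; keys missing from a given record show up as NaN in that row.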

Trimming the column names

import re

# Same pipeline as above, then strip the "Product." / "Xref." prefixes.
pd.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', axis=1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns=lambda x: re.sub(r'(Product|Xref)\.', '', x))

  Currency Description Operational TypeLevel1 TypeLevel2       BBT SCSP  TCK _id SMCP SMCP2
0      NaN      3 YEAR         NaN   INTEREST       LONG       NaN   96  NaN  25   01   NaN
1      USD     10 YEAR        True   INTEREST       LONG  CITITYM9  NaN   ZN  26   01    02
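
To see what the `rename` callable does to each column label, here is a small standard-library sketch applied to a few of the column names from the output above. The `strip_prefix` name is only for illustration, and the `^` anchor is an addition of mine: the answer's pattern is unanchored, which gives the same result on these names.

```python
import re

# Matches a leading "Product." or "Xref." prefix. The dot is escaped so it
# is treated literally, and ^ anchors the match to the start of the name.
PREFIX = re.compile(r"^(Product|Xref)\.")


def strip_prefix(name):
    """Return the column name without its parent-object prefix."""
    return PREFIX.sub("", name)


columns = ["Product.Currency", "Xref.BBT", "_id", "SMCP"]
trimmed = [strip_prefix(c) for c in columns]
print(trimmed)
# ['Currency', 'BBT', '_id', 'SMCP']
```

One caveat with stripping prefixes: if `Product` and `Xref` ever share a key name, the trimmed frame ends up with duplicate column labels.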
