Pandas table scraping

Time: 2017-12-18 14:24:48

Tags: python json pandas

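I'm trying to work out the best way to convert a table into JSON records. I currently have the output I need, but the table's format is confusing me. The example below should explain: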

ID   Product        Item_Material   Owner           Interest %
123  Test Item 1    Electric        Elctrotech              60%
null null           null            Spark inc               40%
124  Test Item 2    Wood            TY Toys                 100%
125  Test Item 3    Plastic         NA Materials            100%

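The newline-delimited JSON I get is what I want, but I would like to somehow turn the nested table rows into nested JSON when they are part of a parent row.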

{"ID":"Test Item 1", "Item_Material":"Electric", "Owner":"Elctrotech","Interest %":"60%"}
{"ID":null, "Item_Material":null, "Owner":"Spark inc","Interest %":"40%"}
{"ID":"Test Item 2", "Item_Material":"Wood", "Owner":"TY Toys","Interest %":"100%"}
{"ID":"Test Item 3","Item_Material":"Plastic","Owner":"NA Materials","Interest %":"100%"}

The goal is for the first JSON record to look like this:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owners": [{"Owner": "Elctrotech", "Interest %":"60%", "Owner":"Spark inc","Interest %":"40%"}]}

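The data comes from scraping a table with Beautiful Soup. Each row of the table shown above sits in its own <tr> tag, so when it is pulled into a pandas DataFrame it comes out this way. I don't know whether pandas has a feature for merging a row into the row above it, so that I end up with one JSON record per 'Product'. Sometimes an item has several owners, not just 2.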

1 answer:

Answer 0: (score: 0)

The output dict rows are not exactly what you expected, because your dict syntax is wrong. Try this. It uses only pandas.

import pandas as pd

# Sample rows: ID, Product, Item_Material, Owner, Interest %
p = [[123, "Test Item 1", "Electric", "Elctrotech", "60%"],
     [124, "Test Item 2", "Wood", "TY Toys", "100%"],
     [125, "Test Item 1", "Plastic", "NA Materials", "100%"],
     [123, "Test Item 1", "Foo", "Bar", "80%"],
     [123, "Test Item 1", "Electric", "TRY TRY TRY", "70%"]]

x = pd.DataFrame(p, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"])

# Collect the owners and interests of each (Product, Item_Material) group into lists
x_gb = x.groupby(["Product", "Item_Material"])
grouped_Series_Owner = x_gb["Owner"].apply(list).to_dict()
grouped_Series_Interest = x_gb["Interest %"].apply(list).to_dict()

# Assumption: `out` holds one row per (Product, Item_Material) group, keyed by index
out = x.drop_duplicates(["Product", "Item_Material"]).to_dict(orient="index")
for k in out:
    key = (out[k]["Product"], out[k]["Item_Material"])
    # Build a fresh dict for every group so earlier results are not overwritten
    d = {"ID": out[k]["Product"],
         "Item_Material": out[k]["Item_Material"],
         "Owners": {"Owner": grouped_Series_Owner[key],
                    "Interest %": grouped_Series_Interest[key]}}
    print(d)
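
If the goal is one record per product with each owner as its own object, a forward fill may be closer to what the question asks for. The sketch below is only one possible approach and rests on a few assumptions that are not stated in the question: the "null" cells arrive as real missing values (None/NaN) rather than the string "null", a continuation row always comes directly after its parent row, and the names `rows`, `parent_cols`, `records` and `json_lines` are made up for illustration. It fills the parent columns down with `ffill`, groups on them, and stores the owners as a list of dicts, since a single dict cannot repeat the "Owner" key.

```python
import json

import pandas as pd

# Sample rows mirroring the table in the question; None marks the
# continuation row that belongs to the product above it (assumption).
rows = [
    [123, "Test Item 1", "Electric", "Elctrotech", "60%"],
    [None, None, None, "Spark inc", "40%"],
    [124, "Test Item 2", "Wood", "TY Toys", "100%"],
    [125, "Test Item 3", "Plastic", "NA Materials", "100%"],
]
df = pd.DataFrame(
    rows, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"]
)

# Copy the parent values down into the continuation rows.
parent_cols = ["ID", "Product", "Item_Material"]
df[parent_cols] = df[parent_cols].ffill()

records = []
# sort=False keeps the products in their original table order.
for (item_id, product, material), group in df.groupby(parent_cols, sort=False):
    records.append({
        "ID": int(item_id),  # plain int so json.dumps can serialize it
        "Product": product,
        "Item_Material": material,
        # One dict per owner row of this product.
        "Owners": group[["Owner", "Interest %"]].to_dict(orient="records"),
    })

# One JSON document per line, as in the question.
json_lines = [json.dumps(record) for record in records]
print("\n".join(json_lines))
```

With the sample rows above, the first record carries both Elctrotech and Spark inc under "Owners", and the other two products get a single owner each.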