Split a pandas column containing lists of dictionaries

Time: 2020-10-29 17:46:32

Tags: python pandas

I have a pandas column in which each cell contains a list of dictionaries holding the color attributes of a photo, for example:

[{'color': 'black', 'confidence': 1.0}, {'color': 'brown', 'confidence': 0.72}, {'color': 'gray', 'confidence': 0.62}, {'color': 'other', 'confidence': 0.52}, {'color': 'red', 'confidence': 0.01}, {'color': 'blond', 'confidence': 0.01}, {'color': 'white', 'confidence': 0.0}]

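I'd like to split this column of lists of dictionaries into several new pandas columns. For example, I want a column named "black" with the value 1.0, a column named "brown" with the value 0.72, and so on.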

I'm struggling to get this done. Any hints would be appreciated. Thanks!

4 answers:

Answer 0 (score: 1)

import pandas as pd

a = [{'color': 'black', 'confidence': 1.0}, {'color': 'brown', 'confidence': 0.72}, {'color': 'gray', 'confidence': 0.62}, {'color': 'other', 'confidence': 0.52}, {'color': 'red', 'confidence': 0.01}, {'color': 'blond', 'confidence': 0.01}, {'color': 'white', 'confidence': 0.0}]

# Collect the colors and the confidences into two parallel lists
c = []
co = []
for d in a:
    c.append(d['color'])
    co.append(d['confidence'])

df = pd.DataFrame()
df['color'] = c
df['confidence'] = co

# Turn the rows into columns
df = df.transpose()
# Make the first row (the colors) the column header
df.columns = df.iloc[0]
df = df[1:]
Output:
df
Out[159]: 
color      black brown  gray other   red blond white
confidence     1  0.72  0.62  0.52  0.01  0.01     0

If this answer is correct, kindly accept and upvote it. Otherwise, leave a comment with your question or issue; I'd be happy to help.
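One side effect of the transpose route above is that every resulting column has object dtype, because the frame being transposed mixed string colors with float confidences. A minimal sketch that builds the row from a dict comprehension instead keeps the values as float64. Like the answer above, it handles a single list a; the index label "confidence" is just chosen to match that output.

```python
import pandas as pd

a = [{'color': 'black', 'confidence': 1.0}, {'color': 'brown', 'confidence': 0.72},
     {'color': 'gray', 'confidence': 0.62}, {'color': 'other', 'confidence': 0.52},
     {'color': 'red', 'confidence': 0.01}, {'color': 'blond', 'confidence': 0.01},
     {'color': 'white', 'confidence': 0.0}]

# Map each color to its confidence in a single pass
row = {d["color"]: d["confidence"] for d in a}

# Wrap the dict in a list so pandas builds one row with the colors as columns;
# the values stay float64 instead of becoming object
df = pd.DataFrame([row], index=["confidence"])
print(df)
print(df.dtypes)
```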

Answer 1 (score: 1)

Let's try this:

pd.DataFrame(df['col'].tolist()).set_index('color').T

Output:

color       black  brown  gray  other   red  blond  white
confidence    1.0   0.72  0.62   0.52  0.01   0.01    0.0
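The one-liner above reshapes one list of dictionaries into one row. A minimal sketch that extends the same idea to every row of the original column is shown below. It assumes the column is named col, as above, and that every list is non-empty, because explode turns an empty list into NaN and that would break the DataFrame constructor. It explodes the lists into one dictionary per row and then pivots back to one row per original record.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    [{"color": "black", "confidence": 1.0}, {"color": "brown", "confidence": 0.72}],
    [{"color": "black", "confidence": 0.8}, {"color": "red", "confidence": 0.11}],
]})

# One row per dictionary; explode repeats the original row label in the index
exploded = df["col"].explode()
long = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Pivot back to one row per original record and one column per color;
# colors missing from a record become NaN
wide = long.pivot(columns="color", values="confidence")
wide.columns.name = None

# Replace the list column with the new per-color columns
result = df.drop(columns="col").join(wide)
print(result)
```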

Answer 2 (score: 1)

Thanks, this worked for me. I was inspired by Tejas's answer:

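from ast import literal_eval

# Create one empty column per color
for color in ["black", "brown", "gray", "other", "red", "blond", "white"]:
    df[color] = ""

# Color_list holds string representations of the lists, so parse each one first
for k, row in df.iterrows():
    res = literal_eval(row["Color_list"])
    for d in res:
        # .at avoids chained assignment (df[col][k] = ...), which may fail silently
        df.at[k, d["color"]] = d["confidence"]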
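If the column really holds strings (as the use of literal_eval above suggests), the row-by-row loop can be replaced by parsing the whole column once and building all the new columns in one go. This is a minimal sketch that assumes the column is named Color_list, as above. The color columns are created from whatever colors appear in the data, so they do not need to be listed in advance, and missing colors become NaN instead of empty strings.

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({"Color_list": [
    "[{'color': 'black', 'confidence': 1.0}, {'color': 'brown', 'confidence': 0.72}]",
    "[{'color': 'gray', 'confidence': 0.62}]",
]})

# Parse every string into a real list of dicts
parsed = df["Color_list"].apply(literal_eval)

# One dict per row ({color: confidence}); pandas turns the keys into columns
wide = pd.DataFrame(
    [{d["color"]: d["confidence"] for d in records} for records in parsed],
    index=df.index,
)
result = df.join(wide)
print(result)
```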

Answer 3 (score: 0)

You can do this by using apply with a custom function that returns a Series:

Data

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(
    {
        "A": ["a", "b"],
        "B": [
            [
                {"color": "black", "confidence": 1.0},
                {"color": "brown", "confidence": 0.72},
                {"color": "gray", "confidence": 0.62},
                {"color": "other", "confidence": 0.52},
                {"color": "red", "confidence": 0.01},
                {"color": "blond", "confidence": 0.01},
                {"color": "white", "confidence": 0.0},
            ],
            [
                {"color": "black", "confidence": 0.8},
                {"color": "brown", "confidence": 0.5},
                {"color": "gray", "confidence": 0.4},
                {"color": "other", "confidence": 0.32},
                {"color": "red", "confidence": 0.11},
            ],
        ],
    }
)

print(df)
   A                                                  B
0  a  [{'color': 'black', 'confidence': 1.0}, {'colo...
1  b  [{'color': 'black', 'confidence': 0.8}, {'colo...

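Method: Since each cell is a list of dictionaries, we need to turn each cell into its own Series whose index is "color" and whose values are "confidence". apply then takes care of gluing these Series objects together and outputs a new DataFrame.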

def clean_cell(records, index, values):
    return (pd.DataFrame(records)
            .set_index(index)
            .rename_axis(None)
            [values])

record_df = df["B"].apply(clean_cell, args=("color", "confidence"))

print(record_df)
   black  brown  gray  other   red  blond  white
0    1.0   0.72  0.62   0.52  0.01   0.01    0.0
1    0.8   0.50  0.40   0.32  0.11    NaN    NaN
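A variation on the same per-cell idea builds each Series directly from a dict rather than through an intermediate DataFrame. It then joins the result back onto the original frame so that column A is kept. This is a sketch: records_to_series is a hypothetical helper name, and the Series are combined with the DataFrame constructor rather than apply. Pandas aligns the Series indexes into columns, so colors missing from a cell become NaN. Because each Series is given an explicit float64 dtype, an empty list should simply produce a row of NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b"],
    "B": [
        [{"color": "black", "confidence": 1.0}, {"color": "blond", "confidence": 0.01}],
        [{"color": "black", "confidence": 0.8}],
    ],
})

def records_to_series(records, index="color", values="confidence"):
    # Hypothetical helper: map each record's `index` key to its `values` key
    return pd.Series({d[index]: d[values] for d in records}, dtype="float64")

# One Series per cell; the constructor aligns them into columns (missing -> NaN)
record_df = pd.DataFrame([records_to_series(r) for r in df["B"]], index=df.index)

# Replace the list column B with the new per-color columns
result = df.drop(columns="B").join(record_df)
print(result)
```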