I have this DataFrame (df) that looks like this:
`
user_id |date |last_dep_amt| dep_amt| Bin | Action
1031 |2017-03-11 |200.0 |100 | 100-200 | [{'A1':[350,400,450],
'A2':[450,480,490],
'A3':[500,550,600],
'A4':[650, 700,850],
'A5':[750,800,950],
'Last_5_deposits':[50],
'num_unique_a1':3,
'num_unique_a2':4,
'num_unique_a3':7,
'num_unique_a4':8,
'num_unique_a5':9}]
1031 |2017-03-12 |300.0 |120 | 100-200 | [{'A1':[250,300,550],
'A2':[150,440,460],
'A3':[250,300,430],
'A4':[350, 500,650],
'A5':[650,700,780],
'Last_5_deposits':[50],
'num_unique_a1':3,
'num_unique_a2':4,
'num_unique_a3':7,
'num_unique_a4':8,
'num_unique_a5':9}]
231 |2017-03-14 |350.0 |130 | 100-200 | [{'A1':[250,300,550],
'A2':[150,440,460],
'A3':[250,300,430],
'A4':[350, 500,650],
'A5':[650,700,780],
'Last_5_deposits':[50],
'num_unique_a1':3,
'num_unique_a2':4,
'num_unique_a3':7,
'num_unique_a4':8,
'num_unique_a5':9}]
`
It contains 6 columns, and each cell of the last column ('Action') holds a list containing a single dictionary.
So I need to split the last column ('Action') into multiple columns, like this: user_id | date | last_dep_amt | dep_amt | Bin | A1 | A2 | A3 | A4 | A5 | Last_5_deposits | num_unique_a1 | num_unique_a2 | num_unique_a3 | num_unique_a4 | num_unique_a5
A bit about the DataFrame:
type(df['Action']) - pandas.core.series.Series
type(df) - pandas.core.frame.DataFrame
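To make the structure concrete, here is a minimal sketch that rebuilds the first two sample rows and inspects one 'Action' cell. It assumes the cells hold real Python lists and dicts rather than strings; the name `first_cell` is only for illustration.

```python
import pandas as pd

# First two rows of the sample data shown above.
df = pd.DataFrame({
    "user_id": [1031, 1031],
    "date": ["2017-03-11", "2017-03-12"],
    "last_dep_amt": [200.0, 300.0],
    "dep_amt": [100, 120],
    "Bin": ["100-200", "100-200"],
    "Action": [
        [{"A1": [350, 400, 450], "A2": [450, 480, 490], "A3": [500, 550, 600],
          "A4": [650, 700, 850], "A5": [750, 800, 950], "Last_5_deposits": [50],
          "num_unique_a1": 3, "num_unique_a2": 4, "num_unique_a3": 7,
          "num_unique_a4": 8, "num_unique_a5": 9}],
        [{"A1": [250, 300, 550], "A2": [150, 440, 460], "A3": [250, 300, 430],
          "A4": [350, 500, 650], "A5": [650, 700, 780], "Last_5_deposits": [50],
          "num_unique_a1": 3, "num_unique_a2": 4, "num_unique_a3": 7,
          "num_unique_a4": 8, "num_unique_a5": 9}],
    ],
})

# Assumption: each cell is a list wrapping exactly one dict.
first_cell = df["Action"].iloc[0]
print(type(first_cell).__name__)     # container type of one cell
print(type(first_cell[0]).__name__)  # type of the element inside it
print(sorted(first_cell[0]))         # keys that should become columns
```

If the cells turn out to be strings (for example after reading the data from a CSV file), they would need to be parsed first, e.g. with `ast.literal_eval`, before any splitting.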
Expected output: all the sub-columns under the 'Action' column must be split into separate columns:
user_id|date|last_dep_amt|dep_amt|Bin|A1|A2|A3|A4|A5|Last_5_deposits|num_unique_a1|num_unique_a2|num_unique_a3|num_unique_a4|num_unique_a5
`
+---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+
| user_id | date | last_dep_amt | dep_amt | Bin | A1 | A2 | A3 | A4 | A5 | Last_5_deposits | num_unique_a1 | num_unique_a2 | num_unique_a3 | num_unique_a4 | num_unique_a5 |
+---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+
| 1031 | 3/11/2017 | 200 | 100 | 100-200 | [350,400,450] | [450,480,490] | [500,550,600] | [650, 700,850] | [750,800,950] | [50] | 3 | 4 | 7 | 8 | 9 |
+---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+
`
I have also attached a link below to an image of the expected final output for the above DataFrame (df):
<https://ibb.co/0JyKhHQ>
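Answer 0 (score: 0)
import pandas as pd
# Each 'Action' cell is a list holding one dict: build a one-row frame per cell and stack them
df_action=pd.concat([pd.DataFrame(key) for key in df['Action']]).reset_index(drop=True)
# Join the original columns with the expanded ones (assumes df has a default 0..n-1 index)
new_df=pd.concat([df[['user_id','date','last_dep_amt','dep_amt','Bin']],df_action],axis=1)
print(new_df)
Output: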
user_id date last_dep_amt dep_amt Bin A1 \
0 1031 2017-03-11 200.0 100 100-200 [350, 400, 450]
1 1031 2017-03-12 300.0 120 100-200 [250, 300, 550]
2 231 2017-03-14 350.0 130 100-200 [250, 300, 550]
A2 A3 A4 A5 \
0 [450, 480, 490] [500, 550, 600] [650, 700, 850] [750, 800, 950]
1 [150, 440, 460] [250, 300, 430] [350, 500, 650] [650, 700, 780]
2 [150, 440, 460] [250, 300, 430] [350, 500, 650] [650, 700, 780]
Last_5_deposits num_unique_a1 num_unique_a2 num_unique_a3 num_unique_a4 \
0 [50] 3 4 7 8
1 [50] 3 4 7 8
2 [50] 3 4 7 8
num_unique_a5
0 9
1 9
2 9