My data looks like this:
[('06/03/2018 17.35.18.211', 'param_a', 1),
('06/03/2018 17.35.19.211', 'param_b', 1),
('06/03/2018 17.35.20.211', 'param_c', 1),
('06/03/2018 17.35.21.211', 'param_a', 2),
('06/03/2018 17.35.22.211', 'param_b', 2),
('06/03/2018 17.35.22.211', 'param_c', 2)]
What is the best way to create a DataFrame like this from it?
                 timestamp  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      2.0
Answer 0 (score: 1)
Use the DataFrame constructor together with pivot, rename_axis and reset_index:
import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.23.211', 'param_c', 2)]
# Name the three tuple fields so they can be referenced in pivot.
df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])
# Keyword arguments are required by pivot in pandas 2.0 and later.
# rename_axis drops the leftover columns name 'b'; reset_index makes timestamp a column again.
df = (df.pivot(index='timestamp', columns='b', values='c')
        .rename_axis(None, axis=1)
        .reset_index())
print(df)
                 timestamp  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0
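But if a combination of the first and second values (timestamp and parameter name) is duplicated, aggregation is needed, because pivot raises an error on duplicate entries.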
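As a rough sketch of the duplicate case, pivot_table can take the place of pivot. The sample rows below are made up for illustration (the first two share the same timestamp and parameter name), and taking the mean is only one possible choice of aggregation; 'first', 'last', 'sum' and others may suit your data better.

```python
import pandas as pd

# Hypothetical data: the first two rows repeat the same (timestamp, name) pair.
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.18.211', 'param_a', 3),
       ('06/03/2018 17.35.19.211', 'param_b', 1)]
df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])

# pivot would raise ValueError on the repeated pair;
# pivot_table collapses the duplicates with the chosen aggregation (mean here).
out = (df.pivot_table(index='timestamp', columns='b', values='c', aggfunc='mean')
         .rename_axis(None, axis=1)
         .reset_index())
print(out)
```

Here the two param_a readings at 17.35.18.211 are averaged into a single cell.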
Answer 1 (score: 1)
You can also try this. (Note that get_dummies can be slow.)
import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.23.211', 'param_c', 2)]
# No column names given, so the columns are labelled 0, 1, 2.
df = pd.DataFrame(arr)
# get_dummies builds a 0/1 indicator column per parameter name;
# multiplying by the value column (reshaped to n x 1) puts each value in its parameter's column.
print(pd.concat([df[0], df[2].values[:, None] * df[1].str.get_dummies()], axis=1))
                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211        1        0        0
1  06/03/2018 17.35.19.211        0        1        0
2  06/03/2018 17.35.20.211        0        0        1
3  06/03/2018 17.35.21.211        2        0        0
4  06/03/2018 17.35.22.211        0        2        0
5  06/03/2018 17.35.23.211        0        0        2
Or, to get NaN instead of 0 in the empty cells:
v = df[1].str.get_dummies()
# where(v > 0) turns the 0 indicators into NaN before multiplying.
print(pd.concat([df[0], df[2].values[:, None] * v.where(v > 0)], axis=1))
                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0
Answer 2 (score: 0)
You are trying to create a 4-column DataFrame from data that has 3 columns. If you want 4 columns, you have to reshape the data.
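One way to do that reshaping, sketched here with set_index and unstack on the data exactly as posted in the question (where param_b and param_c share the last timestamp). The column names 'name' and 'value' are placeholders chosen for this example, not names from the original post.

```python
import pandas as pd

# Data as posted in the question; the last two rows share a timestamp.
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.22.211', 'param_c', 2)]
df = pd.DataFrame(arr, columns=['timestamp', 'name', 'value'])

# Index by (timestamp, name), then move the name level into the columns.
wide = (df.set_index(['timestamp', 'name'])['value']
          .unstack()
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```

Because the shared timestamp carries two different parameter names, both values land in the same row, giving the 5-row, 4-column layout asked for in the question.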