I am learning pandas and have been struggling to understand pivot tables. Below is the sample program I am running.
import pandas as pd
df = pd.read_csv('/Users/xxx/Desktop/df.csv')
print(df)
df = df.pivot_table(index='__timestamp', columns=[], values=['passed_count', 'failed_count'])
print(df)
The program prints the following output:
__timestamp failed_count passed_count Unnamed: 3
0 27/05/18 0.019417 0.980583
1 03/06/18 0.427136 0.839196
2 10/06/18 0.839416 0.854015
3 17/06/18 0.403846 0.913462
4 24/06/18 1.429688 0.757812
5 01/07/18 6.781457 0.701987
6 08/07/18 0.324561 0.929825
7 15/07/18 0.295082 0.970492
8 22/07/18 0.849802 0.960474
9 29/07/18 0.673333 0.923333
10 05/08/18 0.276657 0.919308
11 12/08/18 0.242105 0.821053
12 19/08/18 0.176471 0.976471
passed_count
__timestamp
01/07/18 0.701987
03/06/18 0.839196
05/08/18 0.919308
08/07/18 0.929825
10/06/18 0.854015
12/08/18 0.821053
15/07/18 0.970492
17/06/18 0.913462
19/08/18 0.976471
22/07/18 0.960474
24/06/18 0.757812
27/05/18 0.980583
29/07/18 0.923333
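After calling pivot_table(), I cannot understand why the failed_count column is missing from the result. Is it possible to pass multiple values as above? What is the purpose of the values option?
Edit:
As requested in the comments, the contents of the CSV file are: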
__timestamp,failed_count,passed_count,
27/05/18,0.019417 ,0.980583,
03/06/18,0.427136 ,0.839196,
10/06/18,0.839416 ,0.854015,
17/06/18,0.403846 ,0.913462,
24/06/18,1.429688 ,0.757812,
01/07/18,6.781457 ,0.701987,
08/07/18,0.324561 ,0.929825,
15/07/18,0.295082 ,0.970492,
22/07/18,0.849802 ,0.960474,
29/07/18,0.673333 ,0.923333,
05/08/18,0.276657 ,0.919308,
12/08/18,0.242105 ,0.821053,
19/08/18,0.176471 ,0.976471,
Output of df.head(), immediately after reading the CSV:
__timestamp failed_count passed_count Unnamed: 3
0 27/05/18 0.019417 0.980583
1 03/06/18 0.427136 0.839196
2 10/06/18 0.839416 0.854015
3 17/06/18 0.403846 0.913462
4 24/06/18 1.429688 0.757812
Answer 0 (score: 2)
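As we found out in the comments, pandas' pivot_table function silently ignores any non-numeric (in this case str) columns in the values list. The failed_count column was being read as str, which is why it is missing from the pivoted output.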
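One way to confirm and work around this is to check whether the column is numeric and convert it before pivoting. The sketch below is only an illustration: it uses a small hand-built DataFrame as a hypothetical stand-in for the CSV, with failed_count deliberately stored as strings with a trailing space, and it assumes the values can be converted cleanly with pd.to_numeric. Whether a non-numeric column is dropped silently or raises an error may depend on the pandas version, so converting first avoids relying on either behavior.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: failed_count is deliberately stored as
# strings (with a trailing space) to mimic a column that was not read as numeric.
df = pd.DataFrame({
    "__timestamp": ["27/05/18", "03/06/18", "10/06/18"],
    "failed_count": ["0.019417 ", "0.427136 ", "0.839416 "],
    "passed_count": [0.980583, 0.839196, 0.854015],
})

# Check whether the column is numeric before pivoting.
was_numeric = pd.api.types.is_numeric_dtype(df["failed_count"])
print("failed_count numeric before conversion:", was_numeric)

# Strip stray whitespace and convert the strings to floats.
df["failed_count"] = pd.to_numeric(df["failed_count"].str.strip())
is_numeric = pd.api.types.is_numeric_dtype(df["failed_count"])
print("failed_count numeric after conversion:", is_numeric)

# With both columns numeric, both can be requested through `values`.
# The default aggregation is the mean; each timestamp appears once here,
# so the values pass through unchanged.
result = df.pivot_table(
    index="__timestamp",
    values=["passed_count", "failed_count"],
)
print(result)
```

On the real data, printing df.dtypes right after read_csv is a quick way to see which columns were not parsed as numbers.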