Pandas DataFrame: collecting column values into one row

Time: 2018-03-17 20:14:37

Tags: python python-3.x pandas numpy tensorflow

I have this table, and the main problem is that the ID, W_Weight and Class columns do not have the same length.

  

Note: each ID number is associated with a Class (for example, ID 0 with Class 1.0, ID 4 with Class 5.0).

   ID   W_Weight    Class
0   0   0.255265    1.0
1   0   0.273844    1.0
2   0   0.351219    1.0
3   0   0.262033    1.0
4   0   0.351219    5.0
5   0   0.258109    1.0
6   0   0.296328    5.0
7   0   0.351219    1.0
8   0   0.301208    1.0
9   0   0.273844    1.0
10  0   0.317767    1.0
11  1   0.299451    1.0
12  1   0.327183    5.0
13  1   0.391577    1.0
14  1   0.272526    1.0
15  1   0.412015    1.0
16  1   0.412015    1.0
17  1   0.287148    1.0
18  1   0.168667    5.0
19  1   0.257689    1.0
20  1   0.242609    1.0
21  2   0.190351    5.0
22  2   0.204205    5.0
23  2   0.254588    5.0
24  2   0.261904    1.0
25  2   0.195398    5.0
26  2   0.248913    5.0
27  2   0.161089    1.0
28  2   0.240355    5.0
29  2   0.261904    1.0
... ... ... ...
410722  32742   0.190023    NaN
410723  32742   0.190023    NaN
410724  32742   0.184970    NaN
410725  32742   0.166998    NaN
410726  32742   0.196789    NaN
410727  32742   0.171033    NaN
410728  32742   0.207060    NaN
410729  32742   0.171033    NaN
410730  32742   0.179186    NaN
410731  32742   0.207060    NaN
410732  32742   0.182852    NaN
410733  32742   0.146492    NaN
410734  32742   0.141293    NaN
410735  32742   0.193123    NaN
410736  32742   0.207060    NaN
410737  32742   0.092576    NaN
410738  32742   0.207060    NaN
410739  32742   0.160762    NaN
410740  32742   0.165249    NaN
410741  32742   0.207060    NaN
410742  32742   0.184970    NaN
410743  32742   0.147506    NaN
410744  32742   0.207060    NaN
410745  32742   0.190023    NaN
410746  32742   0.116286    NaN
410747  32742   0.070032    NaN
410748  32742   0.207060    NaN
410749  32742   0.166998    NaN
410750  32742   0.147506    NaN
410751  32742   0.207060    NaN

The desired table should look like this:

  

Note: the first row (index 0) is only an example; I want to do this for all the data in W_Weight.

   ID                  W_Weight                                 Class
0   0   {0.25,0.27,0.35,0.26,0.35,0.25,0.29,0.35,0.30,0.27,0.31} 1.0
11  1   0.299451                                                 1.0
12  1   0.327183                                                 5.0
13  1   0.391577                                                 1.0
14  1   0.272526                                                 1.0
15  1   0.412015                                                 1.0
16  1   0.412015                                                 1.0
17  1   0.287148                                                 1.0
18  1   0.168667                                                 5.0
19  1   0.257689                                                 1.0
20  1   0.242609                                                 1.0
21  2   0.190351                                                 5.0
22  2   0.204205                                                 5.0
23  2   0.254588                                                 5.0
24  2   0.261904                                                 1.0
25  2   0.195398                                                 5.0
26  2   0.248913                                                 5.0
27  2   0.161089                                                 1.0
28  2   0.240355                                                 5.0
29  2   0.261904                                                 1.0

I am doing this to match Class with ID and W_Weight, because I am using TensorFlow for classification.

1 answer:

Answer 0 (score: 0)

You have been correctly advised not to do what you are trying to do. Nevertheless, if you insist, here is a solution:

df.groupby('ID')['W_Weight'].apply(set)
#ID
#0    {0.255265, 0.351219, 0.25810900000000003, 0.26...
#1    {0.299451, 0.327183, 0.27252600000000005, 0.39...
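
One caveat about the line above: a `set` keeps only distinct values and has no defined order, while the desired row in the question repeats 0.35 several times. If duplicates and row order matter, collecting into a `list` may be closer to what was asked. The sketch below is only an illustration under that assumption; the small `df` is a hypothetical miniature built from the first rows of the question's table, and the names `as_sets`, `as_lists` and `collected` are made up for this example.

```python
import pandas as pd

# Hypothetical miniature of the question's table (values copied from its first rows).
df = pd.DataFrame({
    "ID":       [0, 0, 0, 0, 0, 1, 1, 1],
    "W_Weight": [0.255265, 0.273844, 0.351219, 0.262033, 0.351219,
                 0.299451, 0.327183, 0.391577],
    "Class":    [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 5.0, 1.0],
})

grouped = df.groupby("ID")

# set: one entry per distinct weight, in no guaranteed order.
as_sets = grouped["W_Weight"].apply(set)

# list: every weight is kept, in the original row order.
as_lists = grouped["W_Weight"].apply(list)

# Collect both columns so that weights and classes stay aligned position by position.
collected = pd.DataFrame({
    "W_Weight": grouped["W_Weight"].apply(list),
    "Class": grouped["Class"].apply(list),
})
print(collected)
```

Each row of `collected` then holds one ID together with two lists of equal length, so the i-th weight still corresponds to the i-th class.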