Sort rows and remove NaN values

Time: 2019-06-11 10:20:39

Tags: python python-3.x pandas sorting dataframe

I have a dataset that looks like this:

   state                       Item_Number
0     AP    1.0, 4.0, 20.0, 2.0, 11.0, 7.0
1    GOA      1.0, 4.0, nan, 2.0, 8.0, nan
2     GU    1.0, 4.0, 13.0, 2.0, 11.0, 7.0
3     KA    1.0, 23.0, nan, nan, 11.0, 7.0
4     MA  1.0, 14.0, 13.0, 2.0, 19.0, 21.0

I want to remove the NaN values, sort the values within each row, and convert the floats to ints. When done, the dataset should look like this:

   state            Item_Number
0     AP    1, 2, 4, 7, 11, 20
1    GOA            1, 2, 4, 8
2     GU    1, 2, 4, 7, 11, 13
3     KA          1, 7, 11, 23
4     MA  1, 2, 13, 14, 19, 21
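
To reproduce the question, the sample data can be rebuilt roughly as follows. This is a minimal sketch that assumes Item_Number holds comma-separated strings; the question does not say so explicitly, but both answers treat the column that way.

```python
import pandas as pd

# Assumption: each Item_Number cell is a single comma-separated string.
df = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "Item_Number": [
        "1.0, 4.0, 20.0, 2.0, 11.0, 7.0",
        "1.0, 4.0, nan, 2.0, 8.0, nan",
        "1.0, 4.0, 13.0, 2.0, 11.0, 7.0",
        "1.0, 23.0, nan, nan, 11.0, 7.0",
        "1.0, 14.0, 13.0, 2.0, 19.0, 21.0",
    ],
})
print(df)
```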

2 answers:

Answer 0: (score: 2)

Another solution, using Series.str.split and Series.apply:

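# Split each string on commas, skip tokens containing 'nan',
# convert the rest to int, sort them, and join back into one string.
df['Item_Number'] = (df.Item_Number.str.split(',')
                     .apply(lambda x: ', '.join([str(z) for z in sorted([int(float(y)) for y in x if 'nan' not in y])])))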

[out]

  state           Item_Number
0    AP    1, 2, 4, 7, 11, 20
1   GOA            1, 2, 4, 8
2    GU    1, 2, 4, 7, 11, 13
3    KA          1, 7, 11, 23
4    MA  1, 2, 13, 14, 19, 21
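
The one-line lambda above can be hard to read, so the same steps can be sketched as a small named function. The helper name clean_items and its sep parameter are hypothetical, introduced only for illustration, and the NaN check uses math.isnan instead of the substring test from the answer.

```python
import math

def clean_items(text, sep=", "):
    """Drop NaN entries, convert to int, sort, and re-join as a string."""
    # float() tolerates surrounding whitespace and parses "nan" as NaN.
    values = (float(token) for token in text.split(","))
    kept = sorted(int(v) for v in values if not math.isnan(v))
    return sep.join(str(v) for v in kept)

print(clean_items("1.0, 4.0, nan, 2.0, 8.0, nan"))  # 1, 2, 4, 8

# With the question's DataFrame this could be applied as:
# df['Item_Number'] = df['Item_Number'].apply(clean_items)
```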

Answer 1: (score: 0)

Use a list comprehension and remove the missing values by relying on the fact that NaN != NaN:

# float(y) == float(y) is False only for NaN, so NaN entries are filtered out.
df['Item_Number'] = [sorted([int(float(y)) for y in x.split(',') if float(y) == float(y)]) for x in df['Item_Number']]
print(df)
  state             Item_Number
0    AP    [1, 2, 4, 7, 11, 20]
1   GOA            [1, 2, 4, 8]
2    GU    [1, 2, 4, 7, 11, 13]
3    KA          [1, 7, 11, 23]
4    MA  [1, 2, 13, 14, 19, 21]
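
The self-comparison trick works because NaN is the only float value that is not equal to itself. A short illustration in plain Python, using one row from the question as input:

```python
import math

value = float("nan")
print(value == value)     # False
print(value != value)     # True
print(math.isnan(value))  # True

# The same filter the list comprehension applies to each token.
tokens = "1.0, 23.0, nan, nan, 11.0, 7.0".split(",")
kept = [float(t) for t in tokens if float(t) == float(t)]
print(kept)  # [1.0, 23.0, 11.0, 7.0]
```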

If you need strings instead:

# Same filtering and sorting as above, then join the ints into a space-separated string.
df['Item_Number'] = [' '.join(map(str, sorted([int(float(y)) for y in x.split(',') if float(y) == float(y)]))) for x in df['Item_Number']]
print(df)
  state      Item_Number
0    AP    1 2 4 7 11 20
1   GOA          1 2 4 8
2    GU    1 2 4 7 11 13
3    KA        1 7 11 23
4    MA  1 2 13 14 19 21