Sorting a DataFrame where one column contains NaN values

Time: 2019-06-12 05:53:52

Tags: python sorting dataframe

I have a DataFrame.

+------------+------------+------------+------+
| Item Type  | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Baby Food  | Jul-2017   | 3000       | 100  |
+------------+------------+------------+------+
| Baby Food  | Jun-2017   | 2900       | 100  |
+------------+------------+------------+------+
| Cereal     | Jul-2017   | 6000       | 1000 |
+------------+------------+------------+------+
| Cereal     | Jun-2017   | 5000       | 1000 |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+

I want to sort the DataFrame by Diff, but rows where Diff is NaN should be sorted by Total Cost instead. So my final output should look like this:

+------------+------------+------------+------+
|  Item Type | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Cereal     | Jul-2017   | 6000       | 1000 |
+------------+------------+------------+------+
| Cereal     | Jun-2017   | 5000       | 1000 |
+------------+------------+------------+------+
| Baby Food  | Jul-2017   | 3000       | 100  |
+------------+------------+------------+------+
| Baby Food  | Jun-2017   | 2900       | 100  |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+

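One way to do this is to split the DataFrame into two DataFrames: one containing all rows where Diff is not NaN, and another containing the rows where Diff is NaN. Then sort them by Diff and by Total Cost respectively, and merge them back together.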

+-----------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+-----------+------------+------------+------+
| Baby Food | Jul-2017   | 3000       | 100  |
+-----------+------------+------------+------+
| Baby Food | Jun-2017   | 2900       | 100  |
+-----------+------------+------------+------+
| Cereal    | Jul-2017   | 6000       | 1000 |
+-----------+------------+------------+------+
| Cereal    | Jun-2017   | 5000       | 1000 |
+-----------+------------+------------+------+


+------------+------------+------------+------+
| Item Type  | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+
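
The split-sort-merge approach described above can be sketched as follows. This is only an illustration, assuming the "Nan" entries in the tables are real missing values (numpy.nan) rather than strings; the names has_diff, with_diff, without_diff and split_result are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "Nan" is assumed to mean a missing value.
df = pd.DataFrame({
    "Item Type": ["Baby Food", "Baby Food", "Cereal", "Cereal",
                  "Snacks", "Chocolates", "Ice Cream"],
    "Year_Month": ["Jul-2017", "Jun-2017", "Jul-2017", "Jun-2017",
                   "Jul-2017", "Jul-2017", "Jul-2017"],
    "Total Cost": [3000, 2900, 6000, 5000, 4500, 3000, 4000],
    "Diff": [100, 100, 1000, 1000, np.nan, np.nan, np.nan],
})

# Boolean mask: True where Diff is present.
has_diff = df["Diff"].notna()

# Rows with a Diff: largest Diff first, ties broken by Total Cost.
with_diff = df[has_diff].sort_values(by=["Diff", "Total Cost"], ascending=False)

# Rows without a Diff: largest Total Cost first.
without_diff = df[~has_diff].sort_values(by="Total Cost", ascending=False)

# Stack the two parts back together, rows with a Diff on top.
split_result = pd.concat([with_diff, without_diff], ignore_index=True)
print(split_result)
```

This does two sorts on smaller frames plus one concatenation, so it is not necessarily expensive, but it is more code than a single sort call.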

Is there a more efficient way to do this, since this approach involves a lot of computation?

2 answers:

Answer 0: (score: 1)

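When a DataFrame (df) is sorted by a column (here 'Diff'), NaN values are moved to the end of the DataFrame by default. The rows with NaN in 'Diff' tie with one another, so they are ordered by the next sort column. Sorting the DataFrame by two columns ('Diff' and 'Total Cost') therefore gives the desired result.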

Here is the code for that:

    df = df.sort_values(by=['Diff', 'Total Cost'], ascending=False)
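
As a quick check of that one-liner against the data in the question, here is a minimal runnable sketch. It assumes the "Nan" cells are real missing values (numpy.nan); the names rows and result are illustrative only, and na_position="last" is spelled out even though it is the default.

```python
import numpy as np
import pandas as pd

# Sample rows from the question: (Item Type, Year_Month, Total Cost, Diff).
rows = [
    ("Baby Food", "Jul-2017", 3000, 100),
    ("Baby Food", "Jun-2017", 2900, 100),
    ("Cereal", "Jul-2017", 6000, 1000),
    ("Cereal", "Jun-2017", 5000, 1000),
    ("Snacks", "Jul-2017", 4500, np.nan),
    ("Chocolates", "Jul-2017", 3000, np.nan),
    ("Ice Cream", "Jul-2017", 4000, np.nan),
]
df = pd.DataFrame(rows, columns=["Item Type", "Year_Month", "Total Cost", "Diff"])

# Single sort on both columns, descending; NaN Diff rows go to the bottom
# and are ordered among themselves by Total Cost.
result = df.sort_values(
    by=["Diff", "Total Cost"],
    ascending=False,
    na_position="last",
).reset_index(drop=True)

print(result)
```

If the Diff column holds the string "Nan" instead of a missing value, it would presumably need converting first (for example with pd.to_numeric(df["Diff"], errors="coerce")) before this sort behaves as described.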

Answer 1: (score: 0)

You can simply use the sort function with a key function:

In:

import json

jsonv = [
 {
   "Item Type": "Snacks",
   "Year_Month": "Jul-2017",
   "Total Cost": 4500,
   "Diff": "5"
 },
 {
   "Item Type": "Ice Cream",
   "Year_Month": "Jul-2017",
   "Total Cost": 4000,
   "Diff": "Nan"
 },
 {
   "Item Type": "Chocolates",
   "Year_Month": "Jul-2017",
   "Total Cost": 3000,
   "Diff": "4"
 }
]

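def extract_diff(item):
    # Sort key: the numeric Diff; 'Nan' or a missing 'Diff' key counts as 0.
    # The parameter is named item so it does not shadow the json module.
    try:
        jdiff = item['Diff']
        return int(jdiff) if jdiff != 'Nan' else 0
    except KeyError:
        return 0

# Sort the list in place, largest Diff first.
jsonv.sort(key=extract_diff, reverse=True)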

print(json.dumps(jsonv, indent=4))
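
Note that the key function above looks only at Diff, so records without a Diff are not ordered by Total Cost as the question asks. One possible way to add that fallback for a list of dicts is a tuple key, sketched below. The names records, sort_key and ordered are hypothetical, and the sample records are a trimmed version of the question's data with "Nan" kept as a string, as in this answer.

```python
records = [
    {"Item Type": "Baby Food", "Total Cost": 3000, "Diff": "100"},
    {"Item Type": "Cereal", "Total Cost": 6000, "Diff": "1000"},
    {"Item Type": "Snacks", "Total Cost": 4500, "Diff": "Nan"},
    {"Item Type": "Chocolates", "Total Cost": 3000, "Diff": "Nan"},
    {"Item Type": "Ice Cream", "Total Cost": 4000, "Diff": "Nan"},
]

def sort_key(record):
    """Return (has_diff, diff, total_cost) so tuples compare field by field."""
    diff = record.get("Diff", "Nan")
    if diff == "Nan":
        # No Diff: flag 0 places these after all records that have a Diff
        # (when sorting in reverse), then Total Cost decides the order.
        return (0, 0, record["Total Cost"])
    return (1, int(diff), record["Total Cost"])

# reverse=True gives: records with a Diff first (largest Diff first),
# then the remaining records by largest Total Cost.
ordered = sorted(records, key=sort_key, reverse=True)
names = [r["Item Type"] for r in ordered]
print(names)
```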