Question

我有以下数据框（adjusted_RFC_df）：

     Node               Feature Indicator  Scaled     Class    Direction True_False
0       0                   km        <=   0.181   class_4      0 -> 1         NA
125   125                  gini         =   0.000   class_2    0 -> 126       FALSE
1       1                   WPS        <=   0.074   class_5      1 -> 2        TRUE
52     52                  gini         =   0.000   class_2     1 -> 53       FALSE
105   105                  gini         =   0.492   class_3  102 -> 106       FALSE
102   102           weird_words        <=   0.042   class_4  102 -> 103        TRUE
104   104                  gini         =   0.488   class_4  103 -> 105       FALSE
103   103              funktion        <=   0.290   class_4  103 -> 104        TRUE
107   107                  gini         =   0.000   class_5  106 -> 108       FALSE
106   106           Nb_of_verbs        <=   0.094   class_5  106 -> 107        TRUE
110   110                  gini         =   0.000   class_4  109 -> 111       FALSE
109   109                signal        <=   0.320   class_4  109 -> 110        TRUE
112   112          Flesch_Index        <=   0.627   class_1  112 -> 113        TRUE
115   115                  gini         =   0.000   class_3  112 -> 116       FALSE
114   114                  gini         =   0.000   class_1  113 -> 115       FALSE
113   113       Nb_of_auxiliary        <=   0.714   class_1  113 -> 114        TRUE
..    ...                   ...       ...     ...       ...          ...        ...

我试图根据“方向”列中的值对行进行排序（0-> 1，这意味着我试图根据第一个数字0进行排序）。我正在尝试通过使用以下方法来做到这一点：

   ## Sort rows based on first int of Direction column ##
   # create a column['key'] to sort df
   adjusted_RFC_df['key'] = Adjusted_RFC_df['Direction'].apply(lambda    x: x.split()[0])

   # Create new Dataframe with sorted values based on first number of 'Direction' col 
   class_determiner_df = Adjusted_RFC_df.sort_values('key')

这可以按“->”（左侧）之前的第一个值进行排序，但是我需要进行排序以使数字与“->”右侧的数字保持顺序

所以它应该像这样：

     Node               Feature Indicator  Scaled     Class    Direction True_False
0       0                   km        <=   0.181   class_4      0 -> 1         NA
125   125                  gini         =   0.000   class_2    0 -> 126       FALSE
1       1                   WPS        <=   0.074   class_5      1 -> 2        TRUE
52     52                  gini         =   0.000   class_2     1 -> 53       FALSE
105   105           weird_words         =   0.492   class_3  102 -> 103       FALSE
102   102                  gini        <=   0.042   class_4  102 -> 103        TRUE
104   104              funktion         =   0.488   class_4  103 -> 104       FALSE
103   103                  gini        <=   0.290   class_4  103 -> 105        TRUE
107   107           Nb_of_verbs         =   0.000   class_5  106 -> 107       FALSE
106   106                  gini        <=   0.094   class_5  106 -> 108        TRUE
110   110                signal         =   0.000   class_4  109 -> 110       FALSE
109   109                  gini        <=   0.320   class_4  109 -> 111        TRUE
112   112          Flesch_Index        <=   0.627   class_1  112 -> 113        TRUE
115   115                  gini         =   0.000   class_3  112 -> 116       FALSE
114   114        Nb_of_auxiliary        =   0.000   class_1  113 -> 114       FALSE
113   113                  gini        <=   0.714   class_1  113 -> 115        TRUE
..    ...                   ...       ...     ...       ...          ...        ...

这使我感到困惑，因为有时它会保持右侧数字之间的顺序，但是大多数时候却没有。

我认为也许排序字符串是一个问题，因为col的方向是string类型。因此，我尝试执行以下操作：

adjusted_RFC_df['key'] = adjusted_RFC_df['key'].astype(np.int64)

但这会导致以下错误：

ValueError: invalid literal for int() with base 10: 'NA'

因此，似乎正在尝试将['TRUE / FALSE']列转换为int以及仅转换['key']列。

Direction col可能是字符串类型的问题吗？

或者是否有一种方法可以在“->”之前的第一个数字的基础上进行排序，同时确保第二个数字也按顺序排列（从最小到最大）？

Answer 1

如果Direction始终是字符串类型，并且也具有int space '->' space int这样的1 -> 2格式，那么您可以获取另一个排序键

df['key1'] = df['Direction'].apply(lambda x: x.split()[0])
df['key2'] = df['Direction'].apply(lambda x: x.split()[2])

然后根据这两个键进行排序

df.sort_values(['key1', 'key2'])

编辑：这是获取key1和'key2'

的另一种方法

df['key1'] = df['Direction'].apply(lambda x: int(x.split('->')[0]))
df['key2'] = df['Direction'].apply(lambda x: int(x.split('->')[1]))

对数据框的行进行排序

1 个答案: