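I have been working with a DataFrame. I first want to sort it by values derived from one column, and then change the values of certain columns in every other row. To sort it, I do the following: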
df['key'] = df['Direction'].apply(lambda x: x.split()[0])
# Take the second number to ensure the order is kept
df['key2'] = df['Direction'].apply(lambda x: x.split()[2])
class_determiner_df = df.sort_values(['key', 'key2'])
This sorts the rows as expected, following my previous question, Sort the rows of a data frame.
I then get the following DataFrame:
Node Feature Indicator Scaled Class Direction
0 0 km <= 0.181 class_4 0 -> 1
201 201 gini = 0.000 class_5 0 -> 202
1 1 WPS <= 0.074 class_5 1 -> 2
64 64 gini = 0.000 class_4 1 -> 65
10 10 funktion <= 0.500 class_2 10 -> 11
17 17 gini = 0.000 class_5 10 -> 18
100 100 SPW <= 0.282 class_5 100 -> 101
101 101 gini = 0.000 class_5 100 -> 102
102 102 words_nb <= 0.322 class_3 102 -> 103
123 123 gini = 0.496 class_2 102 -> 124
103 103 words_nb <= 0.125 class_2 103 -> 104
104 104 gini = 0.000 class_2 103 -> 105
105 105 SPW <= 0.290 class_4 105 -> 106
106 106 gini = 0.000 class_4 105 -> 107
107 107 words_nb <= 0.197 class_3 107 -> 108
116 116 gini = 0.000 class_4 107 -> 117
108 108 SPW <= 0.330 class_3 108 -> 109
109 109 gini = 0.000 class_3 108 -> 110
11 11 auftragnehm <= 0.500 class_2 11 -> 12
16 16 gini = 0.000 class_2 11 -> 17
110 110 Comp_conj <= 0.125 class_3 110 -> 111
115 115 gini = 0.000 class_4 110 -> 116
111 111 words_nb <= 0.138 class_3 111 -> 112
112 112 gini = 0.000 class_3 111 -> 113
113 113 weird_words <= 0.167 class_3 113 -> 114
114 114 gini = 0.000 class_3 113 -> 115
117 117 polarity <= 0.175 class_2 117 -> 118
118 118 gini = 0.000 class_2 117 -> 119
119 119 Aux_Start_no <= 0.500 class_3 119 -> 120
120 120 gini = 0.000 class_3 119 -> 121
.. ... ... ... ... ... ...
Next, I try to make df['Feature'] and df['Scaled'] in every second row equal to the values in the row above, and set df['Indicator'] to '>'.
I took the following from this answer: Adjust every other row of a data frame
# Adjust every other row
class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN)
# it replaces the None value with the last significant value from the same column
class_determiner_df.fillna(method='ffill', inplace=True)
This produces the following incorrect DataFrame:
Node Feature Indicator Scaled Class Direction
0 0 km <= 0.181 class_4 0 -> 1
201 201 gini = 0.000 class_5 0 -> 202
1 1 gini > 0.000 class_5 1 -> 2
64 64 gini = 0.000 class_4 1 -> 65
10 10 gini > 0.000 class_2 10 -> 11
17 17 gini = 0.000 class_5 10 -> 18
100 100 gini > 0.000 class_5 100 -> 101
101 101 gini = 0.000 class_5 100 -> 102
102 102 gini > 0.000 class_3 102 -> 103
123 123 gini = 0.496 class_2 102 -> 124
103 103 gini > 0.496 class_2 103 -> 104
104 104 gini = 0.000 class_2 103 -> 105
105 105 gini > 0.000 class_4 105 -> 106
106 106 gini = 0.000 class_4 105 -> 107
107 107 gini > 0.000 class_3 107 -> 108
116 116 gini = 0.000 class_4 107 -> 117
108 108 gini > 0.000 class_3 108 -> 109
109 109 gini = 0.000 class_3 108 -> 110
11 11 gini > 0.000 class_2 11 -> 12
16 16 gini = 0.000 class_2 11 -> 17
110 110 gini > 0.000 class_3 110 -> 111
115 115 gini = 0.000 class_4 110 -> 116
111 111 gini > 0.000 class_3 111 -> 112
112 112 gini = 0.000 class_3 111 -> 113
113 113 gini > 0.000 class_3 113 -> 114
114 114 gini = 0.000 class_3 113 -> 115
117 117 gini > 0.000 class_2 117 -> 118
118 118 gini = 0.000 class_2 117 -> 119
119 119 gini > 0.000 class_3 119 -> 120
120 120 gini = 0.000 class_3 119 -> 121
.. ... ... ... ... ... ...
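The 'gini' value from each second row has overwritten the row that follows it. Is there a better way to make sure the DataFrame looks like this: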
Node Feature Indicator Scaled Class Direction
0 0 km <= 0.181 class_4 0 -> 1
201 201 km > 0.181 class_5 0 -> 202
1 1 WPS <= 0.074 class_5 1 -> 2
64 64 WPS > 0.074 class_4 1 -> 65
10 10 funktion <= 0.500 class_2 10 -> 11
17 17 funktion > 0.500 class_5 10 -> 18
100 100 SPW <= 0.282 class_5 100 -> 101
101 101 SPW > 0.282 class_5 100 -> 102
102 102 words_nb <= 0.322 class_3 102 -> 103
123 123 words_nb > 0.322 class_2 102 -> 124
105 105 SPW <= 0.290 class_4 105 -> 106
106 106 SPW > 0.290 class_4 105 -> 107
...
I am not sure why the following does not work, since it seems to be exactly what I need:
class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN)
# it replaces the None value with the last significant value from the same column
class_determiner_df.fillna(method='ffill', inplace=True)
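Answer 0 (score: 1)
This is because loc selects by index label, not by position. After sort_values the labels are no longer in order, so loc[1::2] starts at the row labelled 1 (the third row here) and steps by two from there, which hits the first row of each pair instead of the second. You can easily fix this with DataFrame.reset_index: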
class_determiner_df.reset_index(inplace=True, drop=True)
# Adjust every other row
class_determiner_df.loc[1::2, 'Feature'] = None
class_determiner_df.loc[1::2, 'Scaled'] = None
class_determiner_df.loc[1::2, 'Indicator'] = '>'
# ffill() scans rows from the top, and when it finds a missing value (None/NaN)
# it replaces it with the last valid value from the same column.
# Recent pandas versions deprecate fillna(method='ffill') in favour of ffill().
class_determiner_df.ffill(inplace=True)
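
To see the label-versus-position difference in isolation, here is a minimal sketch on a made-up six-row frame. The data and the names df, fixed, and second_rows are illustrative and not taken from the question's full frame. It also shows a variant that keeps the original index labels by looking up the labels of every second row by position. That variant assumes the index labels are unique, and it calls ffill() directly on the two affected columns, which should behave like fillna(method='ffill') there.

```python
import pandas as pd

# Hypothetical miniature of the sorted frame; after sort_values the index
# labels are no longer 0, 1, 2, ...
df = pd.DataFrame(
    {
        "Feature": ["km", "gini", "WPS", "gini", "funktion", "gini"],
        "Indicator": ["<=", "=", "<=", "=", "<=", "="],
        "Scaled": [0.181, 0.000, 0.074, 0.000, 0.500, 0.000],
    },
    index=[0, 201, 1, 64, 10, 17],
)

# .loc slices by label: start at the row labelled 1 (the third row), then step by 2.
rows_hit_by_loc = df.loc[1::2].index.tolist()

# .iloc slices by position: second, fourth, sixth row, whatever their labels are.
rows_hit_by_iloc = df.iloc[1::2].index.tolist()

# Variant of the fix that keeps the original labels (assumes they are unique):
# pick the labels of every second row by position, then select them with .loc.
fixed = df.copy()
second_rows = fixed.index[1::2]
fixed.loc[second_rows, "Feature"] = None
fixed.loc[second_rows, "Scaled"] = None
fixed.loc[second_rows, "Indicator"] = ">"

# Forward-fill only the two blanked columns from the row above.
fixed[["Feature", "Scaled"]] = fixed[["Feature", "Scaled"]].ffill()

print(rows_hit_by_loc)
print(rows_hit_by_iloc)
print(fixed)
```

If the original labels are not needed afterwards, reset_index(drop=True) as shown above is the simpler choice.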