Problem sorting the rows of a data frame and changing the values of every other row

Time: 2019-05-27 12:31:40

Tags: python pandas dataframe

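I have been working on a data frame. I want to sort it by the values of a column first, and then change the values of some columns in every other row. To sort by the column I am doing: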

    df['key'] = df['Direction'].apply(lambda x: x.split()[0])
    # Take the second number to ensure the order is kept
    df['key2'] = df['Direction'].apply(lambda x: x.split()[2])

    class_determiner_df = df.sort_values(['key', 'key2'])

This sorts the rows as I expected, following my previous question, Sort the rows of a data frame.

I then get the following data frame:

         Node               Feature Indicator  Scaled     Class    Direction
    0       0                    km        <=   0.181   class_4      0 -> 1 
    201   201                  gini         =   0.000   class_5    0 -> 202 
    1       1                   WPS        <=   0.074   class_5      1 -> 2 
    64     64                  gini         =   0.000   class_4     1 -> 65 
    10     10              funktion        <=   0.500   class_2    10 -> 11 
    17     17                  gini         =   0.000   class_5    10 -> 18 
    100   100                   SPW        <=   0.282   class_5  100 -> 101 
    101   101                  gini         =   0.000   class_5  100 -> 102 
    102   102              words_nb        <=   0.322   class_3  102 -> 103 
    123   123                  gini         =   0.496   class_2  102 -> 124 
    103   103              words_nb        <=   0.125   class_2  103 -> 104 
    104   104                  gini         =   0.000   class_2  103 -> 105 
    105   105                   SPW        <=   0.290   class_4  105 -> 106 
    106   106                  gini         =   0.000   class_4  105 -> 107 
    107   107              words_nb        <=   0.197   class_3  107 -> 108 
    116   116                  gini         =   0.000   class_4  107 -> 117 
    108   108                   SPW        <=   0.330   class_3  108 -> 109 
    109   109                  gini         =   0.000   class_3  108 -> 110 
    11     11           auftragnehm        <=   0.500   class_2    11 -> 12 
    16     16                  gini         =   0.000   class_2    11 -> 17 
    110   110             Comp_conj        <=   0.125   class_3  110 -> 111 
    115   115                  gini         =   0.000   class_4  110 -> 116 
    111   111              words_nb        <=   0.138   class_3  111 -> 112 
    112   112                  gini         =   0.000   class_3  111 -> 113 
    113   113           weird_words        <=   0.167   class_3  113 -> 114 
    114   114                  gini         =   0.000   class_3  113 -> 115 
    117   117              polarity        <=   0.175   class_2  117 -> 118 
    118   118                  gini         =   0.000   class_2  117 -> 119 
    119   119          Aux_Start_no        <=   0.500   class_3  119 -> 120 
    120   120                  gini         =   0.000   class_3  119 -> 121 
    ..    ...                   ...       ...     ...       ...          ...
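The `key` and `key2` columns built above are strings (the pieces returned by `str.split`), so `sort_values` orders them lexicographically. That is why the table runs 0, 1, 10, 100, 102, ... rather than numerically, and why the index labels end up shuffled. A minimal sketch on a hypothetical six-row frame (only the `Direction` format is taken from the question) shows both effects:

```python
import pandas as pd

# Hypothetical miniature of the question's data; only Direction matters here
df = pd.DataFrame({"Direction": ["1 -> 65", "10 -> 11", "0 -> 202", "1 -> 2", "0 -> 1", "10 -> 18"]})

# "0 -> 202".split() == ["0", "->", "202"]: element 0 is the parent, element 2 the child
df["key"] = df["Direction"].apply(lambda x: x.split()[0])
df["key2"] = df["Direction"].apply(lambda x: x.split()[2])

# Both keys are strings, so the sort is lexicographic ("10" sorts before "2")
class_determiner_df = df.sort_values(["key", "key2"])
print(class_determiner_df["Direction"].tolist())
print(class_determiner_df.index.tolist())  # labels keep their pre-sort values
```

If numeric order were wanted instead, converting the keys with `.astype(int)` before sorting would do it. The tables in the question rely on the string order, so the sketch keeps it.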

Next I try to make every second row of df['Feature'] and df['Scaled'] equal to the row above it, and to set df['Indicator'] to '>' in those rows.

I took the following from the answer to Adjust every other row of a data frame:

    # Adjust every other row
    class_determiner_df.loc[1::2, 'Feature'] = None
    class_determiner_df.loc[1::2, 'Scaled'] = None
    class_determiner_df.loc[1::2, 'Indicator'] = '>'
    # fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN) 
    # it replaces the None value with the last significant value from the same column
    class_determiner_df.fillna(method='ffill', inplace=True)

This produces the following incorrect data frame:


         Node             Feature Indicator  Scaled     Class    Direction
    0       0                  km        <=   0.181   class_4      0 -> 1 
    201   201                gini         =   0.000   class_5    0 -> 202 
    1       1                gini         >   0.000   class_5      1 -> 2 
    64     64                gini         =   0.000   class_4     1 -> 65 
    10     10                gini         >   0.000   class_2    10 -> 11 
    17     17                gini         =   0.000   class_5    10 -> 18 
    100   100                gini         >   0.000   class_5  100 -> 101 
    101   101                gini         =   0.000   class_5  100 -> 102 
    102   102                gini         >   0.000   class_3  102 -> 103 
    123   123                gini         =   0.496   class_2  102 -> 124 
    103   103                gini         >   0.496   class_2  103 -> 104 
    104   104                gini         =   0.000   class_2  103 -> 105 
    105   105                gini         >   0.000   class_4  105 -> 106 
    106   106                gini         =   0.000   class_4  105 -> 107 
    107   107                gini         >   0.000   class_3  107 -> 108 
    116   116                gini         =   0.000   class_4  107 -> 117 
    108   108                gini         >   0.000   class_3  108 -> 109 
    109   109                gini         =   0.000   class_3  108 -> 110 
    11     11                gini         >   0.000   class_2    11 -> 12 
    16     16                gini         =   0.000   class_2    11 -> 17 
    110   110                gini         >   0.000   class_3  110 -> 111 
    115   115                gini         =   0.000   class_4  110 -> 116 
    111   111                gini         >   0.000   class_3  111 -> 112 
    112   112                gini         =   0.000   class_3  111 -> 113 
    113   113                gini         >   0.000   class_3  113 -> 114 
    114   114                gini         =   0.000   class_3  113 -> 115 
    117   117                gini         >   0.000   class_2  117 -> 118 
    118   118                gini         =   0.000   class_2  117 -> 119 
    119   119                gini         >   0.000   class_3  119 -> 120 
    120   120                gini         =   0.000   class_3  119 -> 121 
    ..    ...                 ...       ...     ...       ...          ...
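The forward fill described in the code comments does behave as intended when the index labels coincide with row positions. A small sketch with a hypothetical four-row frame on the default 0, 1, 2, 3 index, using the same three assignments followed by `DataFrame.ffill()` (the non-deprecated spelling of `fillna(method='ffill')`):

```python
import pandas as pd

# Hypothetical frame on the default RangeIndex, so labels 0..3 equal positions 0..3
df = pd.DataFrame({
    "Feature": ["km", "gini", "WPS", "gini"],
    "Indicator": ["<=", "=", "<=", "="],
    "Scaled": [0.181, 0.000, 0.074, 0.000],
})

df.loc[1::2, "Feature"] = None   # labels 1 and 3; None is treated as missing
df.loc[1::2, "Scaled"] = None    # the float column stores NaN
df.loc[1::2, "Indicator"] = ">"

df = df.ffill()  # copy the last non-missing value downwards, column by column
print(df)
```

Here `.loc[1::2]` picks labels 1 and 3, which are also the second and fourth rows, so each blank is filled from the row directly above it.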

The 'gini' from the second row overwrites every row after it. Is there a better way to make sure the data frame ends up looking like this:

        Node               Feature Indicator  Scaled     Class    Direction
    0       0                    km        <=   0.181   class_4      0 -> 1 
    201   201                    km         >   0.181   class_5    0 -> 202 
    1       1                   WPS        <=   0.074   class_5      1 -> 2 
    64     64                   WPS         >   0.074   class_4     1 -> 65 
    10     10              funktion        <=   0.500   class_2    10 -> 11 
    17     17              funktion         >   0.500   class_5    10 -> 18 
    100   100                   SPW        <=   0.282   class_5  100 -> 101 
    101   101                   SPW         >   0.282   class_5  100 -> 102 
    102   102              words_nb        <=   0.322   class_3  102 -> 103 
    123   123              words_nb         >   0.322   class_2  102 -> 124 
    105   105                   SPW        <=   0.290   class_4  105 -> 106 
    106   106                   SPW         >   0.290   class_4  105 -> 107 
    ...
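The target layout can also be described per parent node rather than per alternating row: within each group of rows sharing the same `key` (the parent node taken from `Direction`), every row after the first takes the first row's `Feature` and `Scaled`, and its `Indicator` becomes `'>'`. The sketch below does this with `groupby`, `cumcount` and `transform('first')`; it is a swapped-in technique, not the approach from the question, and the frame is hypothetical. It assumes, as in the tables above, that the `<=` split row comes first within each group. Because it relies on neither positions nor labels, the shuffled index from `sort_values` is left untouched.

```python
import pandas as pd

# Hypothetical sorted frame; index labels left shuffled, as after sort_values
df = pd.DataFrame(
    {
        "Feature": ["km", "gini", "WPS", "gini", "funktion", "gini"],
        "Indicator": ["<=", "=", "<=", "=", "<=", "="],
        "Scaled": [0.181, 0.000, 0.074, 0.000, 0.500, 0.000],
        "key": ["0", "0", "1", "1", "10", "10"],  # parent node from Direction
    },
    index=[0, 201, 1, 64, 10, 17],
)

groups = df.groupby("key", sort=False)

# cumcount is 0 for the first row of each group and 1, 2, ... for the rest
is_follower = groups.cumcount() > 0

# The first row's values, broadcast to every row of its group
first_vals = groups[["Feature", "Scaled"]].transform("first")

df["Feature"] = first_vals["Feature"]
df["Scaled"] = first_vals["Scaled"]
df.loc[is_follower, "Indicator"] = ">"
print(df)
```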

I'm not sure why the following doesn't work, since it seems to be exactly what I need:

    class_determiner_df.loc[1::2, 'Feature'] = None
    class_determiner_df.loc[1::2, 'Scaled'] = None
    class_determiner_df.loc[1::2, 'Indicator'] = '>'
    # fillna() method of DataFrame scans rows from top, and when it finds a python None value (equivalent to numpy.NaN) 
    # it replaces the None value with the last significant value from the same column
    class_determiner_df.fillna(method='ffill', inplace=True)
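One way to see which rows the `.loc[1::2]` assignments actually touch is to compare a label-based slice with a position-based one on a frame whose index is in the post-sort order. The sketch below rebuilds only the first eight index labels from the sorted table (reusing them as a `Node` column for readability):

```python
import pandas as pd

# First eight index labels of the sorted table, in post-sort order
labels = [0, 201, 1, 64, 10, 17, 100, 101]
df = pd.DataFrame({"Node": labels}, index=labels)

label_rows = df.loc[1::2].index.tolist()      # starts at the row *labelled* 1
position_rows = df.iloc[1::2].index.tolist()  # starts at position 1
print(label_rows)
print(position_rows)
```

The label slice starts at the row labelled 1, which is the third row, so it selects 1, 10 and 100. These are exactly the rows that show `>` in the incorrect output, whereas `.iloc[1::2]` selects the intended second row of each pair (201, 64, 17, 101).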

1 answer:

Answer 0 (score: 1)

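This happens because loc selects by index label, not by position. After sort_values the labels are in the order 0, 201, 1, 64, 10, ..., so .loc[1::2] starts at the row labelled 1 (the third row) and takes every second row from there. That hits the first row of each pair instead of the second, and the forward fill then copies 'gini' down from the row above. You can fix this by resetting the index with DataFrame.reset_index, so that labels match positions again: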

    # Replace the shuffled labels with 0, 1, 2, ... so that labels equal positions
    class_determiner_df.reset_index(inplace=True, drop=True)

    # Adjust every other row
    class_determiner_df.loc[1::2, 'Feature'] = None
    class_determiner_df.loc[1::2, 'Scaled'] = None
    class_determiner_df.loc[1::2, 'Indicator'] = '>'
    # ffill() scans rows from the top and replaces each missing value (None is treated like numpy.nan)
    # with the last non-missing value above it in the same column.
    # Older code spells this fillna(method='ffill'), which newer pandas versions deprecate.
    class_determiner_df.ffill(inplace=True)
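`reset_index(drop=True)` discards the original labels (0, 201, 1, 64, ...), although the `Node` column still carries the same numbers. If those labels should survive, as in the desired output in the question, one variant is to keep the index and switch the assignments to position-based `iloc`, looking up each column's position with `columns.get_loc`. A minimal sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame with the shuffled labels that sort_values leaves behind
df = pd.DataFrame(
    {
        "Feature": ["km", "gini", "WPS", "gini"],
        "Indicator": ["<=", "=", "<=", "="],
        "Scaled": [0.181, 0.000, 0.074, 0.000],
    },
    index=[0, 201, 1, 64],
)

# iloc slices by position, so 1::2 means the 2nd, 4th, ... rows whatever their labels
df.iloc[1::2, df.columns.get_loc("Feature")] = None
df.iloc[1::2, df.columns.get_loc("Scaled")] = None
df.iloc[1::2, df.columns.get_loc("Indicator")] = ">"

# Fill each blank from the row directly above it
df = df.ffill()
print(df)
```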