Only in DataFrame

Time: 2015-07-06 18:59:35

Tags: python pandas

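Part of my DataFrame df is shown below. This is what it looks like after I call df = df.drop_duplicates('months_to_maturity'). Now, however, instead of a single row per maturity, I want to keep number_of_rows_for_each_maturity rows for each distinct value of months_to_maturity.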

   months_to_maturity                      orig_iss_dt  \
1                   6                    2015-06-25 00:00:00.0   
2                  12                    2015-06-25 00:00:00.0   
3                  18                    2015-06-30 00:00:00.0   
4                  24                    2015-06-15 00:00:00.0   
5                  30                    2015-06-30 00:00:00.0   

             maturity_dt  pay_freq_cd  coupon  closing_price  FACE_VALUE  
1  2015-12-24 00:00:00.0          NaN   0.000      99.960889         100  
2  2016-06-23 00:00:00.0          NaN   0.000      99.741444         100  
3  2017-06-30 00:00:00.0            2   0.625      99.968750         100  
4  2018-06-15 00:00:00.0            2   1.125     100.390625         100  
5  2020-06-30 00:00:00.0            2   1.625      99.984375         100  

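I do this with the code below, where pairwise(df.iterrows()) yields the current and the next row of the DataFrame. My problem is that I am reading the DataFrame from an Excel document with 600,000 rows, so I would like to know whether there is a better way to do this.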

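from itertools import pairwise  # Python 3.10+; older versions need the recipe from the itertools docs

# Integer division so that the equality test on count below is exact.
number_of_rows_for_each_maturity = number_of_columns_and_rows // 60
count = 0
rows_to_drop = []  # collect index labels here and drop them once after the loop
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    if row1['months_to_maturity'] == row2['months_to_maturity'] and count < number_of_rows_for_each_maturity + 1:
        count = count + 1
    if row1['months_to_maturity'] == row2['months_to_maturity'] and count == number_of_rows_for_each_maturity + 1:
        rows_to_drop.append(i1)  # i1 is already an index label
    if row1['months_to_maturity'] != row2['months_to_maturity']:
        count = 0  # a new maturity starts at row2
# df.drop returns a new DataFrame, so the result must be assigned back.
df = df.drop(rows_to_drop)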
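
One possible vectorized alternative, sketched here under assumptions: the goal is taken to be "keep at most N rows per months_to_maturity, in their existing order", and it would be applied to the DataFrame as read from Excel, before drop_duplicates. The sample data and the value 2 are hypothetical and only for illustration. Unlike the loop above, groupby collects all rows with the same maturity, even if they are not adjacent.

```python
import pandas as pd

# Hypothetical sample data; only the column name months_to_maturity comes from the question.
df = pd.DataFrame({
    "months_to_maturity": [6, 6, 6, 12, 12, 18, 18, 18, 18],
    "coupon": [0.0, 0.1, 0.2, 0.0, 0.3, 0.625, 0.7, 0.8, 0.9],
})

number_of_rows_for_each_maturity = 2  # assumed value for illustration

# head(n) on a groupby keeps the first n rows of every group,
# preserving the original row order and index labels.
kept = df.groupby("months_to_maturity").head(number_of_rows_for_each_maturity)

# Equivalent boolean-mask form: cumcount() numbers the rows
# within each group starting from 0.
mask = df.groupby("months_to_maturity").cumcount() < number_of_rows_for_each_maturity
kept_via_mask = df[mask]

print(kept)
```

Both forms avoid iterating over rows in Python, which tends to matter at several hundred thousand rows. The mask form may be handy if the kept and dropped rows are both needed, since df[~mask] gives the remainder.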

Thanks

0 answers:

No answers