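Part of my DataFrame df is shown below. This is how it looks after running df = df.drop_duplicates('months_to_maturity'). Now, however, for rows that share the same months_to_maturity, I want to keep number_of_rows_for_each_maturity rows for that particular maturity.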
   months_to_maturity            orig_iss_dt  \
1                   6  2015-06-25 00:00:00.0
2                  12  2015-06-25 00:00:00.0
3                  18  2015-06-30 00:00:00.0
4                  24  2015-06-15 00:00:00.0
5                  30  2015-06-30 00:00:00.0

             maturity_dt  pay_freq_cd  coupon  closing_price  FACE_VALUE
1  2015-12-24 00:00:00.0          NaN   0.000      99.960889         100
2  2016-06-23 00:00:00.0          NaN   0.000      99.741444         100
3  2017-06-30 00:00:00.0            2   0.625      99.968750         100
4  2018-06-15 00:00:00.0            2   1.125     100.390625         100
5  2020-06-30 00:00:00.0            2   1.625      99.984375         100
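I do this with the code below, where pairwise(df.iterrows()) yields the current and the next row of the DataFrame. My problem is that I am reading the DataFrame from an Excel document with 600,000 rows, so I would like to know whether there is a better way to do this.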
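    from itertools import pairwise  # Python 3.10+; on older versions use the pairwise recipe from the itertools docs

    # number_of_columns_and_rows is defined elsewhere (not shown here)
    number_of_rows_for_each_maturity = number_of_columns_and_rows/60
    count = 0
    rows_to_drop = []  # index labels of the surplus rows
    for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
        if row1['months_to_maturity'] == row2['months_to_maturity'] and count < number_of_rows_for_each_maturity + 1:
            count = count + 1
        if row1['months_to_maturity'] == row2['months_to_maturity'] and count == number_of_rows_for_each_maturity + 1:
            rows_to_drop.append(i1)
        if row1['months_to_maturity'] != row2['months_to_maturity']:
            count = 0
    # df.drop returns a new DataFrame, so the result has to be assigned back
    df = df.drop(rows_to_drop)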
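
For comparison, the goal described above (keep at most a fixed number of rows per months_to_maturity value) could possibly be expressed without a Python-level loop by using pandas' groupby(...).head(n). The following is only a minimal sketch under assumptions: the helper name keep_first_n_per_maturity and the toy data are made up for illustration, n is assumed to be an integer, and "the first n rows in the existing order" is assumed to be the selection that is wanted. Unlike the loop, it groups all rows with the same value even if they are not adjacent.

```python
import pandas as pd


def keep_first_n_per_maturity(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep at most the first n rows for each months_to_maturity value.

    Hypothetical helper for illustration; not part of pandas.
    """
    # groupby(...).head(n) returns the first n rows of every group and
    # keeps the original row order and index labels.
    return df.groupby('months_to_maturity', sort=False).head(n)


# Toy data, invented for this sketch (not the real dataset).
toy = pd.DataFrame({
    'months_to_maturity': [6, 6, 6, 12, 12, 18],
    'closing_price': [99.9, 99.8, 99.7, 99.6, 99.5, 99.4],
})

trimmed = keep_first_n_per_maturity(toy, 2)
print(trimmed)
```

Because number_of_columns_and_rows/60 produces a float in Python 3, it would presumably need converting first, for example n = int(number_of_rows_for_each_maturity), before being passed to such a helper.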
Thanks.