pandas DataFrame: find the nth non-null row

Asked: 2016-03-04 15:25:02

Tags: python numpy pandas

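I want to know how many points I need to take from a pandas DataFrame, whose index is a range of dates, so that I end up with X points after calling dropna(). I want the most recent points. For example: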

    import pandas as pd

    window = 504
    s1 = pd.DataFrame(stuff)
    len(s1.index)  # --> 600
    dropped_series = s1.dropna()
    len(dropped_series.index)  # --> 480
    diff_points_count = len(s1.index) - len(dropped_series.index)
    final_series = s1.tail(window + diff_points_count).dropna()

--> len(final_series.index) is not necessarily equal to window; it depends on where the NaNs fall.

I need this to work whether s1 is a pandas.Series or a pandas.DataFrame.
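
To make the problem concrete, here is a minimal sketch with made-up data (the values and the window of 3 are assumptions chosen only for illustration). Padding the tail by the total NaN count guarantees at least `window` valid points, but when the NaNs sit early in the series the result overshoots:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 daily points, with the 2 NaNs at the start.
idx = pd.date_range("2016-01-01", periods=8, freq="D")
s1 = pd.Series([np.nan, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx)

window = 3
diff_points_count = len(s1.index) - len(s1.dropna().index)  # total NaN count
final_series = s1.tail(window + diff_points_count).dropna()

# The padded tail contains no NaN at all here, so nothing is dropped
# and the result is longer than the requested window.
print(len(final_series.index), "points, wanted", window)
```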

1 Answer:

Answer 0 (score: 0)

Here is my solution, but I'm sure there is a more elegant way:

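    # Excerpt from a function body: args, series_indices, harmonized_series_set,
    # window, output_length and harmonized_args come from the enclosing code (not shown).
    all_series_df = pd.concat([harmonized_series_set[i] for i in series_indices], axis=1)
    # Flag each row: 1 if no column is NaN, 0 otherwise.
    all_series_df['is_valid'] = all_series_df.apply(lambda x: 0 if np.any(np.isnan(x)) else 1, raw=True, axis=1)
    valid_point_count = all_series_df['is_valid'].sum()
    # For a valid row, this is the number of valid rows from that row through the end.
    all_series_df['count_valid'] = valid_point_count - all_series_df['is_valid'].cumsum() + 1
    # The first match is the earliest row that must be kept.
    matching_row_array = all_series_df.loc[all_series_df['count_valid'] == (window + output_length - 1)]
    matching_row_index = 0  # no match: keep every row
    if isinstance(matching_row_array, pd.DataFrame) and len(matching_row_array.index) > 0:
        matching_row_index = all_series_df.index.get_loc(matching_row_array.index[0])
    tail_amount = len(all_series_df.index) - matching_row_index
    # Trim each series argument to that tail; pass other arguments through unchanged.
    for i, arg in enumerate(args):
        if i in series_indices:
            tailed_series = harmonized_series_set[i].tail(tail_amount)
            harmonized_args.append(tailed_series)
        else:
            harmonized_args.append(arg)
    return tuple(harmonized_args)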
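
The same idea can probably be written more compactly by locating the nth-from-last complete row by position. The following is only a sketch, not the code above: the helper name `tail_with_n_valid` and its parameter `n` are hypothetical, and `n` stands in for whatever target count is needed (`window + output_length - 1` in the answer's code). It assumes a pandas version that provides `notna()`.

```python
import numpy as np
import pandas as pd

def tail_with_n_valid(obj, n):
    """Hypothetical helper: shortest tail of obj holding n NaN-free rows.

    Works for a Series or a DataFrame. If fewer than n complete rows
    exist, the whole object is returned unchanged.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if isinstance(obj, pd.DataFrame):
        # A DataFrame row counts as valid only if every column is non-NaN.
        valid = obj.notna().all(axis=1).to_numpy()
    else:
        valid = obj.notna().to_numpy()
    positions = np.flatnonzero(valid)  # integer positions of valid rows
    if len(positions) < n:
        return obj
    # Slice from the nth valid row counted from the end.
    return obj.iloc[positions[-n]:]

# Usage with made-up data:
s = pd.Series([np.nan, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0])
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0, 5.0],
                   "b": [1.0, 2.0, np.nan, 4.0, 5.0]})
print(len(tail_with_n_valid(s, 4).dropna()))   # number of valid points kept
print(len(tail_with_n_valid(df, 2).dropna()))
```

Calling `dropna()` on the result then yields exactly `n` rows whenever enough complete rows exist, regardless of where the NaNs fall.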