Question

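I am currently using Pandas and Python to handle most of the repetitive tasks I need done for my master's thesis. At this point I have written some code (with help from Stack Overflow) that, based on some event dates in one file, finds a start date and an end date to use as a date range in another file. The rows in that range are then located and appended to an empty list, which I can then output to Excel. However, with the code below I get a dataframe with 5 columns and 400,000+ rows (which is basically what I want), but not laid out the way I want it in Excel. Below is my code: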

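import pandas as pd

# df_sample holds the event dates; df holds the data to be sliced.
# Window end: 2 calendar days before each event date
end_date = pd.DataFrame(data=(df_sample['Date']-pd.DateOffset(days=2)))
# Window start: 252 business days before each event date
start_date = pd.DataFrame(data=(df_sample['Date']-pd.offsets.BDay(n=252)))

# Both frames have a 'Date' column, so the merge names them Date_x (end) and Date_y (start)
merged_dates = pd.merge(end_date,start_date,left_index=True,right_index=True)

ff_factors = []

# Collect the rows of df that fall inside each event window
for index, row in merged_dates.iterrows():
    time_range = (df['Date'] > row['Date_y']) & (df['Date'] <= row['Date_x'])
    df_factor = df.loc[time_range]
    ff_factors.append(df_factor)

# Stack all windows on top of each other
appended_data = pd.concat(ff_factors, axis=0)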

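I need the data side by side in blocks of 5 columns and 250 rows (the columns are variable identifiers), so that when outputting to Excel I have, for example, columns A-D with 250 rows in each column. This then needs to be repeated for columns E-H, and so on. Using iloc, I can pick out the first 250 observations with appended_data.iloc[0:250], which gives 5 columns and 250 rows, and then output them to Excel.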

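Is there any way to automate this process, so that after selecting the first 250 rows and outputting them to Excel, it selects the next 250 and outputs them next to the first 250, and so on?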

I hope the above is precise and clear; otherwise I am happy to elaborate!

EDIT:

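The picture above illustrates what I get when outputting to Excel: 5 columns and 407,764 rows. What I need is to have this split up in the following way: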

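The second picture illustrates how I need the total sample to be split up. The first five columns and their corresponding 250 rows need to look like the second picture. When I do the next split using iloc[250:500], I get the next 250 rows, which need to be added after the initial five columns, and so on.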

Answer 1

My best guess at solving this is to loop over the data in steps of 250 rows, continuing until the counter is greater than the length of the data.
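A counter-based loop of this kind could look like the minimal sketch below. It is only an illustration, not the answerer's original code: the helper name split_side_by_side, the chunk_size parameter, and the small demo frame are all made up for this example. Each slice has its index reset so that the blocks line up row by row when they are joined.

```python
import pandas as pd

def split_side_by_side(frame, chunk_size=250):
    """Place consecutive chunk_size-row slices of frame next to each other.

    Hypothetical helper for illustration; assumes frame is not empty.
    """
    chunks = []
    start = 0
    # Loop until the counter passes the end of the frame
    while start < len(frame):
        chunk = frame.iloc[start:start + chunk_size].reset_index(drop=True)
        chunks.append(chunk)
        start += chunk_size
    # axis=1 puts the chunks side by side; a short final chunk is padded with NaN
    return pd.concat(chunks, axis=1)

# Small made-up frame: 7 rows split into chunks of 3
demo = pd.DataFrame({"A": range(7), "B": range(10, 17)})
wide = split_side_by_side(demo, chunk_size=3)
print(wide)
```

With the data from the question, the call would be split_side_by_side(appended_data, 250), and the result can be written to Excel in the same way as the original frame.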

Answer 2

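You can do this with a combination of np.reshape, which can be made to behave as desired on the individual columns and should be much faster than a loop through the rows, and pd.concat, which joins the dataframes it produces back together: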

import pandas as pd

def reshape_appended(df, target_rows, pad=4):
    df = df.copy()  # don't modify in-place
    # Prefix each column name with its zero-padded position ('0000', '0001', ...)
    # so that sorting the columns later preserves their original order.
    df.columns = [str(i).zfill(pad)+str(col) for i, col in enumerate(df.columns)]
    # target number of new columns per column in df
    target_cols = len(df.index)//target_rows
    # rows left over when the length is not a multiple of target_rows (% is mod)
    remainder = len(df.index)%target_rows
    last_group = pd.DataFrame()
    if remainder != 0:
        last_group = df.iloc[-remainder:].reset_index(drop=True)
        df = df.iloc[:-remainder]  # keep rows that divide nicely
    # this is a large list comprehension, that I'll elaborate on below
    groups = [pd.DataFrame(df[col].values.reshape((target_rows, target_cols),
                                                  order='F'),
                           columns=[str(i).zfill(pad)+col for i in range(target_cols)])
              for col in df.columns]
    if not last_group.empty:  # if there are leftover rows, add them back
        # a prefix of all 9s sorts the leftover block after every full block
        last_group.columns = [pad*'9'+col for col in last_group.columns]
        groups.append(last_group)
    out = pd.concat(groups, axis=1).sort_index(axis=1)
    out.columns = out.columns.str[2*pad:]  # remove the extra characters in the column names
    return out

last_group takes care of any rows that don't divide evenly into sets of 250. The manipulation of the column names enforces the proper sort order.
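For the 407,764 rows mentioned in the question and blocks of 250 rows, the split between full blocks and leftover rows can be checked with a quick calculation. This is only an arithmetic sketch; the variable names are illustrative.

```python
total_rows = 407_764   # row count reported in the question
target_rows = 250      # rows wanted per block

# Number of full 250-row blocks, and the rows left over for last_group
full_blocks, leftover = divmod(total_rows, target_rows)
print(full_blocks, leftover)
```

This gives 1,631 full blocks, with 14 rows left over that end up in last_group.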

df[col].values.reshape((target_rows, target_cols), order='F')

reshapes the values in the col column of df into the shape specified by the tuple (target_rows, target_cols), using the ordering Fortran uses, indicated by order='F'.
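The effect of the Fortran ordering is easiest to see on a tiny array. The sketch below uses six made-up values as a stand-in for one column; with order='F' each output column is a consecutive chunk of the input, which is the layout wanted here.

```python
import numpy as np

values = np.arange(6)  # stand-in for one column: 0, 1, 2, 3, 4, 5

# Fortran order fills column by column, so each column is a consecutive chunk
by_column = values.reshape((3, 2), order='F')
# [[0 3]
#  [1 4]
#  [2 5]]

# C order (NumPy's default) fills row by row instead
by_row = values.reshape((3, 2))
# [[0 1]
#  [2 3]
#  [4 5]]

print(by_column)
print(by_row)
```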

The expression

columns=[str(i).zfill(pad)+col for i in range(target_cols)]

simply gives names to these columns, with an eye to establishing the proper sort order afterwards.
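To see what these names look like and why the zero padding matters, here is a small sketch. The value of col is a hypothetical example: it stands for an original column 'B' at position 1, after the first prefix has already been added inside the function.

```python
pad = 4
col = "0001B"      # hypothetical: column 'B' at position 1, already prefixed once
target_cols = 3

names = [str(i).zfill(pad) + col for i in range(target_cols)]
# ['00000001B', '00010001B', '00020001B']

# Zero padding keeps numeric order under string sorting; unpadded numbers do not
padded = sorted(str(i).zfill(pad) for i in (2, 10, 1))    # ['0001', '0002', '0010']
unpadded = sorted(str(i) for i in (2, 10, 1))             # ['1', '10', '2']

# Dropping the first 2*pad characters recovers the original column name
restored = [name[2 * pad:] for name in names]             # ['B', 'B', 'B']
print(names, padded, unpadded, restored)
```

Because the block number comes first in each name, sorting the columns groups all columns of block 0 together, then block 1, and so on, with the original column order preserved inside each block.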
