Question

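I am new to Python, and I am looking for clean code that splits a DataFrame into separate DataFrames based on a set of row indices.

The DataFrame Module1 has more than a million rows. It needs to be split at the index numbers below, starting from 0.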

Int64Index([55893, 122056, 180227, 234314], dtype='int64')

That is, the first split DataFrame should cover rows 0 to 55892, the next one rows 55893 to 122055, and so on.

Here is my code. The problem is the last DataFrame, from 234314 to the end: I am not sure how to implement it inside the loop.

  start=0
  Module=[]
  for ele in indexing:
      Module.append(Module1[start:ele])
      start=ele
  Module.append(Module1[start:])
  print(Module)

However, I would like a more concise solution for this code.

Answer 1

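You can use iloc in a loop, since iloc slices the DataFrame by position into sub-DataFrames of the desired length. The expected behaviour inside the loop would look something like: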

step = 55893

df_1 = Module1.iloc[:step, :]
df_2 = Module1.iloc[step:(step*2), :]
df_3 = Module1.iloc[(step*2):(step*3), :]
...
df_n = Module1.iloc[(step*(n-1)):(step*n), :]
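The pattern above assumes equal-sized chunks, whereas the cut points in the question are unevenly spaced. A minimal sketch of the same iloc idea for arbitrary cut points follows; the helper name split_at and the small demo frame are hypothetical stand-ins, not part of the original code. Adding 0 and len(df) to the list of boundaries lets the final piece fall out of the same loop.

```python
import pandas as pd

def split_at(df, cut_points):
    """Split df by position at each cut point; returns len(cut_points) + 1 pieces."""
    # Add the start and end so the last piece needs no special case.
    bounds = [0, *cut_points, len(df)]
    # Pair each boundary with the next one: (0, c1), (c1, c2), ..., (cn, len(df)).
    return [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]

# Small stand-in for Module1 (hypothetical data).
demo = pd.DataFrame({"value": range(10)})
parts = split_at(demo, [3, 7])
print([len(p) for p in parts])  # [3, 4, 3]
```

With the data from the question, the call would presumably be split_at(Module1, indexing).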

P.S.: See numpy's split for an alternative.
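As a rough sketch of that alternative, numpy.split accepts a list of cut positions and returns one piece per interval, including the tail after the last cut. The version below splits an array of row positions rather than the DataFrame itself and then selects the rows with iloc; that is a cautious choice on my part, since passing a DataFrame directly to numpy.split may behave differently across pandas versions. The helper name split_with_numpy and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd

def split_with_numpy(df, cut_points):
    """Split df by position using numpy.split on the row positions."""
    positions = np.arange(len(df))
    # np.split cuts before each listed position and keeps the remainder as the last chunk.
    chunks = np.split(positions, list(cut_points))
    return [df.iloc[chunk] for chunk in chunks]

# Small stand-in for Module1 (hypothetical data).
frame = pd.DataFrame({"value": range(10)})
pieces = split_with_numpy(frame, [3, 7])
print([len(p) for p in pieces])  # [3, 4, 3]
```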
