Pandas: split a DataFrame based on index values in a list of tuples

Time: 2020-02-24 17:59:06

Tags: python pandas dataframe for-loop tuples

Suppose I have a list of tuples holding index values:

mapper= [(0,6),(9,13),(17,27)]

I have a large master_df that I want to split into multiple DataFrames based on the tuple index values in the list above.

mapper[0][0] is the start point and mapper[0][1] is the end point. I also have a list of DataFrame names:

df_list = ['df_1','df_2','df_3']

I tried the following snippet to populate multiple DataFrames based on the index values in mapper:

for x in range(len(df_list)):
    df_list[x] = master_df[mapper[x][0]:mapper[x][1]]

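But it doesn't do what I had in mind. The ideal solution for me would split master_df into three separate DataFrames based on the tuple index values in the list.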
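For reference, here is a minimal sketch of what the loop above actually does. The small master_df is stand-in data (an assumption for illustration), and the mapper values are taken from the example further down.

```python
import pandas as pd

# Stand-in data (hypothetical); any DataFrame with at least 11 rows behaves the same way
master_df = pd.DataFrame({"Name": list("ABCDEFGHIJK")})
mapper = [(0, 3), (3, 6), (6, 11)]
df_list = ["df_1", "df_2", "df_3"]

for x in range(len(df_list)):
    # Overwrites the string at position x with a row slice of master_df
    df_list[x] = master_df[mapper[x][0]:mapper[x][1]]

# The list now holds DataFrames; the original name strings are gone
print([type(item).__name__ for item in df_list])
print([len(item) for item in df_list])
```

The slices are created, but they are only reachable as df_list[0], df_list[1] and df_list[2]. No variables called df_1, df_2 or df_3 come into existence.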

Here is an example of what I am trying to accomplish:

master_df:
     Name       Role        Location
0    Gina       Assistance  NY
1    Jake       Officer     Brooklyn
2    Boyle      Detective   99
3    Scully     Assistance  NY
4    Diaz       Officer     Brooklyn
5    Hitchcock  Detective   99
6    Amy        Assistance  NY
7    Terry      Officer     Brooklyn
8    Holt       Detective   99
9    Judy       Assistance  NY
10   Adrian     Officer     Brooklyn

mapper = [(0,3),(3,6),(6,11)]
df_list = ['df_1','df_2','df_3']

Desired result:

df_1:
     Name       Role        Location
0    Gina       Assistance  NY
1    Jake       Officer     Brooklyn
2    Boyle      Detective   99

df_2:
     Name       Role        Location
3    Scully     Assistance  NY
4    Diaz       Officer     Brooklyn
5    Hitchcock  Detective   99

df_3:
     Name       Role        Location
6    Amy        Assistance  NY
7    Terry      Officer     Brooklyn
8    Holt       Detective   99
9    Judy       Assistance  NY
10   Adrian     Officer     Brooklyn

Any help/guidance is appreciated!

1 answer:

Answer 0 (score: 1)

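You can unpack each tuple with * and pass it to the range function, then use iloc[] to select those row positions (df below stands for the question's master_df):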

df_list=[df.iloc[range(*i),:] for i in mapper]

[     Name        Role  Location
 0   Gina  Assistance        NY
 1   Jake     Officer  Brooklyn
 2  Boyle   Detective        99,
         Name        Role  Location
 3     Scully  Assistance        NY
 4       Diaz     Officer  Brooklyn
 5  Hitchcock   Detective        99,
      Name        Role  Location
 6      Amy  Assistance        NY
 7    Terry     Officer  Brooklyn
 8     Holt   Detective        99
 9     Judy  Assistance        NY
 10  Adrian     Officer  Brooklyn]
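
The approach above can be reproduced end to end with the sample data from the question. The sketch below rebuilds master_df by hand and also tries a plain slice, iloc[start:stop], as an alternative to range(*i). The names parts_range and parts_slice are made up for this illustration. Both forms treat the end value as exclusive, so with these tuples they should select the same rows.

```python
import pandas as pd

# Sample data copied from the question
master_df = pd.DataFrame({
    "Name": ["Gina", "Jake", "Boyle", "Scully", "Diaz", "Hitchcock",
             "Amy", "Terry", "Holt", "Judy", "Adrian"],
    "Role": ["Assistance", "Officer", "Detective"] * 3 + ["Assistance", "Officer"],
    "Location": ["NY", "Brooklyn", "99"] * 3 + ["NY", "Brooklyn"],
})
mapper = [(0, 3), (3, 6), (6, 11)]

# Form used in the answer: unpack each tuple into range() and index by position
parts_range = [master_df.iloc[range(*i), :] for i in mapper]

# Alternative (an assumption, not part of the answer): positional slice per tuple
parts_slice = [master_df.iloc[start:stop] for start, stop in mapper]

for part in parts_slice:
    print(part, end="\n\n")
```

The slice form avoids materializing the positions one by one, which may matter for a large master_df, but either form gives a list of three DataFrames that keep their original index labels.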

If you want to assign names to them, you have to put them in a dictionary (see "How to create a variable number of variables"):

df_dict=dict(zip(df_list,[df.iloc[range(*i),:] for i in mapper]))

{'df_1':     Name        Role  Location
 0   Gina  Assistance        NY
 1   Jake     Officer  Brooklyn
 2  Boyle   Detective        99,
 'df_2':         Name        Role  Location
 3     Scully  Assistance        NY
 4       Diaz     Officer  Brooklyn
 5  Hitchcock   Detective        99,
 'df_3':       Name        Role  Location
 6      Amy  Assistance        NY
 7    Terry     Officer  Brooklyn
 8     Holt   Detective        99
 9     Judy  Assistance        NY
 10  Adrian     Officer  Brooklyn}
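
Once the pieces live in a dictionary, each one is looked up by its name rather than through a separate variable. Here is a minimal sketch, assuming a small stand-in master_df (hypothetical data), a list called df_names (a name chosen for this illustration), and the slice form iloc[start:stop]:

```python
import pandas as pd

# Stand-in data (hypothetical); only the row positions matter here
master_df = pd.DataFrame({"Name": ["Gina", "Jake", "Boyle", "Scully", "Diaz",
                                   "Hitchcock", "Amy", "Terry", "Holt", "Judy", "Adrian"]})
mapper = [(0, 3), (3, 6), (6, 11)]
df_names = ["df_1", "df_2", "df_3"]

# Pair each name with its (start, stop) tuple and slice by position
df_dict = {name: master_df.iloc[start:stop]
           for name, (start, stop) in zip(df_names, mapper)}

# Look up one piece by name, or iterate over all of them
print(df_dict["df_2"])
for name, part in df_dict.items():
    print(name, part.index.tolist())
```

Keeping the names as dictionary keys means the number of pieces can change with mapper without any code having to create variables dynamically.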