Pandas: split a df based on a repeating flag

Time: 2020-04-26 12:56:04

Tags: python python-3.x pandas dataframe

I have a df that looks like this:

      HEADER1   HEADER2   HEADER3
0     Group1    Value2    Value3
1     Group2    Value4    Value5
4     Group1    Value6    Value7
5     Group2    Value8    Value9
6     TAIL1     TAIL2     TAIL3

The header and the tail row will always be the same, and I need to keep both of them in every split df.

So if we treat each Group1 + Group2 pair as one record (one set of useful information), this df holds 2 sets of data.

What is the best way to split it so that we end up with 2 dfs that look like this:

      HEADER1   HEADER2   HEADER3
0     Group1    Value2    Value3
1     Group2    Value4    Value5
6     TAIL1     TAIL2     TAIL3

      HEADER1   HEADER2   HEADER3
4     Group1    Value6    Value7
5     Group2    Value8    Value9
6     TAIL1     TAIL2     TAIL3

There could be any number of splits, so ideally I'd like this to be efficient... Any info is appreciated.

Edit:

What if I want to extend the answer and convert the dfs into something like this:

{
    "headers":
        {"SomeHeaderName": "Header1", "SomeOtherHeaderName": "Header2"},
    "groups": [
            {"Group1": {"Value2": "GroupValue2", "Value3": "GroupValue3"}},
            {"Group2": {"Value4": "GroupValue4", "Value5": "GroupValue5"}}
        ],
    "trailer":
        {"SomeTailName": "Tail1", "SomeOtherTailName": "Tail2"}
}

That is, pulling the keys from an already existing structure and then just zipping in the df entries as the values?
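A minimal sketch of that conversion for one already-split frame (group rows followed by the tail row). The helper to_record and the key names (SomeHeaderName, FirstValue, SecondValue, etc.) are hypothetical stand-ins for the "already existing structure". Each group is emitted as a single-key object inside the list, because a JSON list cannot hold bare key/value pairs.

```python
import json

import pandas as pd


def to_record(frame, header_keys, value_keys, tail_keys):
    """Convert one split frame (group rows + tail row) into a nested dict.

    header_keys, value_keys and tail_keys are hypothetical key names;
    zip() stops at the shorter sequence, so surplus columns are ignored.
    """
    body = frame.iloc[:-1]  # the Group1/Group2 rows
    tail = frame.iloc[-1]   # the trailer row as a Series
    return {
        "headers": dict(zip(header_keys, frame.columns)),
        "groups": [
            # first column is the group name, remaining columns are its values
            {row.iloc[0]: dict(zip(value_keys, row.iloc[1:]))}
            for _, row in body.iterrows()
        ],
        "trailer": dict(zip(tail_keys, tail)),
    }


frame = pd.DataFrame(
    [["Group1", "Value2", "Value3"],
     ["Group2", "Value4", "Value5"],
     ["TAIL1", "TAIL2", "TAIL3"]],
    columns=["HEADER1", "HEADER2", "HEADER3"],
)
record = to_record(
    frame,
    header_keys=["SomeHeaderName", "SomeOtherHeaderName"],
    value_keys=["FirstValue", "SecondValue"],
    tail_keys=["SomeTailName", "SomeOtherTailName"],
)
print(json.dumps(record, indent=4))
```

Applied to each frame in the list produced by a split, this gives one dict per record; iterrows is fine for a handful of rows per record but is not vectorized.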

2 Answers:

Answer 0 (score: 2)

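Use groupby in a list comprehension and add the last (tail) row to every group. The original answer used DataFrame.append, which was removed in pandas 2.0; pd.concat does the same job:

import pandas as pd

#get last row
last = df.iloc[[-1]]
print (last)
  HEADER1 HEADER2 HEADER3
6   TAIL1   TAIL2   TAIL3

#get all rows without last
df1 = df.iloc[:-1]
#specify first value of group in first column
s = df1.iloc[:, 0].eq('Group1').cumsum()

#add the tail row to every group (pd.concat replaces the removed DataFrame.append)
a = [pd.concat([x, last], ignore_index=True) for i, x in df1.groupby(s)]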
print (a)
[  HEADER1 HEADER2 HEADER3
0  Group1  Value2  Value3
1  Group2  Value4  Value5
2   TAIL1   TAIL2   TAIL3,   HEADER1 HEADER2 HEADER3
0  Group1  Value6  Value7
1  Group2  Value8  Value9
2   TAIL1   TAIL2   TAIL3]
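The same idea can be wrapped in a reusable function. This is a sketch, not part of the original answer: split_with_tail is a hypothetical name, and when no start label is given it assumes every record starts with the same flag as the first row. Dropping ignore_index keeps the original index labels, which matches the expected output in the question (0, 1, 6 and 4, 5, 6).

```python
import pandas as pd


def split_with_tail(df, start_label=None):
    """Split df into records and append the last (tail) row to each one.

    A new record starts wherever the first column equals start_label.
    If start_label is None, the first row's value is used (assumption:
    every record begins with the same flag).
    """
    body, tail = df.iloc[:-1], df.iloc[[-1]]
    if start_label is None:
        start_label = body.iloc[0, 0]
    # running count of start flags labels the rows 1, 1, 2, 2, ...
    record_id = body.iloc[:, 0].eq(start_label).cumsum()
    # no ignore_index, so the original index labels are preserved
    return [pd.concat([part, tail])
            for _, part in body.groupby(record_id, sort=False)]


df = pd.DataFrame(
    {"HEADER1": ["Group1", "Group2", "Group1", "Group2", "TAIL1"],
     "HEADER2": ["Value2", "Value4", "Value6", "Value8", "TAIL2"],
     "HEADER3": ["Value3", "Value5", "Value7", "Value9", "TAIL3"]},
    index=[0, 1, 4, 5, 6],
)
for part in split_with_tail(df):
    print(part, end="\n\n")
```

The record labelling (eq + cumsum) is vectorized; only the final per-record concat loops in Python, so the cost grows with the number of records rather than the number of rows.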

Answer 1 (score: 0)

You could do it like this:

import pandas as pd

# create a list to store the dataframes
frames = []

last_group = ''
this_group = ''
# go over all rows by position
for row_id in range(len(df)):
    this_group = df.iloc[row_id, 0]
    # leave the first row out of the comparison
    if last_group != '':
        # if the last group was 1 and this is 2 then write both rows to the new dataset
        # (note: the tail row is not added to these frames)
        if this_group == 'Group2' and last_group == 'Group1':
            # create a new empty dataset
            df_new_data = {
                'HEADER1': [],
                'HEADER2': [],
                'HEADER3': [],
            }
            # add the matching rows (previous row, then current row)
            for pos in (row_id - 1, row_id):
                df_new_data['HEADER1'].append(df.iloc[pos, 0])
                df_new_data['HEADER2'].append(df.iloc[pos, 1])
                df_new_data['HEADER3'].append(df.iloc[pos, 2])
            # create new DataFrame from dataset
            frames.append(pd.DataFrame(df_new_data))
    # remember this value as 'last value'
    last_group = this_group

# show all new dataframes' shape
for frame in frames:
    print(frame.shape)
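The loop above finds each Group1 row that is immediately followed by a Group2 row, but it leaves out the tail row the question asks for. A vectorized sketch of the same pairing rule that also appends the tail: pairs_with_tail is a hypothetical helper, and comparing each row with the previous one via shift() replaces the last_group bookkeeping.

```python
import numpy as np
import pandas as pd


def pairs_with_tail(df, first="Group1", second="Group2"):
    """Return one frame per (first, second) pair of consecutive rows,
    each followed by the last (tail) row of df."""
    col = df.iloc[:, 0]
    # True where this row is `second` and the previous row is `first`
    is_pair_end = col.eq(second) & col.shift().eq(first)
    tail = df.iloc[[-1]]
    positions = np.flatnonzero(is_pair_end.to_numpy())
    # p - 1 is the `first` row, p is the `second` row
    return [pd.concat([df.iloc[p - 1:p + 1], tail]) for p in positions]


df = pd.DataFrame(
    {"HEADER1": ["Group1", "Group2", "Group1", "Group2", "TAIL1"],
     "HEADER2": ["Value2", "Value4", "Value6", "Value8", "TAIL2"],
     "HEADER3": ["Value3", "Value5", "Value7", "Value9", "TAIL3"]},
    index=[0, 1, 4, 5, 6],
)
for frame in pairs_with_tail(df):
    print(frame.shape)
```

Like the original loop, this only handles records of exactly two rows; the cumsum-based grouping handles records of any length.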