I have a df that looks like this:
HEADER1 HEADER2 HEADER3
0 Group1 Value2 Value3
1 Group2 Value4 Value5
4 Group1 Value6 Value7
5 Group2 Value8 Value9
6 TAIL1 TAIL2 TAIL3
The header and the tail will always be the same, and I need to keep them in every split df.
So, if we assume that a record (one set of useful information) in the df is each run of Group1 and Group2, then this df contains 2 sets of data.
What is the best way to split it, so that we end up with 2 dfs that look like this:
HEADER1 HEADER2 HEADER3
0 Group1 Value2 Value3
1 Group2 Value4 Value5
6 TAIL1 TAIL2 TAIL3
HEADER1 HEADER2 HEADER3
4 Group1 Value6 Value7
5 Group2 Value8 Value9
6 TAIL1 TAIL2 TAIL3
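There could be any number of splits, so ideally I would like to focus on efficiency... Grateful for any info.
What if I wanted to extend the answer and convert the dfs into something like this: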
{
"headers":
{"SomeHeaderName": "Header1", "SomeOtherHeaderName": "Header2"},
"groups": [
"Group1": {"Value2": "GroupValue2", "Value3": "GroupValue3"},
"Group2": {"Value4": "GroupValue4", "Value5": "GroupValue5"}
]
"trailer":
{"SomeTailName": "Tail1", "SomeOtherTailName": "Tail2"}
}
The keys would be taken from an already existing structure and simply zipped with the df entries as values.
Answer 0: (score: 2)
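Use a list comprehension over groupby and append the last row to each group (DataFrame.append in older pandas; pd.concat in pandas 2.0 and later, where DataFrame.append no longer exists):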
#get last row
last = df.iloc[[-1]]
print (last)
HEADER1 HEADER2 HEADER3
6 TAIL1 TAIL2 TAIL3
#get all rows without last
df1 = df.iloc[:-1]
#specify first value of group in first column
s = df1.iloc[:, 0].eq('Group1').cumsum()
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
a = [pd.concat([x, last], ignore_index=True) for i, x in df1.groupby(s)]
print (a)
[ HEADER1 HEADER2 HEADER3
0 Group1 Value2 Value3
1 Group2 Value4 Value5
2 TAIL1 TAIL2 TAIL3, HEADER1 HEADER2 HEADER3
0 Group1 Value6 Value7
1 Group2 Value8 Value9
2 TAIL1 TAIL2 TAIL3]
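For the follow-up in the question (turning each split into a nested structure), here is a minimal sketch. It assumes the key names live in plain lists; HEADER_KEYS, VALUE_KEYS, TAIL_KEYS and the helper to_record are hypothetical names that do not appear in the question, and "groups" is built as a dict keyed by the first column, which is only one reading of the structure shown there.

```python
import pandas as pd

# Sample frame reconstructed from the question, index labels included.
df = pd.DataFrame(
    [
        ["Group1", "Value2", "Value3"],
        ["Group2", "Value4", "Value5"],
        ["Group1", "Value6", "Value7"],
        ["Group2", "Value8", "Value9"],
        ["TAIL1", "TAIL2", "TAIL3"],
    ],
    columns=["HEADER1", "HEADER2", "HEADER3"],
    index=[0, 1, 4, 5, 6],
)

# Hypothetical key names; in practice they would come from the existing structure.
HEADER_KEYS = ["SomeHeaderName", "SomeOtherHeaderName", "ThirdHeaderName"]
VALUE_KEYS = ["FirstValueName", "SecondValueName"]
TAIL_KEYS = ["SomeTailName", "SomeOtherTailName", "ThirdTailName"]


def to_record(part, tail_values, columns):
    """Zip the key lists with the column names, group rows and tail row."""
    return {
        "headers": dict(zip(HEADER_KEYS, columns)),
        "groups": {
            row[0]: dict(zip(VALUE_KEYS, row[1:]))
            for row in part.itertuples(index=False, name=None)
        },
        "trailer": dict(zip(TAIL_KEYS, tail_values)),
    }


body = df.iloc[:-1]                  # every row except the tail
tail_values = df.iloc[-1].tolist()   # the tail row as a plain list
# A new split starts at every 'Group1' in the first column.
starts = body.iloc[:, 0].eq("Group1").cumsum()
records = [to_record(part, tail_values, df.columns) for _, part in body.groupby(starts)]
```

Each element of records is an ordinary dict, so it can be passed to json.dumps if JSON output is needed.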
Answer 1: (score: 0)
You can do it like this:
# create a list to store the dataframes
frames = []
last_group = ''
this_group = ''
# go over all rows in df.values
for row_id in range(len(df)):
    this_group = df.iloc[row_id, 0]
    # leave the first row out of the comparison
    if last_group != '':
        # if the last group was 1 and this is 2 then write both rows to the new dataset
        if this_group == 'Group2' and last_group == 'Group1':
            # create a new empty dataset
            df_new_data = {
                'HEADER1': [],
                'HEADER2': [],
                'HEADER3': [],
            }
            # add the matching rows (previous row, then current row)
            df_new_data['HEADER1'].append(df.iloc[row_id - 1, 0])
            df_new_data['HEADER2'].append(df.iloc[row_id - 1, 1])
            df_new_data['HEADER3'].append(df.iloc[row_id - 1, 2])
            df_new_data['HEADER1'].append(df.iloc[row_id, 0])
            df_new_data['HEADER2'].append(df.iloc[row_id, 1])
            df_new_data['HEADER3'].append(df.iloc[row_id, 2])
            # create new DataFrame from dataset
            frames.append(pd.DataFrame(df_new_data))
    # remember this value as 'last value'
    last_group = this_group
# show all new dataframes' shape
for frame in frames:
    print(frame.shape)
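The frames built by this loop hold only the Group1/Group2 rows, while the question also wants the tail row in every split. A minimal sketch of adding it afterwards follows; the frames and tail objects below are stand-ins constructed by hand so that the snippet runs on its own, and frames_with_tail is a name introduced here for illustration. With the real data, tail could presumably be taken as df.iloc[[-1]].

```python
import pandas as pd

columns = ["HEADER1", "HEADER2", "HEADER3"]

# Stand-ins for the frames produced by the loop above.
frames = [
    pd.DataFrame([["Group1", "Value2", "Value3"],
                  ["Group2", "Value4", "Value5"]], columns=columns),
    pd.DataFrame([["Group1", "Value6", "Value7"],
                  ["Group2", "Value8", "Value9"]], columns=columns),
]
# Stand-in for the last row of the original df, kept as a one-row DataFrame.
tail = pd.DataFrame([["TAIL1", "TAIL2", "TAIL3"]], columns=columns)

# Concatenate the tail row onto the end of every split frame.
frames_with_tail = [pd.concat([frame, tail], ignore_index=True) for frame in frames]

for frame in frames_with_tail:
    print(frame.shape)
```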