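I want to split a pandas DataFrame into multiple groups so that I can process each group separately. My "value.csv" file contains the following numbers: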
num tID y x height width
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
I want to split the data wherever the value in the `tID` column starts again at `0`, as in the first four splits shown here.
First:
2 0 0 0 1 16
2 1 1 0 1 16
Second:
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
Third:
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
Fourth:
2 0 0 0 1 16
2 1 1 0 1 16
To do this, I tried splitting the data with an `if` check, but without success. Is there an effective way to split it?
import pandas as pd
statQuality = 'value.csv'
df = pd.read_csv(statQuality, names=['num','tID','y','x','height','width'])
df2 = df.copy()
df2.drop(['num'], axis=1, inplace=True)
x = []
for index, row in df2.iterrows():
    if row['tID'] == 0:
        x = []
        x.append(row)
        print(x)
    else:
        x.append(row)
Answer 0 (score: 1)
Use:
#create groups by consecutive values
s = df['num'].ne(df['num'].shift()).cumsum()
#create helper count Series for duplicated groups like `2_0`, `2_1`...
g = s.groupby(df['num']).transform(lambda x: x.factorize()[0])
#dictionary of DataFrames
d = {'{}_{}'.format(i,j): v.drop('num', axis=1) for (i, j), v in df.groupby(['num', g])}
print (d)
{'2_0': tID y x height width
0 0 0 0 1 16
1 1 1 0 1 16, '2_1': tID y x height width
8 0 0 0 1 16
9 1 1 0 1 16, '5_0': tID y x height width
2 0 1 0 1 16
3 1 0 0 1 8
4 2 0 8 1 8, '5_1': tID y x height width
10 0 1 0 1 16
11 1 0 0 1 8
12 2 0 8 1 8, '6_0': tID y x height width
5 0 0 0 1 16
6 1 1 0 1 8
7 2 1 8 1 8, '6_1': tID y x height width
13 0 0 0 1 16
14 1 1 0 1 8
15 2 1 8 1 8}
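
The approach above keys the groups on runs of equal `num` values. As a minimal sketch of the split the question literally describes, a new block each time `tID` restarts at 0, one could take a cumulative sum over `tID == 0` and group on that. The names `raw`, `block_id`, and `blocks` are illustrative only, and the inline sample assumes whitespace-separated rows; the real `value.csv` may use a different delimiter.

```python
import io
import pandas as pd

# Sample rows copied from the question. Assumption: whitespace-separated;
# adjust `sep` to match the actual delimiter in 'value.csv'.
raw = """\
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
2 0 0 0 1 16
2 1 1 0 1 16
"""
cols = ['num', 'tID', 'y', 'x', 'height', 'width']
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', names=cols)

# Every row with tID == 0 starts a new block; the running sum of that
# boolean mask gives each block its own id (1, 2, 3, ...).
block_id = df['tID'].eq(0).cumsum()

# One DataFrame per block, in file order, without the 'num' column.
blocks = [part.drop(columns='num') for _, part in df.groupby(block_id)]

# Process each block separately; here we only report its size.
for i, part in enumerate(blocks, start=1):
    print(i, len(part))
```

Unlike the dictionary keyed by `num`, this keeps the blocks in their original file order and does not depend on `num` changing between blocks, so it would also split two adjacent blocks that happen to share the same `num`.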