python在数据框中移动数据

时间:2018-07-22 19:20:23

标签: python pandas

对于以下数据框,我想转移数据。按“ ID”和“系列”分组,应将Q,R和T列中的数据向下移动到“状态”为“结束”的行。

    data = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':[9,'','',30,'',14,'',3,'',17,'','',1,''],
'R': ['',8,'','','','','','','','',7,'','',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

结果应如下:

result = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':['','',9,'',30,'',14,'',3,'','',17,'',1],
'R': ['','',8,'','','','','','','','',7,'',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

2 个答案:

答案 0 :(得分:0)

首先让分组依据IDg,其中gBeginEnd的每组分组。

g = (data.Status == 'End').cumsum().shift(1).fillna(0)
l = lambda x: ''.join(x.astype(str))

df2 = data.groupby(['ID', g],as_index=False)[['Series','Status','Q', 'R', 'T']].\
          agg({'Series':'first', 'Status': 'last', 'Q': l, 'R': l, 'T':l})

然后只需将索引设置为匹配,然后使用loc

df2 = df2.set_index(['ID', 'Status', 'Series'])
data = data.set_index(['ID', 'Status', 'Series'])
data.loc[:, ['Q', 'R', 'T']] = df2[['Q', 'R', 'T']]
data = data.fillna("").reset_index()

    ID  Status  Series  Q   R   T
0   A   Begin   1           
1   A   Begin   1           
2   A   End     1       9   8   12
3   B   Begin   1           
4   B   End     1       30      38
5   B   Begin   2           
6   B   End     2       14      21
7   C   Begin   1           
8   C   End     1       3       6
9   C   Begin   2           
10  C   Begin   2           
11  C   End     2       17  7   35
12  D   Begin   1           
13  D   End     1       1       5

答案 1 :(得分:0)

使用GroupBy.transform + GroupBy.first查找第一个非NaN的值,然后通过maskduplicated删除重复的值:

cols = ['Q', 'R', 'T']

#repalce emty strings to NaNs
data[cols] = data[cols].astype(str).replace('', np.nan)
print (data)
   ID  Series Status    Q    R    T
0   A       1  Begin    9  NaN  NaN
1   A       1  Begin  NaN    8  NaN
2   A       1    End  NaN  NaN   12
3   B       1  Begin   30  NaN  NaN
4   B       1    End  NaN  NaN   38
5   B       2  Begin   14  NaN  NaN
6   B       2    End  NaN  NaN   21
7   C       1  Begin    3  NaN  NaN
8   C       1    End  NaN  NaN    6
9   C       2  Begin   17  NaN  NaN
10  C       2  Begin  NaN    7  NaN
11  C       2    End  NaN  NaN   35
12  D       1  Begin    1  NaN  NaN
13  D       1    End  NaN  NaN    5

g = data.groupby(['ID', 'Series'])
for c in cols:
    data[c] = g[c].transform('first')
print (data)
   ID  Series Status   Q    R   T
0   A       1  Begin   9    8  12
1   A       1  Begin   9    8  12
2   A       1    End   9    8  12
3   B       1  Begin  30  NaN  38
4   B       1    End  30  NaN  38
5   B       2  Begin  14  NaN  21
6   B       2    End  14  NaN  21
7   C       1  Begin   3  NaN   6
8   C       1    End   3  NaN   6
9   C       2  Begin  17    7  35
10  C       2  Begin  17    7  35
11  C       2    End  17    7  35
12  D       1  Begin   1  NaN   5
13  D       1    End   1  NaN   5

data[cols] = data[cols].mask(data.duplicated(['ID','Series'], keep='last'), '').fillna('')
print (data)

   ID  Series Status   Q  R   T
0   A       1  Begin           
1   A       1  Begin           
2   A       1    End   9  8  12
3   B       1  Begin           
4   B       1    End  30     38
5   B       2  Begin           
6   B       2    End  14     21
7   C       1  Begin           
8   C       1    End   3      6
9   C       2  Begin           
10  C       2  Begin           
11  C       2    End  17  7  35
12  D       1  Begin           
13  D       1    End   1      5