我正在清理并重新构建数据框。
我有以下数据框:
data= pd.DataFrame()
data['ID'] = [1,1,1,1,1,2,2,2,2,2]
data ['EventSecond'] = [1.5,2,2.5,3,3.8,4,4.8,6,7,8,]
data ['P1'] = ['A','B','C','D','E','F','A','D','E','G']
data ['Code'] = [12,13,16,9,9,0,4,13,14,16]
data ['status'] =['Pass','Pass','Pass','Pass','Pass','Pass','shot','shot','Pass','Pass']
data ['Accuracy']= ['Accurate','Accurate','Accurate','Accurate','Accurate','Not Accurate','Accurate','Accurate','Accurate','Not Accurate']
在此数据框中,我具有ID,Eventsecond等。 我想做的是创建一个新列 P2 ,如果列 Accuracy 的元素是 Accurate ,则该列包含P1列下一行的元素>。一件事是,如果下面的ID列不同,我不会从行下面获取元素,而将其保留为空白 如果准确度为不准确,则此行将保留空白。
问题补充
我只会采用状态列的值为通过的行。
预期结果如下:
有人可以建议吗? 谢谢,
Zep。
答案 0 :(得分:2)
IIUC,您需要groupby
和transform
:
mask = (data['status'].isin(['Pass','pass']))
data.loc[mask,'P2'] = data[mask].groupby('ID')['P1'].transform(lambda x: x.shift(-1))
data.loc[data['Accuracy']=='Not Accurate','P2'] = np.nan
或仅使用过滤器:
mask = (data['status'].isin(['Pass','pass']))
data.loc[mask,'P2'] = data.loc[mask,'P1'].shift(-1)
mask2 = data['ID'].ne(data['ID'].shift(-1))|data['status'].eq('shot')|data['Accuracy'].eq('Not Accurate')
data.loc[mask2,'P2'] = ''
print(data)
ID EventSecond P1 Code status Accuracy P2
0 1 1.5 A 12 Pass Accurate B
1 1 2.0 B 13 Pass Accurate C
2 1 2.5 C 16 Pass Accurate D
3 1 3.0 D 9 Pass Accurate E
4 1 3.8 E 9 Pass Accurate NaN
5 2 4.0 F 0 Pass Not Accurate NaN
6 2 4.8 A 4 shot Accurate NaN
7 2 6.0 D 13 shot Accurate NaN
8 2 7.0 E 14 pass Accurate G
9 2 8.0 G 16 pass Not Accurate NaN
如果您使用空白,则使用NAN代替NAN:
print(data.fillna(''))
ID EventSecond P1 Code status Accuracy P2
0 1 1.5 A 12 Pass Accurate B
1 1 2.0 B 13 Pass Accurate C
2 1 2.5 C 16 Pass Accurate D
3 1 3.0 D 9 Pass Accurate E
4 1 3.8 E 9 Pass Accurate
5 2 4.0 F 0 Pass Not Accurate
6 2 4.8 A 4 shot Accurate
7 2 6.0 D 13 shot Accurate
8 2 7.0 E 14 pass Accurate G
9 2 8.0 G 16 pass Not Accurate
答案 1 :(得分:1)
因此,首先我将根据P1中的shift
创建P2,然后根据您的条件创建一个mask
,并用loc
来空白更改P2中的值,例如:
data['P2'] = data['P1'].shift(-1)
mask = ((data.Accuracy == 'Not Accurate') |
(data.status =='shot') |
(data.ID != data.ID.shift(-1)))
data.loc[mask,'P2'] = ''
print (data)
ID EventSecond P1 Code status Accuracy P2
0 1 1.5 A 12 Pass Accurate B
1 1 2.0 B 13 Pass Accurate C
2 1 2.5 C 16 Pass Accurate D
3 1 3.0 D 9 Pass Accurate E
4 1 3.8 E 9 Pass Accurate
5 2 4.0 F 0 Pass Not Accurate
6 2 4.8 A 4 shot Accurate
7 2 6.0 D 13 shot Accurate
8 2 7.0 E 14 pass Accurate G
9 2 8.0 G 16 pass Not Accurate
编辑:您甚至可以使用numpy.where
这样
import numpy as np
data['P2'] = np.where(mask, '', data.P1.shift(-1))