I have a DataFrame as follows:
df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,5,5,5,0,0,4]})
Out[129]:
variable weight
0 A 2
1 A 2
2 B 0
3 B 0
4 C 1
5 D 3
6 E 5
7 E 5
8 E 5
9 F 0
10 F 0
11 G 4
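I want to create a new column per variable group, where the value of the new column depends on the column weight and on the new column itself (its value in the previous row).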
In R, I can easily get the desired output with rowwise from dplyr:
library(dplyr)
test <-
data.frame(
variable = c("A","A","B","B","C","D","E","E","E","F","F","G"),
weight = c(2,2,0,0,1,3,5,5,5,0,0,4)
)
test%>%group_by(variable)%>%rowwise()%>%mutate(Var=ifelse (weight==2,1,ifelse(.Last.value ==1|weight>1,0,NA)))
The expected result is:
variable weight Var
<fctr> <dbl> <dbl>
1 A 2 1
2 A 2 1
3 B 0 NA
4 B 0 NA
5 C 1 NA
6 D 3 0
7 E 5 0
8 E 5 0
9 E 5 0
10 F 0 NA
11 F 0 NA
12 G 4 0
How can I achieve this in Python?
Edit: the R approach above is also wrong.
My approach:
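l1 = []
# Walk the groups in order of first appearance (assumes each group's rows are contiguous)
for i in df.variable.unique():
    temp = df.loc[df.variable == i]
    l2 = []
    for j in range(len(temp)):
        print(i, j)
        if temp.iloc[j, 1] <= 2:
            l2.append(1)
        elif temp.iloc[j, 1] > 2 and j == 0:
            # first row of the group: there is no previous value to look at
            l2.append('ERROR')
        elif temp.iloc[j, 1] > 2 and j > 0:
            # depends on the value computed for the previous row of the group
            if l2[j - 1] == 1:
                l2.append(1)
            else:
                l2.append(0)
    print(l2)
    l1.extend(l2)
df['NEW'] = l1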
Data input:
df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,3,5,5,0,0,4]})
Output:
df['NEW']=l1
df
Out[232]:
variable weight NEW
0 A 2 1
1 A 2 1
2 B 0 1
3 B 0 1
4 C 1 1
5 D 3 ERROR
6 E 3 ERROR
7 E 5 0
8 E 5 0
9 F 0 1
10 F 0 1
11 G 4 ERROR
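The rule in the loop above can also be applied per group with groupby instead of building l1 and l2 by hand. This is a minimal sketch, assuming the rule exactly as the loop encodes it (weight <= 2 gives 1; a first row with weight > 2 gives 'ERROR'; a later row with weight > 2 gives 1 if the previous result was 1, otherwise 0), run on the same data input. The helper name label_group is hypothetical. Because each row depends on the previous result, it still iterates inside each group:

```python
import pandas as pd

df = pd.DataFrame({'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
                   'weight': [2, 2, 0, 0, 1, 3, 3, 5, 5, 0, 0, 4]})

def label_group(weights):
    """Hypothetical helper: apply the loop's rule to one group's weights."""
    out = []
    for j, w in enumerate(weights):
        if w <= 2:
            out.append(1)
        elif j == 0:
            # first row of the group with weight > 2: no previous value exists
            out.append('ERROR')
        else:
            # later rows with weight > 2 depend on the previous result
            out.append(1 if out[j - 1] == 1 else 0)
    # keep the group's original index so the result aligns with df
    return pd.Series(out, index=weights.index)

# group_keys=False keeps the original row index instead of prepending the group key
df['NEW'] = df.groupby('variable', group_keys=False)['weight'].apply(label_group)
print(df)
```

Unlike the original loop, this does not require the rows of each group to be contiguous, since the result is aligned back to df by index rather than by position.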
Answer 0 (score: 1)
No Groupby!
Let me know if I have interpreted this correctly.
Option 1
df.assign(Var=df.weight.eq(2).mul(1).mask(df.weight.le(1)))
variable weight Var
0 A 2 1.0
1 A 2 1.0
2 B 0 NaN
3 B 0 NaN
4 C 1 NaN
5 D 3 0.0
6 E 5 0.0
7 E 5 0.0
8 E 5 0.0
9 F 0 NaN
10 F 0 NaN
11 G 4 0.0
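Option 1 chains three Series methods. A small sketch that shows each intermediate step on the question's first data set (the names is_two, as_int and the steps frame are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
                   'weight': [2, 2, 0, 0, 1, 3, 5, 5, 5, 0, 0, 4]})

is_two = df.weight.eq(2)             # True where weight == 2
as_int = is_two.mul(1)               # True/False -> 1/0
var = as_int.mask(df.weight.le(1))   # NaN wherever weight <= 1

steps = pd.DataFrame({'weight': df.weight, 'eq2': is_two, 'mul1': as_int, 'Var': var})
print(steps)
```

So weight == 2 maps to 1, weight <= 1 maps to NaN, and every other weight maps to 0. Because mask inserts NaN, the result column becomes float, which is why the output shows 1.0 and 0.0.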
Option 2 (assumes import numpy as np)
df.assign(Var=np.array([np.nan, 1, 0])[np.searchsorted([1, 2], df.weight.values)])
variable weight Var
0 A 2 1.0
1 A 2 1.0
2 B 0 NaN
3 B 0 NaN
4 C 1 NaN
5 D 3 0.0
6 E 5 0.0
7 E 5 0.0
8 E 5 0.0
9 F 0 NaN
10 F 0 NaN
11 G 4 0.0
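Option 2 relies on np.searchsorted with its default side='left': each weight becomes the position at which it would be inserted into [1, 2], and that position indexes the lookup array [nan, 1, 0]. A sketch of the intermediate positions for a few sample weights:

```python
import numpy as np

weights = np.array([0, 1, 2, 3, 4, 5])
pos = np.searchsorted([1, 2], weights)   # side='left' is the default
lookup = np.array([np.nan, 1, 0])
print(pos)           # bucket index for each weight
print(lookup[pos])   # value picked from the lookup array
```

Weights 0 and 1 land in bucket 0 (NaN), 2 lands in bucket 1 (1), and anything above 2 lands in bucket 2 (0). With side='right', a weight of exactly 1 would move to bucket 1 and a weight of exactly 2 to bucket 2, so the default side matters here.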
Option 3 (assumes import numpy as np)
df.assign(Var=np.array([1, 0, np.nan])[np.sign(df.weight.values - 2)])
variable weight Var
0 A 2 1.0
1 A 2 1.0
2 B 0 NaN
3 B 0 NaN
4 C 1 NaN
5 D 3 0.0
6 E 5 0.0
7 E 5 0.0
8 E 5 0.0
9 F 0 NaN
10 F 0 NaN
11 G 4 0.0
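Option 3 uses np.sign(weight - 2), which yields -1, 0 or 1: index 0 picks 1, index 1 picks 0, and index -1 wraps around to the last element, NaN. This works because np.sign keeps the integer dtype of an integer array; a float weight column would need an explicit .astype(int) first (an adaptation assumed here, not part of the answer). A sketch that checks that the three options agree on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
                   'weight': [2, 2, 0, 0, 1, 3, 5, 5, 5, 0, 0, 4]})

opt1 = df.weight.eq(2).mul(1).mask(df.weight.le(1))
opt2 = np.array([np.nan, 1, 0])[np.searchsorted([1, 2], df.weight.values)]
opt3 = np.array([1, 0, np.nan])[np.sign(df.weight.values - 2)]

# NaN != NaN, so compare with a NaN-aware check
agree = (np.allclose(opt1.to_numpy(dtype=float), opt2, equal_nan=True)
         and np.allclose(opt2, opt3, equal_nan=True))
print(agree)

# Assumed adaptation for a float weight column: cast the signs to int before indexing
w_float = df.weight.values.astype(float)
opt3_float = np.array([1, 0, np.nan])[np.sign(w_float - 2).astype(int)]
```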