This is a question I asked before, but I explained it badly, so I am asking it again as a new question. Thank you for your help and time!
Data input:
import pandas as pd
df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,3,1,5,0,0,4]})
df
Out[447]:
    variable  weight
0          A       2
1          A       2
2          B       0
3          B       0
4          C       1
5          D       3
6          E       3
7          E       1  # If this value is more than 2, the output should be 0
8          E       5
9          F       0
10         F       0
11         G       4
Expected output:
df
Out[449]:
    variable  weight    NEW
0          A       2      1
1          A       2      1
2          B       0      1
3          B       0      1
4          C       1      1
5          D       3  ERROR
6          E       3  ERROR
7          E       1      1
8          E       5      1
9          F       0      1
10         F       0      1
11         G       4  ERROR
My approach so far (ugly...):
l1=[]
for i in df.variable.unique():
    temp=df.loc[df.variable==i]
    l2 = []
    for j in range(len(temp)):
        print(i,j)
        if temp.iloc[j,1]<=2 :
            l2.append(1)
        elif temp.iloc[j,1]>2 and j==0:
            l2.append('ERROR')
        elif temp.iloc[j,1]>2 and j > 0 :
            if l2[j - 1] == 1:
                l2.append(1)
            else:
                l2.append(0)
    print(l2)
    l1.extend(l2)
df['NEW']=l1
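My questions are:
First: if I want to use groupby, how can I feed each computed result into the later calculations, so that I get the NEW column?
Second: is there a pandas counterpart to an R function for this kind of calculation?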
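Here is an explanation of the conditions:
1. If the weight is 2 or less, the value is always 1.
2. If the first weight value in a group is greater than 2, it should return 'ERROR'.
3. If the previous row got 'ERROR' and the current row's weight is greater than 2, it returns 0.
Please change the input to:
df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,3,9,5,0,0,4]})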
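The rules listed above can be sketched as a small per-group helper that carries the previous result forward and is then applied with groupby. This is only a sketch: `label_group` and its `limit` parameter are names made up for illustration, and the helper mirrors the loop in the question (so a heavy row that follows a 0 also gets 0). The weights 3, 9, 5 are used for group E so that the 0 case shows up.

```python
import pandas as pd

def label_group(weights, limit=2):
    """Label one group's weights row by row (hypothetical helper)."""
    out = []
    for j, w in enumerate(weights):
        if w <= limit:
            out.append(1)            # light row: always 1
        elif j == 0:
            out.append('ERROR')      # heavy first row of the group
        else:
            # heavy later row: depends on the previous result
            out.append(1 if out[-1] == 1 else 0)
    # object dtype so that 1, 0 and 'ERROR' can share one column
    return pd.Series(out, index=weights.index, dtype=object)

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight':   [2, 2, 0, 0, 1, 3, 3, 9, 5, 0, 0, 4],
})

# transform calls the helper once per group and realigns on the original index
df['NEW'] = df.groupby('variable', sort=False)['weight'].transform(label_group)
print(df)
```

The loop still runs in Python, but groupby takes care of splitting the frame and putting the pieces back in order.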
Answer 0 (score: 1)
I'm not sure I understood the conditions from the loop correctly, but this looks like
df['New'] = np.where((df['weight'] > 2) & (df['variable'] != df['variable'].shift(1)), 'ERROR', 1)
    variable  weight    New
0          A       2      1
1          A       2      1
2          B       0      1
3          B       0      1
4          C       1      1
5          D       3  ERROR
6          E       3  ERROR
7          E       1      1
8          E       5      1
9          F       0      1
10         F       0      1
11         G       4  ERROR
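The one-liner above marks the heavy first row of each group but never produces a 0. Reading the question's loop, a heavy row gets 0 only when every earlier row in its group was heavy too, so the 0 case can be added without a loop by using a grouped cumulative minimum. This is a sketch under that reading of the loop; the names `limit`, `heavy`, `first` and `leading` are made up here.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight':   [2, 2, 0, 0, 1, 3, 3, 9, 5, 0, 0, 4],
})

limit = 2
heavy = df['weight'].gt(limit)                    # weight above the limit
first = df.groupby('variable').cumcount().eq(0)   # first row of each group

# True while every row so far in the group has been heavy;
# the running minimum drops to 0 at the first light row and stays there
leading = heavy.astype(int).groupby(df['variable']).cummin().astype(bool)

new = pd.Series(1, index=df.index, dtype=object)  # default label is 1
new[leading & ~first] = 0                         # heavy run continuing from the group start
new[leading & first] = 'ERROR'                    # heavy first row
df['New'] = new
print(df)
```

With the original weights 3, 1, 5 for group E, the same code gives ERROR, 1, 1, matching the expected output in the question.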
Answer 1 (score: 1)
n = 2 # `Error` weight filter.
# Get boolean index of whether weight of first item in group is greater than `n`.
mask = df.loc[[idx[0] for idx in df.groupby('variable')['weight'].groups.values()], 'weight'].gt(n)
df = df.assign(New=1)
df.loc[mask[mask].index, 'New'] = 'ERROR'
>>> df
    variable  weight    New
0          A       2      1
1          A       2      1
2          B       0      1
3          B       0      1
4          C       1      1
5          D       3  ERROR
6          E       3  ERROR
7          E       1      1
8          E       5      1
9          F       0      1
10         F       0      1
11         G       4  ERROR
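The first row of each group can also be found with `duplicated`, which avoids building the list of group indices by hand. A minimal sketch, assuming the same `n` as above; `first_in_group` is a made-up name, and like the code above it only covers the 1 and 'ERROR' cases, not the 0 case.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight':   [2, 2, 0, 0, 1, 3, 3, 1, 5, 0, 0, 4],
})

n = 2  # `Error` weight filter.

# duplicated() is False for the first occurrence of each variable
first_in_group = ~df['variable'].duplicated()

# start from an object column of 1s, then overwrite the heavy first rows
ones = pd.Series(1, index=df.index, dtype=object)
df['New'] = ones.mask(first_in_group & df['weight'].gt(n), 'ERROR')
print(df)
```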