Rowwise in Pandas

Time: 2017-09-14 19:58:10

Tags: python pandas

I asked this question before but explained it badly, so I am asking it again as a new question. Thanks for your help and time!

Data input:

import pandas as pd
df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,3,1,5,0,0,4]})
df
Out[447]: 
   variable  weight
0         A       2
1         A       2
2         B       0
3         B       0
4         C       1
5         D       3
6         E       3
7         E       1  # If this value is more than 2, the output should be 0
8         E       5
9         F       0
10        F       0
11        G       4

Expected output:

df
Out[449]: 
   variable  weight    NEW
0         A       2      1
1         A       2      1
2         B       0      1
3         B       0      1
4         C       1      1
5         D       3  ERROR
6         E       3  ERROR
7         E       1      1
8         E       5      1
9         F       0      1
10        F       0      1
11        G       4  ERROR

My approach so far (ugly...):

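l1=[]  # results for the whole frame
# Process each group of `variable` separately
for i in df.variable.unique():
    temp=df.loc[df.variable==i]
    l2 = []  # results for the current group
    for j in range(len(temp)):
        print(i,j)  # debug output

        if temp.iloc[j,1]<=2 :
            l2.append(1)
        elif temp.iloc[j,1]>2 and j==0:
            # first row of the group is above 2
            l2.append('ERROR')
        elif temp.iloc[j,1]>2 and j > 0 :
            # later row above 2: depends on the previous result
            if l2[j - 1] == 1:
                l2.append(1)
            else:
                l2.append(0)
        print(l2)  # debug output
    l1.extend(l2)
df['NEW']=l1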

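My questions are:

First: if I want to use groupby, how can I feed each calculated result into the following calculations in order to get the NEW column here?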

Second: does pandas have an equivalent of the R ERROR function?

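Let me explain the conditions here:

1. If the weight is 2 or less, the value is always 1.

2. If the first weight value in a group is above 2, it should return 'ERROR'.

3. If the previous row got 'ERROR' and the current row's weight is greater than 2, it returns 0.

Please change the input to:

df=pd.DataFrame({'variable':["A","A","B","B","C","D","E","E","E","F","F","G"],'weight':[2,2,0,0,1,3,3,9,5,0,0,4]})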
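
The conditions above can be sketched with groupby by handing each group to a small function that carries the previous result forward. This is a minimal sketch, not a definitive solution. It assumes the rules as coded in the loop above (a weight of 2 or less gives 1; a first row above 2 gives 'ERROR'; a later row above 2 gives 1 only if the previous result was 1, otherwise 0) and the weights 2,2,0,0,1,3,3,9,5,0,0,4. The helper name label_group and its parameter n are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight': [2, 2, 0, 0, 1, 3, 3, 9, 5, 0, 0, 4],  # assumed revised input
})


def label_group(weights, n=2):
    """Label one group's weights; each result may depend on the previous one.

    Hypothetical helper that mirrors the loop in the question.
    """
    out = []
    for j, w in enumerate(weights):
        if w <= n:
            out.append(1)
        elif j == 0:
            # first row of the group is above n
            out.append('ERROR')
        else:
            # later row above n: 1 only if the previous result was 1
            out.append(1 if out[-1] == 1 else 0)
    # object dtype so integers and the 'ERROR' string can share a column
    return pd.Series(out, index=weights.index, dtype=object)


# transform calls label_group once per group and keeps the original row order
df['NEW'] = df.groupby('variable', sort=False)['weight'].transform(label_group)
print(df)
```

With these weights the E group comes out as ERROR, 0, 0. The per-group loop still runs in Python, so this tidies the bookkeeping rather than speeding it up.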

2 answers:

Answer 0: (score: 1)

I'm not sure I understood the conditions from your loop correctly, but this looks like

import numpy as np
df['New'] = np.where((df['weight'] > 2) & (df['variable'] != df['variable'].shift(1)), 'ERROR', 1)

    variable    weight  New
0   A           2       1
1   A           2       1
2   B           0       1
3   B           0       1
4   C           1       1
5   D           3       ERROR
6   E           3       ERROR
7   E           1       1
8   E           5       1
9   F           0       1
10  F           0       1
11  G           4       ERROR
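
The np.where expression above marks only the first row of each run of equal variable values, so every later row gets 1. If the loop in the question is taken as the definition, a later row above 2 should get 0 when no earlier row in its group was 2 or less, because once a 1 appears in a group every following result is also 1. A possible vectorized sketch under that assumption follows; the names seen_small, first and new are illustrative, and the weights include the 9 in row 7.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight': [2, 2, 0, 0, 1, 3, 3, 9, 5, 0, 0, 4],  # assumed revised input
})
n = 2

# True from the first row in each group whose weight is <= n, and on every
# later row of that group (running maximum of a 0/1 flag per group)
seen_small = (
    df['weight'].le(n).astype(int)
    .groupby(df['variable']).cummax()
    .astype(bool)
)

# True on the first row of each group
first = df.groupby('variable').cumcount().eq(0)

new = pd.Series(0, index=df.index, dtype=object)  # default: 0
new[first] = 'ERROR'   # first rows, overwritten below when weight <= n
new[seen_small] = 1    # a weight <= n has been seen in the group
df['New'] = new
print(df)
```

Unlike the shift comparison, cumcount treats all rows with the same variable as one group even when they are not adjacent; for the data here both give the same first rows.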

Answer 1: (score: 1)

n = 2  # `Error` weight filter.
# Get boolean index of whether weight of first item in group is greater than `n`.
mask = df.loc[[idx[0] for idx in df.groupby('variable')['weight'].groups.values()], 'weight'].gt(n)
df = df.assign(New=1)
df.loc[mask[mask].index, 'New'] = 'ERROR'
>>> df
   variable  weight    New
0         A       2      1
1         A       2      1
2         B       0      1
3         B       0      1
4         C       1      1
5         D       3  ERROR
6         E       3  ERROR
7         E       1      1
8         E       5      1
9         F       0      1
10        F       0      1
11        G       4  ERROR
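
Another way to find the first row of each group, shown here only as a sketch, is Series.duplicated, which is False on the first occurrence of each value. Creating the column with object dtype up front is an assumption made here to avoid writing strings into an integer column; is_first is an illustrative name. Like the answer above, this covers only the "first weight above n" rule.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ["A", "A", "B", "B", "C", "D", "E", "E", "E", "F", "F", "G"],
    'weight': [2, 2, 0, 0, 1, 3, 3, 1, 5, 0, 0, 4],
})
n = 2  # 'ERROR' weight threshold, as in the answer above

# duplicated() is False for the first occurrence of each variable value
is_first = ~df['variable'].duplicated()

# object dtype so that 1 and 'ERROR' can live in the same column
df['New'] = pd.Series(1, index=df.index, dtype=object)
df.loc[is_first & df['weight'].gt(n), 'New'] = 'ERROR'
print(df)
```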