Consider a pandas DataFrame:
A B C
1 random imp1
2 random imp2
5 random imp3
1 yes ---
2 yes ---
3 no ---
4 no ---
5 yes ---
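Whenever the value in column B is yes, I want to take the value of A. Then, for those values of A, wherever they occur in column A, I want the value of C. So in this case I ultimately want imp1, imp2 and imp3.
Is there an elegant way to do this?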
Answer 0 (score: 2)
You can first use boolean indexing with loc, then drop_duplicates, and finally isin to filter by the values in a:
a = df.loc[df['B'] == 'yes', 'A']
df = df.drop_duplicates('A')
df = df.loc[df['A'].isin(a), 'C']
print (df)
0 imp1
1 imp2
2 imp3
Name: C, dtype: object
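
For anyone who wants to run the three steps end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, with '---' kept as a literal placeholder string; the names first_rows and result are introduced here for readability and are not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question; '---' stands in for the
# C values the asker does not care about.
df = pd.DataFrame({
    'A': [1, 2, 5, 1, 2, 3, 4, 5],
    'B': ['random', 'random', 'random', 'yes', 'yes', 'no', 'no', 'yes'],
    'C': ['imp1', 'imp2', 'imp3', '---', '---', '---', '---', '---'],
})

# Step 1: values of A on the rows where B == 'yes'
a = df.loc[df['B'] == 'yes', 'A']

# Step 2: keep only the first row for each distinct value of A
first_rows = df.drop_duplicates('A')

# Step 3: of those first rows, keep the ones whose A is in `a`, and take C
result = first_rows.loc[first_rows['A'].isin(a), 'C']
print(result)
```

Note that this relies on the wanted C value sitting on the first row for each A, as it does in the sample data.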
Timings:
import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000
df = pd.DataFrame({'B': np.random.choice(['yes','no', 'a', 'b', 'c'], N),
                   'A': np.random.randint(1000, size=N),
                   'C': np.random.randint(1000, size=N)})
print (df)
print (df[df.A.isin(df[df.B == 'yes'].A)].drop_duplicates('A').C)
print (df[df.A.isin(df[df.B == 'yes'].drop_duplicates('A').A)].C)
def fjez(df):
    a = df.loc[df['B'] == 'yes', 'A']
    df = df.drop_duplicates('A')
    return df.loc[df['A'].isin(a), 'C']

def fpir(df):
    a = df.A.values
    b = df.B.values == 'yes'
    d = df.drop_duplicates('A')
    return d.C[np.in1d(d.A.values, a[b])]

print (fjez(df))
print (fpir(df))
In [296]: %timeit (df[df.A.isin(df[df.B == 'yes'].A)].drop_duplicates('A').C)
1 loop, best of 3: 226 ms per loop
In [297]: %timeit (df[df.A.isin(df[df.B == 'yes'].drop_duplicates('A').A)].C)
1 loop, best of 3: 185 ms per loop
In [298]: %timeit (fjez(df))
10 loops, best of 3: 156 ms per loop
In [299]: %timeit (fpir(df))
10 loops, best of 3: 87.1 ms per loop
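
Before reading too much into the timings, it may be worth checking that the variants actually return the same thing. The sketch below does that on a smaller random frame (N = 10,000 is an arbitrary choice made here, not from the benchmark), and it uses np.isin in place of np.in1d, which newer NumPy releases deprecate. The names one_liner_1 and one_liner_2 are introduced only for this check.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
N = 10_000  # smaller than the benchmark frame; arbitrary choice for a quick check
df = pd.DataFrame({'B': np.random.choice(['yes', 'no', 'a', 'b', 'c'], N),
                   'A': np.random.randint(1000, size=N),
                   'C': np.random.randint(1000, size=N)})

def fjez(df):
    a = df.loc[df['B'] == 'yes', 'A']
    d = df.drop_duplicates('A')
    return d.loc[d['A'].isin(a), 'C']

def fpir(df):
    a = df.A.values
    b = df.B.values == 'yes'
    d = df.drop_duplicates('A')
    # np.isin stands in for np.in1d here
    return d.C[np.isin(d.A.values, a[b])]

# The two one-line expressions from the timing run
one_liner_1 = df[df.A.isin(df[df.B == 'yes'].A)].drop_duplicates('A').C
one_liner_2 = df[df.A.isin(df[df.B == 'yes'].drop_duplicates('A').A)].C

print(fjez(df).equals(fpir(df)))        # do the two functions agree?
print(fjez(df).equals(one_liner_1))     # does the first one-liner agree?
print(len(one_liner_1), len(one_liner_2))
```

As far as I can tell, the second one-liner only deduplicates the inner selection, so it returns every row whose A has a 'yes' somewhere rather than one row per A. Its timing is therefore not quite a like-for-like comparison with the other three.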
Answer 1 (score: 2)
Let's use this one-liner:
df[df.A.isin(df[df.B == 'yes'].A)].drop_duplicates('A').C
Output:
0 imp1
1 imp2
2 imp3
Name: C, dtype: object
Answer 2 (score: 1)
This should be very fast:
a = df.A.values
b = df.B.values == 'yes'
d = df.drop_duplicates('A')
d.C[np.in1d(d.A.values, a[b])]
0 imp1
1 imp2
2 imp3
Name: C, dtype: object
An over-the-top approach. It is about 50% faster than my other method:
from numba import njit

@njit
def proc(f, m):
    mx = f.max() + 1
    a = [False] * mx
    b = [0] * mx
    z = [0] * f.size
    for i in range(f.size):
        x = f[i]
        y = m[i]
        b[x] += 1
        z[i] = b[x]
        a[x] = a[x] or y
    return np.array(z) == 1, np.array(a)[f]

df.C[np.logical_and(*proc(pd.factorize(df.A.values)[0], df.B.values == 'yes'))]
0 imp1
1 imp2
2 imp3
Name: C, dtype: object
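
To make it easier to see what the two arrays returned by proc represent, here is a plain NumPy sketch of the same idea without numba. It is an illustration only, not the author's code and not tuned for speed; the names codes, is_first and has_yes are introduced here, and the DataFrame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'A': [1, 2, 5, 1, 2, 3, 4, 5],
    'B': ['random', 'random', 'random', 'yes', 'yes', 'no', 'no', 'yes'],
    'C': ['imp1', 'imp2', 'imp3', '---', '---', '---', '---', '---'],
})

# One integer label per distinct value of A, numbered by first appearance
codes = pd.factorize(df.A.values)[0]
is_yes = df.B.values == 'yes'

# Mask 1: True on the first row where each label appears
first_idx = np.unique(codes, return_index=True)[1]
is_first = np.zeros(codes.size, dtype=bool)
is_first[first_idx] = True

# Mask 2: True on every row whose label has at least one 'yes' row
yes_count = np.bincount(codes, weights=is_yes.astype(float))
has_yes = (yes_count > 0)[codes]

# Rows that are both a first occurrence and belong to a 'yes' group
result = df.C[is_first & has_yes]
print(result)
```

The numba version computes both masks in a single pass over the data, which is presumably where its speed advantage comes from.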