I have a pandas DataFrame:
   col1 | col2 | col3 | col4
0  A    | B    | C    | G
1  I    | J    | S    | D
2  O    | L    | C    | G
3  A    | B    | H    | D
4  H    | B    | C    | P
# reproducible
import pandas as pd
from string import ascii_uppercase as uc # just for sample data
import random # just for sample data
random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})
I'm looking for a function like this:
func('H')
that returns the index labels and column names of every cell containing 'H'. Any ideas?
Answer 0 (score: 2)
One solution is to use melt:
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])
You can now easily extract the columns and indices.
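To get from the filtered frame to the (index, column) pairs the question asks for, one option is to zip the two label columns of the melted result. This is only a sketch, assuming the five-row frame shown in the question; the names `hits` and `pairs` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample frame from the question (five rows, four columns).
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

# Melt into long format: one row per (index, column name, value) triple.
df.index.name = "inx"
t = df.reset_index().melt(id_vars="inx")

# Keep only the rows whose value is 'H', then pair index label with column name.
hits = t[t["value"] == "H"]
pairs = list(zip(hits["inx"].tolist(), hits["variable"].tolist()))
print(pairs)  # [(4, 'col1'), (3, 'col3')]
```

Note that melt walks the frame column by column, so the pairs come out in column order rather than row order.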
Answer 1 (score: 2)
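import numpy as np
# Positions of every cell equal to 'H', mapped back to index labels and column names
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))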
Alternatively,
indices = df.where(df.eq('H')).stack().index.tolist()
# print(indices)
[(3, 'col3'), (4, 'col1')]
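The `where(...).stack()` variant depends on `stack` dropping the NaN cells that `where` leaves behind. That has been the long-standing default, but recent pandas releases have been changing how `stack` treats missing values (see the `future_stack` parameter), so the result may depend on your version. A sketch that avoids NaN handling altogether by stacking the boolean mask itself, assuming the five-row frame from the question (the names `mask` and `stacked` are illustrative):

```python
import pandas as pd

# Sample frame from the question (five rows, four columns).
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

# Boolean frame: True where the cell equals 'H'.
mask = df.eq('H')

# Stack into a Series indexed by (row label, column name),
# then keep only the entries that are True.
stacked = mask.stack()
indices = stacked[stacked].index.tolist()
print(indices)
```

Because the mask contains no missing values, there is nothing for `stack` to drop or keep, and the surviving index entries are exactly the matching cells in row order.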
timeit comparison of all the answers:
df.shape
(50000, 4)
%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c]))
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
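Outside IPython, a similar comparison can be run with the standard-library `timeit` module. The sketch below rests on assumptions: the benchmark data is not shown in the answer, so a random 50,000 x 4 frame of uppercase letters stands in for it, and the function names are hypothetical. Timings depend on machine and library versions, so none are quoted here.

```python
import random
import timeit
from string import ascii_uppercase as uc

import numpy as np
import pandas as pd

random.seed(365)
# Assumed benchmark data: 50,000 rows x 4 columns of random uppercase letters.
df = pd.DataFrame({f'col{i}': random.choices(uc, k=50_000) for i in range(1, 5)})


def with_argwhere(frame, value):
    rows, cols = np.argwhere(frame.to_numpy() == value).T
    return list(zip(frame.index[rows], frame.columns[cols]))


def with_where_on_frame(frame, value):
    r, c = np.where(frame == value)
    return list(zip(frame.index[r], frame.columns[c]))


def with_melt(frame, value):
    # rename_axis returns a copy, so the caller's frame is left untouched.
    t = frame.rename_axis("inx").reset_index().melt(id_vars="inx")
    hits = t[t["value"] == value]
    return list(zip(hits["inx"], hits["variable"]))


# Sanity check: every approach should find the same cells
# (melt returns them column-first, hence the sort).
expected = sorted(with_argwhere(df, 'H'))
assert sorted(with_where_on_frame(df, 'H')) == expected
assert sorted(with_melt(df, 'H')) == expected

for fn in (with_argwhere, with_where_on_frame, with_melt):
    seconds = timeit.timeit(lambda: fn(df, 'H'), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```

Checking that the approaches agree before timing them guards against benchmarking code that is fast only because it returns the wrong thing.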
Answer 2 (score: 2)
Using np.where and indexing (updated for better performance):
r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))
Output:
[(3, 'col3'), (4, 'col1')]
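Since the question asks for something callable as func('H'), here is a minimal sketch that wraps the np.where approach in a function. The name `find_value` and its signature are illustrative, not taken from any of the answers, and it assumes the cells can be compared to `value` with `==`.

```python
import numpy as np
import pandas as pd


def find_value(frame: pd.DataFrame, value) -> list:
    """Return (index label, column name) pairs for every cell equal to `value`."""
    # Compare on the underlying NumPy array; r and c hold positional hits.
    r, c = np.where(frame.to_numpy() == value)
    # Map positions back to labels so non-default indexes work too.
    return list(zip(frame.index[r].tolist(), frame.columns[c].tolist()))


# Sample frame from the question (five rows, four columns).
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

print(find_value(df, 'H'))  # [(3, 'col3'), (4, 'col1')]
```

Passing the frame in as an argument rather than reading a global `df` keeps the helper reusable across frames; a value that does not occur simply yields an empty list.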