I have a DataFrame like the following.
test = pd.DataFrame({'col1':[0,0,1,0,0,0,1,2,0], 'col2': [0,0,1,2,3,0,0,0,0]})
   col1  col2
0     0     0
1     0     0
2     1     1
3     0     2
4     0     3
5     0     0
6     1     0
7     2     0
8     0     0
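For each column, I want to find the index of the value 1 that comes before that column's maximum. For example, in the first column the maximum is 2, and the index of the 1 before that 2 is 6. In the second column the maximum is 3, and the index of the 1 before that 3 is 2.
In short, I want [6, 2] as the output for this test DataFrame. Is there a fast way to achieve this?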
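To pin down the expected behaviour, here is a minimal loop-based sketch. The helper name last_one_before_max is hypothetical, and two things are assumptions rather than part of the question: "the maximum" is taken to be the first occurrence of the column maximum, and -1 is returned when no 1 precedes it. The function returns positions, which coincide with the index labels only because the example uses the default RangeIndex.

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

def last_one_before_max(col):
    """Return the position of the last 1 that precedes the first maximum."""
    values = col.tolist()
    max_pos = values.index(max(values))      # first occurrence of the maximum
    for pos in range(max_pos - 1, -1, -1):   # walk backwards from the maximum
        if values[pos] == 1:
            return pos
    return -1  # assumption: -1 signals "no 1 before the maximum"

result = [last_one_before_max(test[c]) for c in test.columns]
print(result)  # [6, 2]
```

A loop like this is slow on large frames, but it is handy as a reference when checking the vectorized answers.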
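Answer 0 (score: 5)
Use Series.mask to hide the elements that are not 1 (together with everything from the column maximum onward), then apply Series.last_valid_index to each column.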
m = test.eq(test.max()).cumsum().gt(0) | test.ne(1)
test.mask(m).apply(pd.Series.last_valid_index)
col1    6
col2    2
dtype: int64
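The mask above can be taken apart into its two conditions. This is only an illustrative sketch on the question's data; the variable names at_or_after_max, not_one and survivors are made up for this example.

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

# True from the first maximum onward (running count of maxima seen > 0)
at_or_after_max = test.eq(test.max()).cumsum().gt(0)
# True wherever the value is not 1
not_one = test.ne(1)

# mask() replaces every True cell with NaN, so only 1s before the maximum remain
survivors = test.mask(at_or_after_max | not_one)
print(survivors['col1'].dropna().index.tolist())  # [2, 6]
print(survivors['col2'].dropna().index.tolist())  # [2]
```

last_valid_index then simply picks the last non-NaN label in each column.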
For a vectorized NumPy version, use numpy.cumsum and argmax:
idx = ((test.eq(1) & test.eq(test.max()).cumsum().eq(0))
       .values
       .cumsum(axis=0)
       .argmax(axis=0))
idx
# array([6, 2])
pd.Series(idx, index=[*test])
col1    6
col2    2
dtype: int64
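Why cumsum followed by argmax finds the last qualifying 1: the running sum reaches its final (largest) value exactly at the last True, and argmax returns the first position holding that largest value. A small sketch for col1, with the boolean array written out by hand:

```python
import numpy as np

# 1s that occur before the maximum in col1 of the example: positions 2 and 6
is_one_before_max = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0], dtype=bool)

running = is_one_before_max.cumsum()
print(running)           # [0 0 1 1 1 1 2 2 2]
print(running.argmax())  # 6
```

One caveat: if a column has no qualifying 1 at all, the running sum is all zeros and argmax returns 0, which cannot be told apart from a genuine hit at position 0.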
Answer 1 (score: 4)
Using @cs95's last_valid_index idea:
test.apply(lambda x: x[:x.idxmax()].eq(1)[lambda i:i].last_valid_index())
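Output:
col1    6
col2    2
dtype: int64
Explanation:
Slice each column up to (but not including) its maximum, test the slice for equality with 1, and take the index of the last True value.
Alternatively, as suggested by @QuangHoang: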
test.apply(lambda x: x[:x.idxmax()].eq(1).cumsum().idxmax())
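The individual steps of both one-liners can be traced on a single column. This sketch uses .iloc for the slice to make the positional slicing explicit; that is equivalent here only because the example has the default RangeIndex, so labels and positions coincide.

```python
import pandas as pd

col = pd.Series([0, 0, 1, 0, 0, 0, 1, 2, 0], name='col1')

before_max = col.iloc[:col.idxmax()]   # rows 0..6; the maximum at 7 is excluded
is_one = before_max.eq(1)              # boolean Series
only_true = is_one[lambda s: s]        # keep just the True rows
print(only_true.index.tolist())        # [2, 6]
print(only_true.last_valid_index())    # 6
print(is_one.cumsum().idxmax())        # 6
```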
Answer 2 (score: 4)
t = test.to_numpy()
a = t.argmax(0)
i, j = np.where(t == 1)
mask = i <= a[j]
i = i[mask]
j = j[mask]
b = np.empty_like(a)
b.fill(-1)
np.maximum.at(b, j, i)
pd.Series(b, test.columns)
col1    6
col2    2
dtype: int64
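The key step in the NumPy version is np.maximum.at, which applies the maximum unbuffered, so repeated column indices accumulate instead of overwriting each other. A minimal sketch with hand-written (row, column) pairs; the names rows, cols and best are made up for this example:

```python
import numpy as np

rows = np.array([2, 6, 2])   # row positions of the qualifying 1s
cols = np.array([0, 0, 1])   # column each of those 1s belongs to

best = np.full(2, -1)        # -1 means "no qualifying 1 found"
np.maximum.at(best, cols, rows)   # best[c] = max(best[c], r) for every pair
print(best)  # [6 2]
```

Starting from -1 also gives a usable sentinel for columns that have no 1 before their maximum.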
apply
test.apply(lambda s: max(s.index, key=lambda x: (s[x] == 1, s[x] <= s.max(), x)))
col1    6
col2    2
dtype: int64
cummax
test.eq(1).where(test.cummax().lt(test.max())).iloc[::-1].idxmax()
col1    6
col2    2
dtype: int64
I just wanted to try out a new tool and do some benchmarking; see this post.
r.to_pandas_dataframe().T
            10        31       100       316      1000      3162     10000
al_0  0.003696  0.003718  0.005512  0.006210  0.010973  0.007764  0.012008
wb_0  0.003348  0.003334  0.003913  0.003935  0.004583  0.004757  0.006096
qh_0  0.002279  0.002265  0.002571  0.002643  0.002927  0.003070  0.003987
sb_0  0.002235  0.002246  0.003072  0.003357  0.004136  0.004083  0.005286
sb_1  0.001771  0.001779  0.002331  0.002353  0.002914  0.002936  0.003619
cs_0  0.005742  0.005751  0.006748  0.006808  0.007845  0.008088  0.009898
cs_1  0.004034  0.004045  0.004871  0.004898  0.005769  0.005997  0.007338
pr_0  0.002484  0.006142  0.027101  0.085944  0.374629  1.292556  6.220875
pr_1  0.003388  0.003414  0.003981  0.004027  0.004658  0.004929  0.006390
pr_2  0.000087  0.000088  0.000089  0.000093  0.000107  0.000145  0.000300
fig = plt.figure(figsize=(10, 10))
ax = plt.subplot()
r.plot(ax=ax)
import numpy as np
import pandas as pd
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()
def al_0(test): return test.apply(lambda x: x.where(x[:x.idxmax()].eq(1)).drop_duplicates(keep='last').idxmin())
def wb_0(df): return (df.iloc[::-1].cummax().eq(df.max())&df.eq(1).iloc[::-1]).idxmax()
def qh_0(test): return (test.eq(1) & (test.index.values[:,None] < test.idxmax().values)).cumsum().idxmax()
def sb_0(test): return test.apply(lambda x: x[:x.idxmax()].eq(1)[lambda i:i].last_valid_index())
def sb_1(test): return test.apply(lambda x: x[:x.idxmax()].eq(1).cumsum().idxmax())
def cs_0(test): return (lambda m: test.mask(m).apply(pd.Series.last_valid_index))(test.eq(test.max()).cumsum().gt(0) | test.ne(1))
def cs_1(test): return pd.Series((test.eq(1) & test.eq(test.max()).cumsum().eq(0)).values.cumsum(axis=0).argmax(axis=0), test.columns)
def pr_0(test): return test.apply(lambda s: max(s.index, key=lambda x: (s[x] == 1, s[x] <= s.max(), x)))
def pr_1(test): return test.eq(1).where(test.cummax().lt(test.max())).iloc[::-1].idxmax()
def pr_2(test):
    t = test.to_numpy()
    a = t.argmax(0)
    i, j = np.where(t == 1)
    mask = i <= a[j]
    i = i[mask]
    j = j[mask]
    b = np.empty_like(a)
    b.fill(-1)
    np.maximum.at(b, j, i)
    return pd.Series(b, test.columns)
import math
def gen_test(n):
    a = np.random.randint(100, size=(n, int(math.log10(n)) + 1))
    idx = a.argmax(0)
    # regenerate until no column has its maximum in row 0,
    # so there is always room for a 1 before the maximum
    while (idx == 0).any():
        a = np.random.randint(100, size=(n, int(math.log10(n)) + 1))
        idx = a.argmax(0)
    # plant a 1 at a random row above each column's maximum
    for j, i in enumerate(idx):
        a[np.random.randint(i), j] = 1
    return pd.DataFrame(a).add_prefix('col')
@b.add_arguments('DataFrame Size')
def argument_provider():
    for exponent in np.linspace(1, 3, 5):
        size = int(10 ** exponent)
        yield size, gen_test(size)
b.add_functions([al_0, wb_0, qh_0, sb_0, sb_1, cs_0, cs_1, pr_0, pr_1, pr_2])
r = b.run()
Answer 3 (score: 3)
A bit of logic here:
(df.iloc[::-1].cummax().eq(df.max())&df.eq(1).iloc[::-1]).idxmax()
Out[187]:
col1    6
col2    2
dtype: int64
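The reversed-cummax expression can be unpacked on a single column. In this sketch (variable names are illustrative), scanning from the bottom up means the running maximum equals the column maximum only once the maximum has been reached, and idxmax then returns the first True in that reversed order, which is the 1 closest to the maximum.

```python
import pandas as pd

col = pd.Series([0, 0, 1, 0, 0, 0, 1, 2, 0])

rev = col.iloc[::-1]                    # labels 8, 7, ..., 0
max_seen = rev.cummax().eq(col.max())   # True at the maximum and every row above it
hit = max_seen & rev.eq(1)              # 1s located before the maximum
print(hit[hit].index.tolist())          # [6, 2]
print(hit.idxmax())                     # 6
```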
Answer 4 (score: 2)
Here is a solution that mixes numpy and pandas:
(test.eq(1) & (test.index.values[:,None] < test.idxmax().values)).cumsum().idxmax()
This is faster than the other solutions.
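The comparison in that expression relies on NumPy broadcasting: a column vector of row labels is compared against a row vector of per-column idxmax labels, giving one boolean per cell. A small sketch on the question's data (the intermediate names are made up for illustration):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

row_labels = test.index.values[:, None]   # shape (9, 1)
max_labels = test.idxmax().values         # shape (2,): labels 7 and 4
before_max = row_labels < max_labels      # broadcasts to shape (9, 2)
print(before_max.shape)        # (9, 2)
print(before_max.sum(axis=0))  # [7 4]
```

Because the index labels are compared numerically, this assumes a sorted numeric index such as the default RangeIndex.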
Answer 5 (score: 1)
I would use dropna together with where to drop the duplicated 1s, keeping the last 1, and call idxmin on it.
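The code for this answer is not shown here. As an assumption, the sketch below follows the al_0 function from the benchmark in Answer 2, which appears to correspond to it and uses drop_duplicates(keep='last') rather than dropna. It is traced on one column, with the condition built explicitly so that it has the same index as the column.

```python
import pandas as pd

col = pd.Series([0, 0, 1, 0, 0, 0, 1, 2, 0])

# 1s located strictly before the (first) maximum; assumes a default RangeIndex
is_one_before_max = col.eq(1) & (col.index < col.idxmax())
kept = col.where(is_one_before_max)        # NaN everywhere except labels 2 and 6
last = kept.drop_duplicates(keep='last')   # keeps the last NaN and the last 1
print(last.dropna().index.tolist())  # [6]
print(last.idxmin())                 # 6
```

idxmin skips NaN by default, so the only candidate left is the last surviving 1.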