An earlier version of this question was answered here:
How to vectorize comparison in pandas dataframe?
Now I have added a new condition involving Machine:
+---------+-----+-------+---------+
| Machine | nr | Time | Event |
+---------+-----+-------+---------+
| a | 70 | 8 | 1 |
| a | 70 | 0 | 1 |
| b | 70 | 0 | 1 |
| c | 74 | 52 | 1 |
| c | 74 | 12 | 2 |
| c | 74 | 0 | 2 |
+---------+-----+-------+---------+
I want to assign the events in the last column. The first entry of each Machine defaults to 1; that is, whenever a new Machine starts, Event restarts from 1.
If Time[i] < 7 and nr[i] != nr[i-1], then Event[i]=Event[i-1]+1.
If Time[i] < 7 and nr[i] = nr[i-1], then Event[i]=Event[i-1]
If Time[i] > 7 then Event[i]=Event[i-1]+1.
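To have something to check a vectorized version against, the rules can be written as a plain loop. This is only a reference sketch, not the requested vectorized solution. `events_loop` is a hypothetical helper name, and treating Time == 7 like Time > 7 is an assumption: the rules leave that case open, and the `Time.lt(7)` tests in the answers behave this way.

```python
import pandas as pd


def events_loop(df: pd.DataFrame) -> pd.Series:
    """Reference (non-vectorized) implementation of the Event rules.

    Assumption: Time == 7 is handled like Time > 7 (next event).
    """
    events = []
    prev = None  # previous row as a namedtuple, None before the first row
    for row in df.itertuples(index=False):
        if prev is None or row.Machine != prev.Machine:
            event = 1                   # first row of a Machine starts at 1
        elif row.Time < 7 and row.nr == prev.nr:
            event = events[-1]          # short gap, same nr: same event
        else:
            event = events[-1] + 1      # new nr, or Time >= 7: next event
        events.append(event)
        prev = row
    return pd.Series(events, index=df.index, name="Event")


df = pd.DataFrame({
    "Machine": ["a", "a", "b", "c", "c", "c"],
    "nr": [70, 70, 70, 74, 74, 74],
    "Time": [8, 0, 0, 52, 12, 0],
})
print(events_loop(df).tolist())
```

On the sample table this reproduces the Event column, so any vectorized candidate can be compared with `events_loop(df)` on larger data.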
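How can I vectorize this efficiently? I want to avoid loops. I tried to extend the existing solution with
m = df.Machine.ne(df.Machine.shift())
o = np.select([t & n, t & ~n, m], [1, 0, 1], 1)
but I know this does not reset Event to 1 for a new Machine; it only increments it. Any pointers on how to integrate this?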
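To see why that attempt does not restart the count, the sketch below rebuilds its increments on the sample data and sums them two ways. The definitions of `t` and `n` are assumed to be the ones from the earlier answer, since the question does not show them; `running` and `per_machine` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Machine": ["a", "a", "b", "c", "c", "c"],
    "nr": [70, 70, 70, 74, 74, 74],
    "Time": [8, 0, 0, 52, 12, 0],
})

t = df.Time.lt(7)                      # short gap (assumed, as in the earlier answer)
n = df.nr.ne(df.nr.shift())            # nr differs from the previous row (assumed)
m = df.Machine.ne(df.Machine.shift())  # first row of a new Machine

# The attempted increments: m can only contribute a step of 1, never a reset.
o = np.select([t & n, t & ~n, m], [1, 0, 1], 1)

# One running counter over the whole frame keeps counting across Machines.
running = o.cumsum()

# Summing per Machine restarts the counter, but with this condition order the
# single 'b' row still gets 0, because `t & ~n` is tested before `m`.
per_machine = pd.Series(o, index=df.index).groupby(df["Machine"]).cumsum()
print(running, per_machine.tolist())
```

Both problems point at the same fix: make the Machine change win over the other conditions and restart the cumulative sum per Machine.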
Answer 0 (score: 1)
Building on the previous solution; this looks correct on your sample:
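import numpy as np

t = df.Time.lt(7)                      # short gap since the previous record
n = df.nr.ne(df.nr.shift())            # nr changed from the previous row
m = df.Machine.ne(df.Machine.shift())  # first row of a new Machine
df['Event'] = np.select([m | t & n, t & ~n], [1, 0], 1)
df['Event'] = df.groupby('Machine').Event.cumsum()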
Out[279]:
   Machine  nr  Time  Event
0        a  70     8      1
1        a  70     0      1
2        b  70     0      1
3        c  74    52      1
4        c  74    12      2
5        c  74     0      2
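Two details make this answer work: `m | t & n` parses as `m | (t & n)` because `&` binds tighter than `|`, and `np.select` takes the choice of the first condition that holds, so a Machine change always produces a step of 1. The self-contained sketch below spells out the parentheses and keeps the per-row step in a separate column (`step` is an illustrative name, not part of the answer) so the two stages can be inspected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Machine": ["a", "a", "b", "c", "c", "c"],
    "nr": [70, 70, 70, 74, 74, 74],
    "Time": [8, 0, 0, 52, 12, 0],
})

t = df.Time.lt(7)                      # short gap since the previous record
n = df.nr.ne(df.nr.shift())            # nr differs from the previous row
m = df.Machine.ne(df.Machine.shift())  # first row of a new Machine

# First matching condition wins: a Machine change (or a short gap with a new
# nr) gives 1, a short gap with the same nr gives 0, and the default 1 covers
# the remaining rows, i.e. Time >= 7 within the same Machine.
step = np.select([m | (t & n), t & ~n], [1, 0], 1)

# Summing the steps within each Machine makes every Machine start at 1.
df["step"] = step
df["Event"] = df.groupby("Machine")["step"].cumsum()
print(df)
```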
Answer 1 (score: 0)
The following should produce the output you want:
# Given you have a dataframe as df
# Create a series for grouping and looking for consecutive runs
mach_nr = df["Machine"] + df["nr"].astype("str")
mach_nr_runs = mach_nr.eq(mach_nr.shift())
# Group the run flags by each 'Machine'/'nr' combination value and take
# their cumulative sum, so every row that repeats the previous row's
# combination adds 1
df["Event"] = (
    mach_nr_runs.groupby(mach_nr)
    .cumsum()
    .astype("int")
    .add(1)
)
# Correct the rows where there were consecutive runs, and where 'Time' < 7
lt_7_runs = (df["Time"] < 7) & mach_nr_runs
df["Event"] -= (
    lt_7_runs.groupby(mach_nr)
    .cumsum()
    .astype("int")
)
df
which now looks like this:
   Machine  nr  Time  Event
0        a  70     8      1
1        a  70     0      1
2        b  70     0      1
3        c  74    52      1
4        c  74    12      2
5        c  74     0      2
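To see how the two cumulative sums combine, the sketch below repeats the steps of this answer on the sample data and collects the intermediate series in one frame; the column names in `trace` are illustrative only. Because both counters are keyed on the Machine/nr combination, they restart whenever that combination changes, not only when Machine changes; in the sample each Machine has a single nr, so the difference does not show there.

```python
import pandas as pd

df = pd.DataFrame({
    "Machine": ["a", "a", "b", "c", "c", "c"],
    "nr": [70, 70, 70, 74, 74, 74],
    "Time": [8, 0, 0, 52, 12, 0],
})

mach_nr = df["Machine"] + df["nr"].astype("str")  # combination label, e.g. "a70"
mach_nr_runs = mach_nr.eq(mach_nr.shift())        # same combination as previous row

# Counts repeats within each combination, starting at 1.
base = mach_nr_runs.groupby(mach_nr).cumsum().astype("int").add(1)

# Repeats with a short gap should not start a new event, so subtract them.
lt_7_runs = (df["Time"] < 7) & mach_nr_runs
correction = lt_7_runs.groupby(mach_nr).cumsum().astype("int")

trace = pd.DataFrame({
    "mach_nr": mach_nr,
    "runs": mach_nr_runs,
    "base": base,
    "correction": correction,
    "Event": base - correction,
})
print(trace)
```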
Answer 2 (score: 0)
Building on your previous question (and its excellent answer), you can groupby('Machine')
and apply the function as if each group were a single dataframe.
import numpy as np
import pandas as pd

def get_event(x):
    t = x.Time.lt(7)
    n = x.nr.ne(x.nr.shift())
    o = np.select([t & n, t & ~n], [1, 0], 1)
    o[0] = 1  # You say the first value is 1
    return pd.Series(o.cumsum(), index=x.index)

df['Event'] = df.groupby('Machine', group_keys=False).apply(get_event)