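I have the following DataFrame and want to convert the values to binary. For rows with id:1, the value should become 1 if it is greater than 100 and 0 if it is less than 101. For rows with id:2, the value should become 1 if it is less than 100 and 0 if it is greater than 99.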
time id value
2012-11-01 22:00:06 1 500
2012-11-01 22:00:07 1 50
2012-11-01 22:00:08 1 0
2012-11-01 22:00:09 2 45
2012-11-01 22:00:10 2 150
2012-11-01 22:00:11 2 70
2012-11-01 22:00:12 2 20
2012-11-01 22:00:13 2 0
2012-11-01 22:00:13 3 1
2012-11-01 22:00:13 3 0
2012-11-01 22:00:13 4 1
2012-11-01 22:00:13 4 1
If the DataFrame contained only id:1 and id:2, I could achieve this by adding a new column as follows.
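# rows that should become 1: id 1 above 100, or id 2 below 100
rindx = df[((df['id'] == 1) & (df['value'] > 100)) | ((df['id'] == 2) & (df['value'] < 100))].index
df.loc[rindx, 'threshold'] = 1
# rows that should become 0: id 1 below 101, or id 2 above 99
rindx = df[((df['id'] == 1) & (df['value'] < 101)) | ((df['id'] == 2) & (df['value'] > 99))].index
df.loc[rindx, 'threshold'] = 0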
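How can I do this when the DataFrame also contains other ids whose values are not on the same scale? In this case, id:1 and id:2 do not hold binary values and need to be converted to binary, whereas id:3 and id:4 are already binary.
Expected output: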
time id value
2012-11-01 22:00:06 1 1
2012-11-01 22:00:07 1 0
2012-11-01 22:00:08 1 0
2012-11-01 22:00:09 2 1
2012-11-01 22:00:10 2 0
2012-11-01 22:00:11 2 1
2012-11-01 22:00:12 2 1
2012-11-01 22:00:13 2 0
2012-11-01 22:00:13 3 1
2012-11-01 22:00:13 3 0
2012-11-01 22:00:13 4 1
2012-11-01 22:00:13 4 1
Answer 0 (score: 1)
You're almost there:
time id value threshold
0 2012-11-01 22:00:06 1 500 1
1 2012-11-01 22:00:07 1 50 0
2 2012-11-01 22:00:08 1 0 0
3 2012-11-01 22:00:09 2 45 1
4 2012-11-01 22:00:10 2 150 0
5 2012-11-01 22:00:11 2 70 1
6 2012-11-01 22:00:12 2 20 1
7 2012-11-01 22:00:13 2 0 1
8 2012-11-01 22:00:13 3 1 1
9 2012-11-01 22:00:13 3 0 0
10 2012-11-01 22:00:13 4 1 1
11 2012-11-01 22:00:13 4 1 1
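One way to arrive at a `threshold` column like the one above is sketched below. It is not necessarily the code the answer originally used. It assumes the column starts as a copy of `value`, so ids that are already binary (3 and 4) pass through unchanged, and that only the rows for id 1 and id 2 are overwritten with the result of their own comparison. The names `is_id1` and `is_id2` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": [
        "2012-11-01 22:00:06", "2012-11-01 22:00:07", "2012-11-01 22:00:08",
        "2012-11-01 22:00:09", "2012-11-01 22:00:10", "2012-11-01 22:00:11",
        "2012-11-01 22:00:12", "2012-11-01 22:00:13", "2012-11-01 22:00:13",
        "2012-11-01 22:00:13", "2012-11-01 22:00:13", "2012-11-01 22:00:13",
    ],
    "id": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4],
    "value": [500, 50, 0, 45, 150, 70, 20, 0, 1, 0, 1, 1],
})

# Assumption: ids other than 1 and 2 already hold 0/1, so start from a copy.
df["threshold"] = df["value"]

# Boolean masks selecting the rows of each id that needs converting.
is_id1 = df["id"] == 1
is_id2 = df["id"] == 2

# id 1: 1 when the value is above 100, otherwise 0.
df.loc[is_id1, "threshold"] = (df.loc[is_id1, "value"] > 100).astype(int)
# id 2: 1 when the value is below 100, otherwise 0.
df.loc[is_id2, "threshold"] = (df.loc[is_id2, "value"] < 100).astype(int)

print(df)
```

With this rule, the id 2 row whose value is 0 gets a threshold of 1, which matches the table above but not the expected output in the question, where that row is listed as 0.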