I have several DataFrames. Here is an example of each.
df_min
scale code R1 R2 ...
1 121 50 30
2 121 35 45
3 121 40 50
4 121 20 30
5 121 20 35
1 313 10 7
2 313 13 10
3 313 10 12
4 313 15 8
5 313 17 10
...
df_rate
scale code R1 R2 ...
1 121 20 40
2 121 30 20
3 121 20 30
4 121 15 40
5 121 10 30
1 313 10 5
2 313 15 10
3 313 25 10
4 313 10 15
5 313 20 5
...
df_max
scale code R1 R2 ...
1 121 30 200
2 121 100 175
3 121 70 100
4 121 80 90
5 121 75 35
1 313 60 70
2 313 35 70
3 313 50 60
4 313 50 45
5 313 45 68
...
df_stock
code R1 R2 ...
121 100 150
313 70 65
....
df_new
scale code R1 R2 ...
1 121 NaN NaN
2 121 NaN NaN
3 121 NaN NaN
4 121 NaN NaN
5 121 NaN NaN
1 313 NaN NaN
2 313 NaN NaN
3 313 NaN NaN
4 313 NaN NaN
5 313 NaN NaN
...
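Columns such as R1 and R2 are cities, and there can be many of them. The code column holds unique product codes, and there can be many of those too. scale is the number of the week I am calculating; for each code there are five weeks. I also have an empty DataFrame, df_new, where I need to record the results of the calculation.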
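Here is a concrete calculation, which I hope makes it clearer. In the empty DataFrame I take the cell where scale is 1, code is 121, and the column is R1. I look up the corresponding value in df_stock; it is 100. From it I subtract the df_rate value in column R1 where scale is 1, which is 20. If the result is greater than the df_min value in column R1 where scale is 1, I write it down; if it is less, I take the value from column R1 of df_max where scale is 1 instead. The result is 80, which is greater than 50, so I write it down. For the next cell, I subtract from the calculated value (80) the next value in column R1 of df_rate, where scale is 2, and check it in the same way. That gives 50, so I write it down. And so on. The value from df_stock is used only as the starting point from which the subtraction begins.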
In effect, df_min is the minimum: if the value falls below it, I need to copy the corresponding value from df_max.
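The rule described above can be sketched in plain Python for a single product code and a single city column. This is only an illustration: the function name run_down and its parameters are made up here, and it assumes that a value exactly equal to the minimum is kept rather than replaced, since the question only says what happens when the value is greater or less.

```python
def run_down(stock, rates, mins, maxs):
    """Return the weekly values for one product code and one city column."""
    results = []
    current = stock  # week 1 starts from the df_stock value
    for rate, low, high in zip(rates, mins, maxs):
        current = current - rate  # subtract this week's df_rate value
        if current < low:  # fell below df_min: take the df_max value instead
            current = high
        results.append(current)
    return results


# code 121, column R1, values copied from the example tables
r1_121 = run_down(
    stock=100,
    rates=[20, 30, 20, 15, 10],
    mins=[50, 35, 40, 20, 20],
    maxs=[30, 100, 70, 80, 75],
)
print(r1_121)  # [80, 50, 70, 55, 45]
```

In week 3 the running value drops to 30, which is below the minimum of 40, so the df_max value of 70 is written instead and week 4 continues from 70.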
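These calculations must be done for each unique code (for each code, scale runs from 1 to 5) and for each column, such as R1 and R2.
The result for the example is as follows:
scale code R1 R2 ...
1 121 80 110
2 121 50 90
3 121 70 60
4 121 55 90
5 121 45 60
1 313 60 60
2 313 45 50
3 313 20 40
4 313 50 25
5 313 30 20
...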
I would be very grateful for any help!
UPD: I wrote a script that does what I need. It is not optimal, but I have no other ideas. Is it possible to change it and add a loop?
My code:
import pandas as pd
import numpy as np
a = (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
b = (121,121,121,121,121,313,313,313,313,313,444,444,444,444,444)
columns = ['scale', 'code', 'R1', 'R2', 'R3']
index = np.arange(15)
df_min = pd.DataFrame(columns=columns, index = index)
df_min['scale'] = a
df_min['code'] = b
df_min['R1'] = np.random.randint(10, 50, size=15)
df_min['R2'] = np.random.randint(10, 50, size=15)
df_min['R3'] = np.random.randint(10, 50, size=15)
df_rate = pd.DataFrame(columns=columns, index = index)
df_rate['scale'] = a
df_rate['code'] = b
df_rate['R1'] = np.random.randint(5, 40, size=15)
df_rate['R2'] = np.random.randint(5, 40, size=15)
df_rate['R3'] = np.random.randint(5, 40, size=15)
df_max = pd.DataFrame(columns=columns, index = index)
df_max['scale'] = a
df_max['code'] = b
df_max['R1'] = np.random.randint(50, 150, size=15)
df_max['R2'] = np.random.randint(50, 150, size=15)
df_max['R3'] = np.random.randint(50, 150, size=15)
index1 = np.arange(3)
df_stock = pd.DataFrame(columns=columns, index = index1)
df_stock['code'] = (121,313,444)
df_stock['R1'] = np.random.randint(100, 300, size=3)
df_stock['R2'] = np.random.randint(100, 300, size=3)
df_stock['R3'] = np.random.randint(100, 300, size=3)
df_new = pd.DataFrame(columns=columns, index = index)
df_new['scale'] = a
df_new['code'] = b
# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']
# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])
df_new = df_new.mask(df_new < df_min, df_max)
df_min = df_min.reset_index()
df_min.insert(2, 'test', 0)
df_max = df_max.reset_index()
df_max.insert(2, 'test', 0)
df_new = df_new.reset_index()
df_new.insert(2, 'test', 0)
df_rate = df_rate.reset_index()
df_rate.insert(2, 'test', 0)
df_new.loc[df_new['scale'].between(2,5), 'test':] = np.nan
df_rate_p = df_rate.loc[df_rate['scale'] == 2, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 1, 'test':] - df_rate.loc[df_rate['scale'] == 2, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 3, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 2, 'test':] - df_rate.loc[df_rate['scale'] == 3, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 4, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 3, 'test':] - df_rate.loc[df_rate['scale'] == 4, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_rate_p = df_rate.loc[df_rate['scale'] == 5, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 4, 'test':] - df_rate.loc[df_rate['scale'] == 5, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)
df_new
In my original data scale ranges from 1 to 26, so with this approach my code has to repeat the same block for every value of scale.
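One possible way to replace the repeated per-week blocks with a loop is sketched below. It rebuilds the small example tables from the top of the question (two codes, columns R1 and R2) instead of the random data, so the output can be checked against the expected result. The helper names frame and roll_stock are invented for this sketch, and it assumes every table contains the same codes and city columns and that a value equal to the minimum is kept.

```python
import pandas as pd

scale = [1, 2, 3, 4, 5] * 2
code = [121] * 5 + [313] * 5


def frame(r1, r2):
    """Build one weekly table with the layout used in the question."""
    return pd.DataFrame({"scale": scale, "code": code, "R1": r1, "R2": r2})


df_min = frame([50, 35, 40, 20, 20, 10, 13, 10, 15, 17],
               [30, 45, 50, 30, 35, 7, 10, 12, 8, 10])
df_rate = frame([20, 30, 20, 15, 10, 10, 15, 25, 10, 20],
                [40, 20, 30, 40, 30, 5, 10, 10, 15, 5])
df_max = frame([30, 100, 70, 80, 75, 60, 35, 50, 50, 45],
               [200, 175, 100, 90, 35, 70, 70, 60, 45, 68])
df_stock = pd.DataFrame({"code": [121, 313], "R1": [100, 70], "R2": [150, 65]})


def roll_stock(df_stock, df_rate, df_min, df_max):
    """Carry the stock forward week by week for all codes and cities at once."""
    cities = [c for c in df_stock.columns if c != "code"]
    rate = df_rate.set_index(["scale", "code"])
    low = df_min.set_index(["scale", "code"])
    high = df_max.set_index(["scale", "code"])
    current = df_stock.set_index("code")  # starting values, one row per code
    pieces = []
    for week in sorted(df_rate["scale"].unique()):
        # .xs picks one week and leaves a frame indexed by code only
        current = current - rate.xs(week, level="scale")
        below = current.lt(low.xs(week, level="scale"))
        current = current.mask(below, high.xs(week, level="scale"))
        pieces.append(current.reset_index().assign(scale=week))
    out = pd.concat(pieces, ignore_index=True)
    out = out.sort_values(["code", "scale"], ignore_index=True)
    return out[["scale", "code"] + cities]


df_new = roll_stock(df_stock, df_rate, df_min, df_max)
print(df_new)
```

The loop runs once per week rather than once per cell, so going from 5 to 26 values of scale only adds iterations; each iteration still handles all codes and all city columns in one vectorized step.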
Answer 0 (score: 1)
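Using a pandas index, and MultiIndex in particular, is very useful for making sure the right rows are compared with one another.
Here is how you would use it: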
# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']
# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])
# you may not actually need these lines, but sometimes it is necessary!
# intersection = df_new.index.intersection(df_min.index).intersection(df_max.index)
# df_new = df_new.loc[intersection]
# df_min = df_min.loc[intersection]
# df_max = df_max.loc[intersection]
# where df_new < df_min, use the values from df_max
# I didn't actually understand what you meant to do with
# the data, so this is probably not quite what you intended,
# but you can use this to see how it works and implement
# your algorithm
df_new = df_new.mask(df_new < df_min, df_max)
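To see what the final mask call does, here is a small sketch on tiny frames that share a (code, scale) MultiIndex. The frame names candidate, low, and high and all the numbers are invented for illustration; they are not taken from the question's tables. The second call passes the replacement frame with its rows in reverse order to show that mask lines the values up by index label, not by position.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[121], [1, 2, 3]], names=["code", "scale"])
candidate = pd.DataFrame({"R1": [80, 30, 45]}, index=idx)  # values to check
low = pd.DataFrame({"R1": [50, 35, 40]}, index=idx)        # per-row minimum
high = pd.DataFrame({"R1": [30, 100, 70]}, index=idx)      # replacement values

# Where candidate is below low, take the value from high; otherwise keep it.
result = candidate.mask(candidate < low, high)
print(result["R1"].tolist())  # [80, 100, 45]

# Same replacement frame, rows reversed: mask aligns it on (code, scale).
high_reversed = high.iloc[::-1]
result_reversed = candidate.mask(candidate < low, high_reversed)
print(result_reversed["R1"].tolist())  # [80, 100, 45]
```

Note that this is a single comparison per cell. In the answer's code, df_stock - df_rate subtracts every week's rate from the same starting stock, so it does not carry the previous week's result forward the way the question's worked example does; that part still needs some form of loop over the weeks.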