Pandas

Time: 2017-09-13 02:54:44

Tags: python pandas

I have several DataFrames. Here is an example of each.

        df_min
scale   code    R1  R2 ...  
  1      121    50  30
  2      121    35  45
  3      121    40  50
  4      121    20  30
  5      121    20  35
  1      313    10   7
  2      313    13  10
  3      313    10  12
  4      313    15  8
  5      313    17  10
...

        df_rate
scale   code    R1  R2 ...
  1      121    20  40
  2      121    30  20
  3      121    20  30
  4      121    15  40
  5      121    10  30
  1      313    10   5
  2      313    15  10
  3      313    25  10
  4      313    10  15
  5      313    20   5
...

        df_max
scale   code    R1  R2 ...
  1      121    30  200
  2      121    100 175
  3      121    70  100
  4      121    80  90
  5      121    75  35
  1      313    60  70
  2      313    35  70
  3      313    50  60
  4      313    50  45
  5      313    45  68
...

     df_stock
code    R1  R2 ...
 121    100 150
 313    70  65
....

        df_new
scale   code     R1  R2 ...
  1      121    NaN NaN
  2      121    NaN NaN
  3      121    NaN NaN
  4      121    NaN NaN
  5      121    NaN NaN
  1      313    NaN NaN
  2      313    NaN NaN
  3      313    NaN NaN
  4      313    NaN NaN
  5      313    NaN NaN
...

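The columns R1, R2, etc. are cities, and there may be many of them. The code column contains unique product codes; there can be many of those too. scale marks the number of the week I am calculating. For each code there are five weeks. I also have an empty DataFrame, df_new, in which I need to record the results of the calculation.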
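To make the layout concrete, here is a minimal sketch that rebuilds two of the sample tables and addresses a single cell by product, week and city. It uses only the R1 and R2 values shown above; the names `rate`, `stock`, `rate_cell` and `stock_cell` are illustrative and do not come from the question.

```python
import pandas as pd

# Sample values copied from the df_rate and df_stock tables above (R1 and R2 only).
df_rate = pd.DataFrame({
    "scale": [1, 2, 3, 4, 5] * 2,
    "code": [121] * 5 + [313] * 5,
    "R1": [20, 30, 20, 15, 10, 10, 15, 25, 10, 20],
    "R2": [40, 20, 30, 40, 30, 5, 10, 10, 15, 5],
})
df_stock = pd.DataFrame({"code": [121, 313], "R1": [100, 70], "R2": [150, 65]})

# Index the weekly table by (code, scale) and the stock table by code,
# so that one cell can be addressed by product, week and city.
rate = df_rate.set_index(["code", "scale"])
stock = df_stock.set_index("code")

rate_cell = rate.loc[(121, 1), "R1"]   # week 1, product 121, city R1
stock_cell = stock.loc[121, "R1"]      # starting stock for product 121, city R1
print(rate_cell, stock_cell)
```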

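I will give an example of a specific calculation; I hope it makes things clearer. In the empty DataFrame I take the cell with scale 1, code 121 and column R1. I find the corresponding value in df_stock; it equals 100. From it I subtract the df_rate value in column R1 for scale 1, which is 20. If the result is greater than the df_min value in column R1 for scale 1, I write it down; if it is less, I take the value from column R1 of df_max for scale 1 instead. The result is 80, which exceeds 50, so I write it down. For the next cell I subtract the next df_rate value in column R1 (scale 2, which is 30) from the calculated value (80) and check the result against df_min in the same way. It comes out to 50, which I write down. And so on. The value in df_stock is only the starting value from which I begin subtracting for each code.

In fact, df_min is the minimum: if I fall below it, I need to copy the corresponding value from df_max.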
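The rule for a single product and city can be sketched in plain Python as below. This is only an illustration of the description above, not code from the question: the helper name `run_weeks` is made up, and it assumes that a value exactly equal to the minimum is kept rather than replaced. The numbers are the R1 values for code 121 from the sample tables.

```python
def run_weeks(start, rates, mins, maxs):
    """Apply the weekly rule to one product/city pair and return the weekly values."""
    current = start
    result = []
    for rate, low, high in zip(rates, mins, maxs):
        current = current - rate      # subtract this week's rate
        if current < low:             # fell below the minimum ...
            current = high            # ... so take the maximum instead
        result.append(current)
    return result


r1_121 = run_weeks(
    start=100,                        # df_stock, code 121, R1
    rates=[20, 30, 20, 15, 10],       # df_rate, code 121, R1
    mins=[50, 35, 40, 20, 20],        # df_min, code 121, R1
    maxs=[30, 100, 70, 80, 75],       # df_max, code 121, R1
)
print(r1_121)  # [80, 50, 70, 55, 45]
```

In week 3 the running value would drop to 30, which is below the minimum of 40, so the df_max value 70 is used and the subtraction continues from there.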

These calculations must be done for each unique code (for each one, scale runs from 1 to 5) and for every column R1, R2, and so on.

The result for the example is as follows:

scale   code    R1  R2 ...
  1      121    80  110
  2      121    50  90
  3      121    70  60
  4      121    55  90
  5      121    45  60
  1      313    60  60
  2      313    45  50
  3      313    20  40
  4      313    50  25
  5      313    30  20
...

I would be very grateful for any help!

UPD I wrote a script that does what I need. It is not optimal, but I have no other ideas. Is it possible to change it and add a loop? In my original data scale goes up to 26, and with my code I would have to write a separate block for each value of scale.

My code:

import pandas as pd
import numpy as np

a = (1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
b = (121,121,121,121,121,313,313,313,313,313,444,444,444,444,444)
columns = ['scale', 'code', 'R1', 'R2', 'R3']
index = np.arange(15)

df_min = pd.DataFrame(columns=columns, index = index)
df_min['scale'] = a
df_min['code'] = b
df_min['R1'] = np.random.randint(10, 50, size=15)
df_min['R2'] = np.random.randint(10, 50, size=15)
df_min['R3'] = np.random.randint(10, 50, size=15)

df_rate = pd.DataFrame(columns=columns, index = index)
df_rate['scale'] = a
df_rate['code'] = b
df_rate['R1'] = np.random.randint(5, 40, size=15)
df_rate['R2'] = np.random.randint(5, 40, size=15)
df_rate['R3'] = np.random.randint(5, 40, size=15)

df_max = pd.DataFrame(columns=columns, index = index)
df_max['scale'] = a
df_max['code'] = b
df_max['R1'] = np.random.randint(50, 150, size=15)
df_max['R2'] = np.random.randint(50, 150, size=15)
df_max['R3'] = np.random.randint(50, 150, size=15)

index1 = np.arange(3)
df_stock = pd.DataFrame(columns=columns, index = index1)
df_stock['code'] = (121,313,444)
df_stock['R1'] = np.random.randint(100, 300, size=3)
df_stock['R2'] = np.random.randint(100, 300, size=3)
df_stock['R3'] = np.random.randint(100, 300, size=3)

df_new = pd.DataFrame(columns=columns, index = index)
df_new['scale'] = a
df_new['code'] = b

# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']

# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])

df_new = df_new.mask(df_new < df_min, df_max)

df_min = df_min.reset_index()
df_min.insert(2, 'test', 0)
df_max = df_max.reset_index()
df_max.insert(2, 'test', 0)
df_new = df_new.reset_index()
df_new.insert(2, 'test', 0)
df_rate = df_rate.reset_index()
df_rate.insert(2, 'test', 0)

# only the scale 1 values are final so far; blank out scale 2 to 5
df_new.loc[df_new['scale'].between(2,5), 'test':] = np.nan

# scale 2
df_rate_p = df_rate.loc[df_rate['scale'] == 2, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 1, 'test':] - df_rate.loc[df_rate['scale'] == 2, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)

# scale 3
df_rate_p = df_rate.loc[df_rate['scale'] == 3, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 2, 'test':] - df_rate.loc[df_rate['scale'] == 3, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)

# scale 4
df_rate_p = df_rate.loc[df_rate['scale'] == 4, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 3, 'test':] - df_rate.loc[df_rate['scale'] == 4, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)

# scale 5
df_rate_p = df_rate.loc[df_rate['scale'] == 5, :'scale']
df_new.index +=1
df_rate_p1 = df_new.loc[df_new['scale'] == 4, 'test':] - df_rate.loc[df_rate['scale'] == 5, 'test':]
df_new2 = pd.concat([df_rate_p, df_rate_p1], axis=1)
df_new = df_new.set_index(['code', 'scale']).fillna(df_new2.set_index(['code', 'scale'])).reset_index()
df_new = df_new.mask(df_new < df_min, df_max)

df_new

1 Answer:

Answer 0: (score: 1)

Using the pandas index and MultiIndex is very useful for comparing the correct rows with each other.

Here is how you would use it:

# set the index to 'code' to subtract df_rate from df_stock
df_stock = df_stock.set_index('code')
df_rate = df_rate.set_index('code')
df_new = df_stock - df_rate
# have to add back in the 'scale' column since it wasn't present in df_rate
df_new['scale'] = df_rate['scale']

# now set the index to use both 'code' and 'scale'
df_new = df_new.reset_index()
df_new = df_new.set_index(['code', 'scale'])
df_min = df_min.set_index(['code', 'scale'])
df_max = df_max.set_index(['code', 'scale'])

# you may not actually need these lines, but sometimes it is necessary!
# intersection = df_new.index.intersection(df_min.index).intersection(df_max.index)
# df_new = df_new.loc[intersection]
# df_min = df_min.loc[intersection]
# df_max = df_max.loc[intersection]

# if df_new < df_min, then use values from df_max
# I didn't actually understand what you meant to do with
# the data, so this is probably not quite what you intended,
# but you can use this to see how it works and implement
# your algorithm
df_new = df_new.mask(df_new < df_min, df_max)
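One possible way to turn this into a loop over scale, so that no block has to be written per week, is sketched below. It is not the answerer's code: it combines the same MultiIndex and `mask` idea with `DataFrame.xs` to pick out one week at a time. The helper `make` is hypothetical, and the sketch assumes every code appears in every week with the same city columns. The data are the R1 and R2 values from the question's sample tables.

```python
import pandas as pd


def make(r1, r2):
    """Build a weekly table indexed by (code, scale) from the sample values."""
    return pd.DataFrame({
        "code": [121] * 5 + [313] * 5,
        "scale": [1, 2, 3, 4, 5] * 2,
        "R1": r1,
        "R2": r2,
    }).set_index(["code", "scale"])


df_min = make([50, 35, 40, 20, 20, 10, 13, 10, 15, 17],
              [30, 45, 50, 30, 35, 7, 10, 12, 8, 10])
df_rate = make([20, 30, 20, 15, 10, 10, 15, 25, 10, 20],
               [40, 20, 30, 40, 30, 5, 10, 10, 15, 5])
df_max = make([30, 100, 70, 80, 75, 60, 35, 50, 50, 45],
              [200, 175, 100, 90, 35, 70, 70, 60, 45, 68])
df_stock = pd.DataFrame(
    {"code": [121, 313], "R1": [100, 70], "R2": [150, 65]}
).set_index("code")

weeks = []
current = df_stock.copy()  # running value per code and city
for scale in sorted(df_rate.index.get_level_values("scale").unique()):
    # xs() drops the 'scale' level, leaving frames indexed by 'code' only
    rate = df_rate.xs(scale, level="scale")
    low = df_min.xs(scale, level="scale")
    high = df_max.xs(scale, level="scale")
    current = current - rate                     # subtract this week's rate
    current = current.mask(current < low, high)  # below the minimum: take the maximum
    weeks.append(current.assign(scale=scale))

# stack the weekly frames and restore the (code, scale) index
df_new = pd.concat(weeks).set_index("scale", append=True).sort_index()
print(df_new)
```

Because each week depends on the previous week's result, the loop runs once per scale value, while all codes and cities are handled at once inside each step.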