Question

I have a dataframe on which I need to perform some operations. So far everything works, and it looks like this:

 ID  Value      Date          Date_diff_cumsum     Val     Weight
  1   0.000000 2017-02-13 20:54:00     0.0       0.000000     nan
  1   0.029598 2017-02-13 21:02:00     8.0       0.029598     nan
  1   0.273000 2017-02-13 22:33:00    99.0       0.273000     nan
  1   0.153000 2017-02-13 23:24:00    150.0      0.15300      nan

I also have another dataset with the weights, as follows:

ID   Value
 1   78.0
 2   75.0
 3   83.0
 4   60.0

I want to fill the Weight column of the original dataframe with each ID's weight, repeated on every row of that ID, for example:

 ID  Value      Date          Date_diff_cumsum   Val        Weight
  1   0.000000 2017-02-13 20:54:00     0.0       0.000000     78.0
  1   0.029598 2017-02-13 21:02:00     8.0       0.029598     78.0
  1   0.273000 2017-02-13 22:33:00    99.0       0.273000     78.0
  1   0.153000 2017-02-13 23:24:00    150.0      0.15300      78.0
  ...    ...          ...              ...          ...         ...
  4   ....      .....      ....        ....        ...         60.0
  4   ....      .....      ....        ....        ...         60.0

That is because I need to compute the following formula:

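For each ID, compute (Val * 1000) / (weight * Date_diff_cumsum). That is: multiply each Val by 1000, then divide by the weight multiplied by the time difference between time i and i-1 (Date_diff_cumsum), and store the result in the dataframe so that I can plot it.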
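Written out per row, with w the weight of the row's ID taken from the second dataset and Date_diff_cumsum being the per-ID running total in minutes that the code computes with cumsum, the formula is as follows. The worked value uses the second row of ID 1 from the first table; for each ID's first row the denominator is 0, so the value is undefined there.

```latex
\mathrm{TempVal}_i = \frac{1000 \cdot \mathrm{Val}_i}{w_{\mathrm{ID}(i)} \cdot \mathrm{Date\_diff\_cumsum}_i}
\qquad\text{e.g.}\qquad
\frac{1000 \cdot 0.029598}{78 \cdot 8} = \frac{29.598}{624} \approx 0.047433
```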

Here is my code:

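df = df[['ID', 'Value', 'Date']]
df = df.sort_values(by=['Date'])
# minutes since the previous row of the same ID (NaN for each ID's first row);
# total_seconds() also counts whole days in the gap, unlike .dt.seconds
df['Date_diff_cumsum'] = df.groupby('ID').Date.diff().dt.total_seconds() / 60.0
# running total of those minutes per ID; the leading NaN becomes 0
df['Date_diff_cumsum'] = df.groupby('ID').Date_diff_cumsum.cumsum().fillna(0)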
# change in Value relative to the first row of each ID, scaled by 1000
df['TempVal'] = df.groupby('ID')['Value'].transform(lambda x: (x - x.iloc[0]) * 1000)
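As a quick check of the Date_diff_cumsum step, here is a minimal self-contained sketch that rebuilds the four ID 1 rows from the first table and runs the same groupby/diff/cumsum steps. It uses `.dt.total_seconds()`, which, unlike `.dt.seconds`, also counts whole days in a gap; the dataframe construction is an assumption standing in for the real data.

```python
import pandas as pd

# The four sample rows for ID 1 from the first table
df = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "Value": [0.0, 0.029598, 0.273, 0.153],
    "Date": pd.to_datetime([
        "2017-02-13 20:54:00",
        "2017-02-13 21:02:00",
        "2017-02-13 22:33:00",
        "2017-02-13 23:24:00",
    ]),
})

df = df.sort_values(by=["Date"])
# Minutes since the previous row of the same ID (NaN for each ID's first row)
df["Date_diff_cumsum"] = df.groupby("ID")["Date"].diff().dt.total_seconds() / 60.0
# Running total per ID; fillna(0) turns the leading NaN into 0
df["Date_diff_cumsum"] = df.groupby("ID")["Date_diff_cumsum"].cumsum().fillna(0)

print(df["Date_diff_cumsum"].tolist())  # [0.0, 8.0, 99.0, 150.0]
```

The gaps are 8, 91 and 51 minutes, so the running totals match the 0, 8, 99, 150 shown in the table.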

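How can I add the repeated weights from the second dataframe to the first one? Is there a more efficient way? I need to compute the final result the same way, but for each ID using three other dataframes that have different names but similar values, for example: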

score = (df1['Val'] * 1000) / (df1['Weight'] * df1['Date_diff_cumsum']) \
      + (df2['Val'] * 1000) / (df2['Weight'] * df2['Date_diff_cumsum']) \
      + ...
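One way to avoid repeating the formula for every dataframe is to wrap the steps in a function and apply it to df1, df2, df3 in turn. A minimal sketch: `add_tempval` is a hypothetical helper name, and it assumes each dataframe already has `ID`, `Val` and `Date_diff_cumsum` columns and that the weights table has `ID` and `Value` columns as in the second dataset.

```python
import pandas as pd


def add_tempval(df: pd.DataFrame, weights: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach Weight by ID, then compute TempVal."""
    out = df.copy()
    # Look up each row's weight by ID (ID must be the index for map)
    out["Weight"] = out["ID"].map(weights.set_index("ID")["Value"])
    # (Val * 1000) / (Weight * Date_diff_cumsum), row by row
    out["TempVal"] = out["Val"] * 1000 / (out["Weight"] * out["Date_diff_cumsum"])
    return out


weights = pd.DataFrame({"ID": [1, 2, 3, 4], "Value": [78.0, 75.0, 83.0, 60.0]})
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "Val": [0.0, 0.029598, 0.273, 0.153],
    "Date_diff_cumsum": [0.0, 8.0, 99.0, 150.0],
})

# Apply the same steps to every frame; in practice the list would be [df1, df2, df3]
frames = [add_tempval(d, weights) for d in [df1]]
print(frames[0]["TempVal"].round(6).tolist())  # [nan, 0.047433, 0.035354, 0.013077]
```

The first value is NaN because that row is 0 / (78 * 0).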

Thank you very much.

Edit: it works now, but whenever I try to build the final dataframe:

score = df1.TempVal + df2.TempVal + df3.TempVal

I get a result that is all NaNs. Do you know why? I need to print all the TempVal values for each ID and plot them.
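One likely cause: `+` between two pandas Series aligns on index labels, not on row position. If df1, df2 and df3 were built, filtered or sorted separately, their index labels may not match, and any label that is missing from one of them gives NaN. A sketch with two hypothetical frames `a` and `b`, assuming a row is identified by its `ID` and `Date` pair:

```python
import pandas as pd

dates = pd.to_datetime(["2017-02-13 20:54:00", "2017-02-13 21:02:00"])

# Same ID/Date rows, but different index labels (e.g. after separate sorting)
a = pd.DataFrame({"ID": [1, 1], "Date": dates, "TempVal": [1.0, 2.0]}, index=[0, 1])
b = pd.DataFrame({"ID": [1, 1], "Date": dates, "TempVal": [10.0, 20.0]}, index=[5, 6])

# '+' aligns on index labels; no label is shared, so every entry is NaN
print((a.TempVal + b.TempVal).tolist())  # [nan, nan, nan, nan]

# Align on the columns that identify a row instead of the positional index
key = ["ID", "Date"]
score = a.set_index(key).TempVal.add(b.set_index(key).TempVal, fill_value=0)
print(score.tolist())  # [11.0, 22.0]
```

`fill_value=0` makes a row missing from one frame count as 0, which may or may not be what you want. Where every frame is NaN at a position (such as each ID's first row, where the formula is 0/0), the sum stays NaN even with `fill_value`, so drop or fill those rows explicitly before plotting if needed.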

Answer 1

Just map the weights:

df["Weight"] = df["ID"].map(weights["Value"])

where weights is your other dataset (you also need to set ID as that dataset's index).
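A minimal runnable version of this answer, using the weights from the question; the `df` rows for IDs 2 and 4 are made up only to show the lookup across several IDs.

```python
import pandas as pd

# Hypothetical rows for a few IDs; weights taken from the question
df = pd.DataFrame({"ID": [1, 1, 2, 4], "Value": [0.0, 0.029598, 0.5, 0.1]})
weights = pd.DataFrame({"ID": [1, 2, 3, 4], "Value": [78.0, 75.0, 83.0, 60.0]})

# map() looks each ID up in the index of the Series passed to it,
# so ID has to be the index of the weights table
weights = weights.set_index("ID")
df["Weight"] = df["ID"].map(weights["Value"])
print(df["Weight"].tolist())  # [78.0, 78.0, 75.0, 60.0]
```

An ID with no entry in the weights table would get NaN in `Weight`.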

Answer 2

You can use map to map the values from df2 into Weight. Since you have already computed Date_diff_cumsum by grouping on ID, you can compute TempVal directly from df1:

df1['Weight'] = df1['ID'].map(df2.set_index('ID')['Value'])

df1['TempVal'] = df1['Value']*1000/(df1['Weight'] * df1['Date_diff_cumsum'])

    ID  Value       Date              Date_diff_cumsum  Val       Weight    TempVal
0   1   0.000000    2017-02-13 20:54:00 0.0             0.000000    78.0    NaN
1   1   0.029598    2017-02-13 21:02:00 8.0             0.029598    78.0    0.047433
2   1   0.273000    2017-02-13 22:33:00 99.0            0.273000    78.0    0.035354
3   1   0.153000    2017-02-13 23:24:00 150.0           0.153000    78.0    0.013077
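The NaN in row 0 comes from 0.0 * 1000 / (78.0 * 0.0), i.e. 0/0. A row with a non-zero Value and a Date_diff_cumsum of 0 would give inf instead of NaN. A minimal sketch, assuming you want to treat both as missing before plotting; the 0.5 row is made up to show the inf case.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Value": [0.0, 0.5, 0.029598],
    "Weight": [78.0, 78.0, 78.0],
    "Date_diff_cumsum": [0.0, 0.0, 8.0],
})

df1["TempVal"] = df1["Value"] * 1000 / (df1["Weight"] * df1["Date_diff_cumsum"])
# 0/0 gives NaN and x/0 (x != 0) gives inf; turn inf into NaN as well
df1["TempVal"] = df1["TempVal"].replace([np.inf, -np.inf], np.nan)
print(df1["TempVal"].isna().tolist())  # [True, True, False]
```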

Fill a column with values from another dataframe by matching ID in pandas
