Question

I have one dataframe that contains a series of dates and a corresponding value. I have another dataframe with three columns (Start, End, CustomerUsage). I want to iterate through each row of the second dataframe, using its Start and End as the mindate and maxdate to select the matching range of dates from the first dataframe. I then want to multiply every Value in that range by the row's CustomerUsage and return the sum.

>>df1
             Date           Value
        0    2012-04-01     0.00275
        1    2012-04-02     0.00278
        2    2012-04-03     0.00369
        3    2012-04-04     0.00268
        4    2012-04-05     0.00400

>>df2
            Start           End           CustomerUsage
        1   2012-04-01      2012-04-03    464.0
        2   2012-04-04      2012-04-04    472.1

>>    for  row in df2.iterrows():

           mindate = row[row.index[0],'Start']
           maxdate = row[row.index[0],'End']

           range = df1[(df1['Dates'] >= mindate) & (df1['Dates'] <= maxdate)]

           range['Calc'] = range['Value']*df2['CustomerUsage']
           ##numpy .agg function here##

A single row works, but I am stuck on iterating through the dates. I get the error AttributeError: 'str' object has no attribute 'loc'. (I gather I'm treating these tuples wrong, but I'm unsure of the remedy, folks!)

Many thanks!
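
For context on the tuples mentioned above: `DataFrame.iterrows()` yields `(index, row)` pairs, so a loop with a single loop variable receives the whole tuple rather than the row. The following is a minimal sketch of unpacking that pair, assuming a `df2` shaped like the sample in the question; the `bounds` list is a hypothetical name used only for illustration.

```python
import pandas as pd

# Sample frame shaped like df2 in the question (dates kept as strings here).
df2 = pd.DataFrame(
    {
        "Start": ["2012-04-01", "2012-04-04"],
        "End": ["2012-04-03", "2012-04-04"],
        "CustomerUsage": [464.0, 472.1],
    },
    index=[1, 2],
)

# Each item produced by iterrows() is a tuple: (index label, row as a Series).
first = next(df2.iterrows())

# Unpacking the tuple gives direct access to the row's columns by name.
bounds = []  # hypothetical container, for illustration only
for idx, row in df2.iterrows():
    bounds.append((idx, row["Start"], row["End"]))
```

With the pair unpacked, `row["Start"]` and `row["End"]` are plain scalar values that can be used in a date filter.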

Answer 1

OK, so I ended up with the following answer. I definitely came across a few of @Wen's posts, haha.

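   import pandas as pd

   # Note: here df1 holds the ranges (start, end) and df2 holds the dated values (date, value)
   # create an ID for each range row
   list_ = []
   df1.insert(0, 'ID', range(len(df1)))
   for index, row in df1.iterrows():
       start = row['start']
       end = row['end']
       # keep only the rows whose date falls inside this range;
       # copy so that adding a column does not touch df2 itself
       in_range = df2[(df2['date'] >= start) & (df2['date'] <= end)].copy()
       in_range['ID'] = row['ID']
       list_.append(in_range)
   # stack the per-range slices and sum the values for each ID
   batch = pd.concat(list_)
   _small = batch.groupby(['ID']).agg({'value': 'sum'})
   _merge = batch.reset_index().merge(_small.reset_index(), how='left', on=['ID'])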

I'm trying to use itertuples, since I've been reading that it is fast. Speed is of the essence, so if anyone has any upgrades... :)
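
As a rough illustration of the itertuples idea, here is a minimal sketch using the sample data and column names from the question (`Date`, `Value`, `Start`, `End`, `CustomerUsage`). The `Total` column and the `totals` list are names assumed for this example, and the dates are assumed to be parsed as datetimes.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2012-04-01", "2012-04-02", "2012-04-03", "2012-04-04", "2012-04-05"]
        ),
        "Value": [0.00275, 0.00278, 0.00369, 0.00268, 0.00400],
    }
)
df2 = pd.DataFrame(
    {
        "Start": pd.to_datetime(["2012-04-01", "2012-04-04"]),
        "End": pd.to_datetime(["2012-04-03", "2012-04-04"]),
        "CustomerUsage": [464.0, 472.1],
    }
)

totals = []
# itertuples() yields one namedtuple per row, so columns are read as attributes.
for r in df2.itertuples(index=False):
    # Boolean mask of the dates inside this range (inclusive on both ends).
    in_range = df1["Date"].between(r.Start, r.End)
    # Multiply the matching values by this row's usage and sum them.
    totals.append((df1.loc[in_range, "Value"] * r.CustomerUsage).sum())

df2["Total"] = totals  # one summed value per range row
```

This still loops in Python once per range, so it may not be the fastest option when there are many ranges, but it avoids building and concatenating intermediate frames.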

Pandas: Iterate over dataframe rows using date filter
