I have one dataframe that contains a series of dates and a corresponding value for each date. I have another dataframe with three columns (mindate, maxdate, value; shown below as Start, End, CustomerUsage). For each row of the second dataframe, I want to select the rows of the first dataframe whose date falls between mindate and maxdate, multiply every Value in that range by the row's CustomerUsage, and return the sum.
>>> df1
         Date    Value
0  2012-04-01  0.00275
1  2012-04-02  0.00278
2  2012-04-03  0.00369
3  2012-04-04  0.00268
4  2012-04-05  0.00400
>>> df2
        Start         End  CustomerUsage
1  2012-04-01  2012-04-03          464.0
2  2012-04-04  2012-04-04          472.1
>>> for row in df2.iterrows():
...     mindate = row[row.index[0],'Start']
...     maxdate = row[row.index[0],'End']
...     range = df1[(df1['Dates'] >= mindate) & (df1['Dates'] <= maxdate)]
...     range['Calc'] = range['Value']*df2['CustomerUsage']
...     ##numpy .agg function here##
A single row works, but I am stuck on iterating through the dates: I get AttributeError: 'str' object has no attribute 'loc'. (I gather I'm treating these tuples wrong, but I'm unsure of the remedy, folks!)
Many thanks!
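For reference, here is a minimal sketch of how the loop above could be written. DataFrame.iterrows yields (index, row) pairs, so unpacking the pair gives a Series that can be indexed by column name. The sketch assumes the date column is called Date, as in the printed frame (the loop refers to 'Dates'), and that Date, Start and End all hold datetimes. The names in_range, totals and grand_total are illustrative only.

```python
import pandas as pd

# Sample frames rebuilt from the question's printed output
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2012-04-01", "2012-04-02", "2012-04-03",
                            "2012-04-04", "2012-04-05"]),
    "Value": [0.00275, 0.00278, 0.00369, 0.00268, 0.00400],
})
df2 = pd.DataFrame({
    "Start": pd.to_datetime(["2012-04-01", "2012-04-04"]),
    "End": pd.to_datetime(["2012-04-03", "2012-04-04"]),
    "CustomerUsage": [464.0, 472.1],
}, index=[1, 2])

totals = {}
# iterrows yields (index, row) tuples; unpack them instead of indexing the tuple
for idx, row in df2.iterrows():
    # Rows of df1 whose date lies inside this row's inclusive [Start, End] range
    in_range = df1[(df1["Date"] >= row["Start"]) & (df1["Date"] <= row["End"])]
    # Use this row's CustomerUsage (a scalar), not the whole df2 column
    totals[idx] = (in_range["Value"] * row["CustomerUsage"]).sum()

grand_total = sum(totals.values())
```

Avoiding the name range for the subset also keeps the built-in range function usable later in the same session.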
Answer 0 (score: 0)
OK, so I ended up with the following answer. I definitely came across a few of @Wen's posts along the way, haha.
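import pandas as pd

# Create an ID for each range row (here df1 holds start/end, df2 holds date/value)
list_ = []
df1.insert(0, 'ID', range(len(df1)))
for index, row in df1.iterrows():
    start = row['start']
    end = row['end']
    # Rows of df2 inside this range; copy so the ID can be assigned safely
    in_range = df2[(df2['date'] >= start) & (df2['date'] <= end)].copy()
    in_range['ID'] = row['ID']
    list_.append(in_range)
batch = pd.concat(list_)
# Sum of value per range ID, merged back onto the individual rows
_small = batch.groupby('ID').agg({'value': 'sum'})
_merge = batch.reset_index().merge(_small.reset_index(), how='left', on=['ID'])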
I'm experimenting with itertuples, since I've read that it is faster. Speed is of the essence, so if anyone has any improvements... :)
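On the speed point, here are two possible variants, sketched with the question's column names (Date, Value, Start, End, CustomerUsage) rather than the lowercase ones used in the answer. The first swaps iterrows for itertuples. The second is a different technique from the answer's: it avoids the Python loop by taking cumulative sums of Value and locating each range with numpy.searchsorted. Both function names are hypothetical, both treat Start and End as inclusive, and neither has been benchmarked here.

```python
import numpy as np
import pandas as pd

def usage_sums_itertuples(values, ranges):
    """Per-range sum of Value * CustomerUsage, looping with itertuples."""
    out = {}
    # Each r is a namedtuple with fields Index, Start, End, CustomerUsage
    for r in ranges.itertuples():
        mask = (values["Date"] >= r.Start) & (values["Date"] <= r.End)
        out[r.Index] = values.loc[mask, "Value"].sum() * r.CustomerUsage
    return pd.Series(out)

def usage_sums_vectorized(values, ranges):
    """Same result without a Python loop (cumulative sums + searchsorted)."""
    values = values.sort_values("Date")
    dates = values["Date"].to_numpy()
    # csum[k] is the sum of the first k values, so csum[hi] - csum[lo] sums rows lo..hi-1
    csum = np.concatenate(([0.0], values["Value"].to_numpy().cumsum()))
    lo = np.searchsorted(dates, ranges["Start"].to_numpy(), side="left")
    hi = np.searchsorted(dates, ranges["End"].to_numpy(), side="right")
    usage = ranges["CustomerUsage"].to_numpy()
    return pd.Series((csum[hi] - csum[lo]) * usage, index=ranges.index)

# Sample frames rebuilt from the question's printed output
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2012-04-01", "2012-04-02", "2012-04-03",
                            "2012-04-04", "2012-04-05"]),
    "Value": [0.00275, 0.00278, 0.00369, 0.00268, 0.00400],
})
df2 = pd.DataFrame({
    "Start": pd.to_datetime(["2012-04-01", "2012-04-04"]),
    "End": pd.to_datetime(["2012-04-03", "2012-04-04"]),
    "CustomerUsage": [464.0, 472.1],
}, index=[1, 2])

loop_result = usage_sums_itertuples(df1, df2)
vector_result = usage_sums_vectorized(df1, df2)
```

The loop version scans all of df1 once per range, while the vectorized version sorts df1 once and then does a binary search per range, so it is likely to scale better when there are many ranges.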