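I am trying to find the cumulative sum of four consecutive rows in a DataFrame, based on a condition.
The new column ('veh_time_TOT') is the sum of four consecutive 'veh-time(s)' values, conditioned on 'Day_type': Weekend or Weekday.
This is how the data is currently set up: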
veh-time(s) distance(m) Day_type
0 72 379.0 Weekday
1 70 379.0 Weekday
2 50 379.0 Weekday
3 60 379.0 Weekday
4 70 379.0 Weekday
5 65 379.0 Weekday
6 30 379.0 Weekend
7 35 379.0 Weekend
8 30 379.0 Weekend
9 30 379.0 Weekend
10 20 379.0 Weekend
This is the desired output:
veh-time(s) distance(m) Day_type veh_time_TOT
0 72 379.0 Weekday 0
1 70 379.0 Weekday 0
2 50 379.0 Weekday 0
3 60 379.0 Weekday 252
4 70 379.0 Weekday 250
5 65 379.0 Weekday 245
6 30 379.0 Weekend 0
7 35 379.0 Weekend 0
8 30 379.0 Weekend 0
9 30 379.0 Weekend 125
10 20 379.0 Weekend 115
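I have tried a few things, but the only thing I could find was the .cumsum function, which only got me the sum of 2 consecutive rows. The zeros in 'veh_time_TOT' are there because there are not yet 4 rows to make up the sum.
I think the solution will be a combination of .cumsum and a loop with a conditional if statement.
What do you think? Any help is appreciated.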
Answer 0 (score: 0)
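Here are the steps I took to get the desired column:
First, I set up your example DataFrame.
Next, I define three columns of interest: the column whose values are the basis of the calculation (col_val), the column used for comparison (col_compare), and the name of the column that will hold the calculated quantity (col_name_new).
Then I take the slice of the DataFrame containing the rows whose col_compare value is the same as in the three rows before them. I iterate over this slice of the original DataFrame, summing the value of col_val in the current row and the three rows preceding it.
Finally, I set a new column with the desired name, col_name_new, to the calculated sums, leaving 0 in the rows where no sum could be formed.
Here is my code; feel free to ask questions in the comments!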
import pandas as pd
# Setup
cols = ['veh-time(s)', 'distance(m)', 'Day_type']
data = [[72, 379.0, 'Weekday'],
        [70, 379.0, 'Weekday'],
        [50, 379.0, 'Weekday'],
        [60, 379.0, 'Weekday'],
        [70, 379.0, 'Weekday'],
        [65, 379.0, 'Weekday'],
        [30, 379.0, 'Weekend'],
        [35, 379.0, 'Weekend'],
        [30, 379.0, 'Weekend'],
        [30, 379.0, 'Weekend'],
        [20, 379.0, 'Weekend']]
df = pd.DataFrame(data,columns=cols )
# Define columns for potential future generalization
col_val='veh-time(s)'
col_compare='Day_type'
col_name_new = 'veh_time_TOT'
# DataFrame slice of rows eligible for calculation
cut_prev_four = (df[col_compare].shift(1)==df[col_compare]) \
&(df[col_compare].shift(2)==df[col_compare].shift(1)) \
&(df[col_compare].shift(3)==df[col_compare].shift(2))
df_consecutive = df[cut_prev_four]
# Perform calculation on eligible rows. Store in list
prev_four_list = []
for i, row in df_consecutive.iterrows():
    # i is the index label; using it with iloc assumes the default 0..n-1 index
    prev_four_vals = df.iloc[i-3:i+1][col_val].values
    print(i, prev_four_vals, sum(prev_four_vals))
    prev_four_list.append(sum(prev_four_vals))
# Set new column to the calculated values
df[col_name_new] = 0
df.loc[cut_prev_four, col_name_new] = prev_four_list
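
As an alternative to the loop above, the same column can probably be built by labelling each run of identical Day_type values and taking a rolling sum within each run. This is only a sketch, not the approach used in the code above: the helper name rolling_sum_by_run and the variable run_id are made up for illustration, and it assumes that a window which does not yet contain four rows should yield 0, as in the desired output.

```python
import pandas as pd

def rolling_sum_by_run(df, col_val, col_compare, window=4):
    """Hypothetical helper: rolling sum of col_val inside consecutive runs of col_compare."""
    # A new run starts whenever col_compare differs from the previous row
    run_id = (df[col_compare] != df[col_compare].shift()).cumsum()
    # Rolling sum restricted to each run; windows with fewer than `window` rows give NaN
    totals = df.groupby(run_id)[col_val].transform(
        lambda s: s.rolling(window).sum()
    )
    # Assumption: incomplete windows are reported as 0, as in the desired output
    return totals.fillna(0).astype(int)

df = pd.DataFrame({
    'veh-time(s)': [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
    'distance(m)': [379.0] * 11,
    'Day_type': ['Weekday'] * 6 + ['Weekend'] * 5,
})
df['veh_time_TOT'] = rolling_sum_by_run(df, 'veh-time(s)', 'Day_type')
print(df)
```

Grouping by run rather than by Day_type itself keeps two separate Weekday stretches from being summed together, which matches the consecutive-row check in cut_prev_four.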