Given a dataset like this:
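import pandas as pd
values = (['motorway'] * 5) + (['link'] * 3) + (['motorway'] * 7)
df = pd.DataFrame.from_dict({
    'timestamp': pd.date_range(start='2018-1-1', periods=len(values), freq='s'),
    'road_type': values,
})
# timestamp is kept as a column (not the index) so that it can be differenced below
# time elapsed since the previous row; the first row has no predecessor, so use 0s
df['delta_t'] = df['timestamp'].diff().fillna(pd.Timedelta(0))

I want the sum of delta_t over each group of consecutive rows with the same road_type, and then, for each road_type, the largest of those sums. In this example delta_t is 1s for every row after the first, so I want to find link: 3s and motorway: 7s. In reality there will be more road_types, and delta_t will vary.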
Edit: the solution provided here looks similar, but it neither sums nor selects the maximum group for each road type.
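To pin down exactly what is being asked, here is a plain-Python reference sketch without pandas. It assumes the labels and time deltas are available as two parallel lists; the function name max_run_sums is hypothetical. It relies on itertools.groupby, which only groups adjacent equal keys, so each group it yields is one consecutive run.

```python
from datetime import timedelta
from itertools import groupby


def max_run_sums(labels, deltas):
    """Hypothetical helper: for each label, return the largest sum of deltas
    over a run of consecutive, equal labels."""
    best = {}
    # itertools.groupby groups only *adjacent* equal keys, i.e. runs
    for label, run in groupby(zip(labels, deltas), key=lambda pair: pair[0]):
        total = sum((d for _, d in run), timedelta(0))
        if label not in best or total > best[label]:
            best[label] = total
    return best


values = ['motorway'] * 5 + ['link'] * 3 + ['motorway'] * 7
# same deltas as the question's frame: 0s for the first row, 1s afterwards
deltas = [timedelta(0)] + [timedelta(seconds=1)] * (len(values) - 1)
print(max_run_sums(values, deltas))  # motorway: 7 s, link: 3 s
```

The first motorway run only sums to 4s (its first row contributes 0s), so the second run of 7s wins.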
Answer 0 (score: 0)
Create a new column that labels each "run" of the same road type with a unique integer, then group by that column and sum:
df['run'] = (df['road_type'] != df['road_type'].shift()).astype(int).cumsum()
df
timestamp road_type delta_t run
0 2018-01-01 00:00:00 motorway 00:00:00 1
1 2018-01-01 00:00:01 motorway 00:00:01 1
2 2018-01-01 00:00:02 motorway 00:00:01 1
3 2018-01-01 00:00:03 motorway 00:00:01 1
4 2018-01-01 00:00:04 motorway 00:00:01 1
5 2018-01-01 00:00:05 link 00:00:01 2
6 2018-01-01 00:00:06 link 00:00:01 2
7 2018-01-01 00:00:07 link 00:00:01 2
8 2018-01-01 00:00:08 motorway 00:00:01 3
9 2018-01-01 00:00:09 motorway 00:00:01 3
10 2018-01-01 00:00:10 motorway 00:00:01 3
11 2018-01-01 00:00:11 motorway 00:00:01 3
12 2018-01-01 00:00:12 motorway 00:00:01 3
13 2018-01-01 00:00:13 motorway 00:00:01 3
14 2018-01-01 00:00:14 motorway 00:00:01 3
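Why the run column works can be seen on a smaller Series. This is a minimal sketch using the same shift/compare/cumsum idiom as the answer: comparing each value with the previous one flags the first row of every run (row 0 is compared with NaN, so it is flagged too), and the cumulative count of those flags gives every row of a run the same integer.

```python
import pandas as pd

s = pd.Series(['motorway', 'motorway', 'link', 'link', 'motorway'])
# True where the value differs from the previous row (row 0 vs. NaN is True)
changed = s != s.shift()
# running count of changes: a new integer starts at every new run
run = changed.astype(int).cumsum()
print(pd.DataFrame({'value': s, 'changed': changed, 'run': run}))
```

Note that the two motorway runs get different labels (1 and 3), which is what keeps non-adjacent runs apart when grouping.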
# one row per run (its road type and total delta_t), then the largest run total per road type
df.groupby('run').agg({'road_type': 'first', 'delta_t': 'sum'}).reset_index(drop=True).groupby('road_type').max()
delta_t
road_type
link 00:00:03
motorway 00:00:07
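Since the question says delta_t will vary in practice, the same idea can be wrapped into a small reusable function. This is a sketch under assumptions: the name max_run_duration and its parameters are hypothetical, and it groups by the pair (road_type, run) instead of aggregating road_type with 'first', which gives the same result because every run has exactly one road type.

```python
import pandas as pd


def max_run_duration(df, key='road_type', value='delta_t'):
    """Hypothetical helper: for each distinct `key`, return the largest sum of
    `value` over a run of consecutive rows that share that key."""
    # run id: increments every time `key` differs from the previous row
    run = (df[key] != df[key].shift()).astype(int).cumsum().rename('run')
    # total of `value` per (key, run) pair
    per_run = df.groupby([df[key], run])[value].sum()
    # keep the largest run total for each key
    return per_run.groupby(level=key).max()


# usage with varying delta_t: motorway runs total 5s and 9s, link runs 4s and 2s
example = pd.DataFrame({
    'road_type': ['motorway', 'motorway', 'link', 'motorway', 'link', 'link'],
    'delta_t': pd.to_timedelta([2, 3, 4, 9, 1, 1], unit='s'),
})
print(max_run_duration(example))
```

Keeping the key in the group index makes the final step a plain per-level max, and the result comes back as a Series indexed by road_type.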