I have a DataFrame. Here is part of it:
   member_id           event_time                       event_path  event_duration
0    2333678  2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              12
1    2333678  2016-12-27 04:17:26  youtube.com/watch?v=w5ZIb05NO58              12
2    2333678  2016-12-27 04:17:36  youtube.com/watch?v=w5ZIb05NO58              10
3    2333678  2016-12-27 04:17:40  youtube.com/watch?v=w5ZIb05NO58              35
4    5611206  2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              35
5    5611206  2016-12-30 17:16:10  youtube.com/watch?v=qZrQWA5IsKA              12
6    5611206  2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              10
7    5611206  2016-12-30 17:16:37  youtube.com/watch?v=6YM5UhnElcE              10
8    5611206  2016-12-30 17:16:47  youtube.com/watch?v=6YM5UhnElcE              10
Desired output (rows that share member_id and event_path merged into one, keeping the first event_time and summing event_duration):
   member_id           event_time                       event_path  event_duration
0    2333678  2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              69
4    5611206  2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              47
6    5611206  2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              30
However, my current attempt does not concatenate all of the matching rows.
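The desired output keeps the index label of the first row in each block (0, 4, 6), which suggests merging consecutive rows that share member_id and event_path. A minimal sketch under that assumption (it is not taken from either answer): build a run id that increments whenever either key changes, keep each run's first row, and replace its duration with the run total. Parsing event_time with pd.to_datetime is also an assumption, since the question does not state the column's dtype.

```python
import pandas as pd

# Sample rows copied from the question; event_time parsed as datetime (assumption).
df = pd.DataFrame({
    "member_id": [2333678] * 4 + [5611206] * 5,
    "event_time": pd.to_datetime([
        "2016-12-27 04:17:16", "2016-12-27 04:17:26",
        "2016-12-27 04:17:36", "2016-12-27 04:17:40",
        "2016-12-30 17:16:01", "2016-12-30 17:16:10",
        "2016-12-30 17:16:27", "2016-12-30 17:16:37",
        "2016-12-30 17:16:47",
    ]),
    "event_path": ["youtube.com/watch?v=w5ZIb05NO58"] * 4
    + ["youtube.com/watch?v=qZrQWA5IsKA"] * 2
    + ["youtube.com/watch?v=6YM5UhnElcE"] * 3,
    "event_duration": [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

# A new run starts wherever member_id or event_path differs from the previous row.
key = df[["member_id", "event_path"]]
run_id = key.ne(key.shift()).any(axis=1).cumsum()

# Keep the first row of every run, so the original index labels survive.
starts = run_id.ne(run_id.shift())
result = df.loc[starts].copy()

# Overwrite its duration with the run total; runs are numbered in row order,
# so the summed values line up positionally with the kept rows.
result["event_duration"] = df.groupby(run_id)["event_duration"].sum().to_numpy()
print(result)
```

Unlike a plain groupby on the two columns, this treats a later return to the same video by the same member as a separate row, which may or may not be what the asker wants.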
Answer 0 (score: 1)
If you want the first event_time in each group (grouping by event_path as well), you can use the following:
>>> df.groupby([df.member_id, df.event_path]).agg({'event_duration':'sum', 'event_time': 'first'}).reset_index().reindex(columns=df.columns)
   member_id           event_time                       event_path  event_duration
0    2333678  2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              69
1    5611206  2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              30
2    5611206  2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              47
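To try this answer end to end, here is a self-contained sketch that rebuilds the sample rows from the question and runs the same groupby/agg chain. Storing event_time as datetime is an assumption; the chain works the same way on plain strings.

```python
import pandas as pd

# Sample rows copied from the question; event_time parsed as datetime (assumption).
df = pd.DataFrame({
    "member_id": [2333678] * 4 + [5611206] * 5,
    "event_time": pd.to_datetime([
        "2016-12-27 04:17:16", "2016-12-27 04:17:26",
        "2016-12-27 04:17:36", "2016-12-27 04:17:40",
        "2016-12-30 17:16:01", "2016-12-30 17:16:10",
        "2016-12-30 17:16:27", "2016-12-30 17:16:37",
        "2016-12-30 17:16:47",
    ]),
    "event_path": ["youtube.com/watch?v=w5ZIb05NO58"] * 4
    + ["youtube.com/watch?v=qZrQWA5IsKA"] * 2
    + ["youtube.com/watch?v=6YM5UhnElcE"] * 3,
    "event_duration": [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

# Sum durations and keep the first event_time per (member_id, event_path) group.
out = (
    df.groupby(["member_id", "event_path"])
    .agg({"event_duration": "sum", "event_time": "first"})
    .reset_index()                 # turn the group keys back into columns
    .reindex(columns=df.columns)   # restore the original column order
)
print(out)
```

Because groupby sorts its keys by default, the 6YM5UhnElcE group comes before qZrQWA5IsKA (the digit "6" sorts before "q"), which is why the row order differs from the desired output.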
Answer 1 (score: 1)
df.groupby(['member_id','event_path']).agg({'event_time':'min','event_duration':'sum'}).reset_index()
Output:
   member_id                       event_path           event_time  event_duration
0    2333678  youtube.com/watch?v=w5ZIb05NO58  2016-12-27 04:17:16              69
1    5611206  youtube.com/watch?v=6YM5UhnElcE  2016-12-30 17:16:27              30
2    5611206  youtube.com/watch?v=qZrQWA5IsKA  2016-12-30 17:16:01              47
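A variant of this answer, sketched under the assumption that pandas is 0.25 or newer (needed for named aggregation): passing sort=False keeps the groups in order of first appearance, so the rows come out in the same order as the desired output. The index is still 0, 1, 2 rather than 0, 4, 6.

```python
import pandas as pd

# Sample rows copied from the question; event_time parsed as datetime (assumption).
df = pd.DataFrame({
    "member_id": [2333678] * 4 + [5611206] * 5,
    "event_time": pd.to_datetime([
        "2016-12-27 04:17:16", "2016-12-27 04:17:26",
        "2016-12-27 04:17:36", "2016-12-27 04:17:40",
        "2016-12-30 17:16:01", "2016-12-30 17:16:10",
        "2016-12-30 17:16:27", "2016-12-30 17:16:37",
        "2016-12-30 17:16:47",
    ]),
    "event_path": ["youtube.com/watch?v=w5ZIb05NO58"] * 4
    + ["youtube.com/watch?v=qZrQWA5IsKA"] * 2
    + ["youtube.com/watch?v=6YM5UhnElcE"] * 3,
    "event_duration": [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

out = (
    # sort=False: groups appear in the order their keys first occur in df.
    df.groupby(["member_id", "event_path"], sort=False)
    .agg(
        event_time=("event_time", "min"),          # earliest timestamp per group
        event_duration=("event_duration", "sum"),  # total watch time per group
    )
    .reset_index()
    .reindex(columns=df.columns)  # match the question's column order
)
print(out)
```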