I am looking for a Python-based solution that groups by Id and computes the mean of the data within different time ranges.
Input Data
Id Time X1 Y1 X2 Y2 X3 Y3
A 0.08 427 351 427 351 427 353
A 0.15 384 365 384 365 384 367
A 0.24 125 190 196 404 196 406
A 0.39 468 342 468 342 398 375
A 0.47 171 457 171 457 171 460
A 0.53 1 343 1 343 1 345
A 0.66 139 328 139 328 139 330
B 0.04 152 179 152 181 150 183
B 0.19 74 75 123 400 123 404
B 0.26 117 99 117 104 116 105
B 0.39 156 125 156 131 71 209
B 0.47 187 147 189 155 187 157
B 0.03 272 340 278 361 249 442
B 0.14 272 351 275 354 250 420
C 0.26 279 347 279 347 266 384
C 0.37 271 337 283 348 258 377
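Group on Id and find the mean of X1, Y1, X2, Y2, X3, Y3 within ranges based on the frame (the Time column).
For each Id, the mean of all X, Y values should be computed over the frames that fall in the following ranges. If a range contains no X, Y values, return NaN.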
1 = (Time <= .1)
2 = (.1 <= Time <= .2)
3 = (.2 <= Time <= .3)
4 = (.3 <= Time <= .4)
5 = (.4 <= Time <= .5)
6 = (.5 <= Time <= .6)
7 = (.6 <= Time <= .7)
8 = (.7 <= Time <= .8)
9 = (.8 <= Time <= .9)
Id 1X1 1Y1 1X2 1Y2 1X3 1Y3 ... 9X3 9Y3
A 427 351 427 351 427 353
A 384 365 384 365 384 367
A 125 190 196 404 196 406
A 468 342 468 342 398 375
A 171 457 171 457 171 460
A 1 343 1 343 1 345
A 139 328 139 328 139 330
B 152 179 152 181 150 183
B 74 75 123 400 123 404
B 117 99 117 104 116 105
B 156 125 156 131 71 209
B 187 147 189 155 187 157
B 272 340 278 361 249 442
B 272 351 275 354 250 420
C 279 347 279 347 266 384
C 271 337 283 348 258 377
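Answer 0 (score: 0)
I think there is a misunderstanding in your expected output. The numbers you show suggest that you are pivoting the time bins along the rows, as in the steps below. At the same time, the column names suggest that you also want to pivot the bin dimension along the columns for each of the X, Y variables, although you did not provide those numbers.
Here are the steps that produce an output with the time bins in the rows.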
import pandas as pd
import numpy as np
>>>df
Id Time X1 Y1 X2 Y2 X3 Y3
0 A 0.08 427 351 427 351 427 353
1 A 0.15 384 365 384 365 384 367
2 A 0.24 125 190 196 404 196 406
3 A 0.39 468 342 468 342 398 375
4 A 0.47 171 457 171 457 171 460
5 A 0.53 1 343 1 343 1 345
6 A 0.66 139 328 139 328 139 330
7 B 0.04 152 179 152 181 150 183
8 B 0.19 74 75 123 400 123 404
9 B 0.26 117 99 117 104 116 105
10 B 0.39 156 125 156 131 71 209
11 B 0.47 187 147 189 155 187 157
12 B 0.03 272 340 278 361 249 442
13 B 0.14 272 351 275 354 250 420
14 C 0.26 279 347 279 347 266 384
15 C 0.37 271 337 283 348 258 377
# This is the base operation that you're looking for to produce the output in your example
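# observed=False keeps the empty bins as NaN rows (newer pandas versions may drop them by default)
df = df.groupby(['Id', pd.cut(df['Time'], np.arange(0, 1.0, 0.1))], observed=False).mean()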
>>>df
Time X1 Y1 X2 Y2 X3 Y3
Id Time
A (0.0, 0.1] 0.080 427.0 351.0 427.0 351.0 427.0 353.0
(0.1, 0.2] 0.150 384.0 365.0 384.0 365.0 384.0 367.0
(0.2, 0.3] 0.240 125.0 190.0 196.0 404.0 196.0 406.0
(0.3, 0.4] 0.390 468.0 342.0 468.0 342.0 398.0 375.0
(0.4, 0.5] 0.470 171.0 457.0 171.0 457.0 171.0 460.0
(0.5, 0.6] 0.530 1.0 343.0 1.0 343.0 1.0 345.0
(0.6, 0.7] 0.660 139.0 328.0 139.0 328.0 139.0 330.0
(0.7, 0.8] NaN NaN NaN NaN NaN NaN NaN
(0.8, 0.9] NaN NaN NaN NaN NaN NaN NaN
B (0.0, 0.1] 0.035 212.0 259.5 215.0 271.0 199.5 312.5
(0.1, 0.2] 0.165 173.0 213.0 199.0 377.0 186.5 412.0
(0.2, 0.3] 0.260 117.0 99.0 117.0 104.0 116.0 105.0
(0.3, 0.4] 0.390 156.0 125.0 156.0 131.0 71.0 209.0
(0.4, 0.5] 0.470 187.0 147.0 189.0 155.0 187.0 157.0
(0.5, 0.6] NaN NaN NaN NaN NaN NaN NaN
(0.6, 0.7] NaN NaN NaN NaN NaN NaN NaN
(0.7, 0.8] NaN NaN NaN NaN NaN NaN NaN
(0.8, 0.9] NaN NaN NaN NaN NaN NaN NaN
C (0.0, 0.1] NaN NaN NaN NaN NaN NaN NaN
(0.1, 0.2] NaN NaN NaN NaN NaN NaN NaN
(0.2, 0.3] 0.260 279.0 347.0 279.0 347.0 266.0 384.0
(0.3, 0.4] 0.370 271.0 337.0 283.0 348.0 258.0 377.0
(0.4, 0.5] NaN NaN NaN NaN NaN NaN NaN
(0.5, 0.6] NaN NaN NaN NaN NaN NaN NaN
(0.6, 0.7] NaN NaN NaN NaN NaN NaN NaN
(0.7, 0.8] NaN NaN NaN NaN NaN NaN NaN
(0.8, 0.9] NaN NaN NaN NaN NaN NaN NaN
# The rest is just cosmetics
# Drop the original Time column
df.drop('Time', axis=1, inplace=True)
# Reset the index
df.reset_index(inplace=True)
# Add a numerical label for the Time bins
df['TimeNo'] = (df.index % 9) + 1
# Rearrange the columns
df = df.iloc[:,[0,1,8]].join(df.iloc[:,2:8])
# Drop the rows where every X, Y value is NaN (safer than testing whether the row sum is > 0)
df = df.dropna(how='all', subset=df.columns[3:])
>>>df
Id Time TimeNo X1 Y1 X2 Y2 X3 Y3
0 A (0.0, 0.1] 1 427.0 351.0 427.0 351.0 427.0 353.0
1 A (0.1, 0.2] 2 384.0 365.0 384.0 365.0 384.0 367.0
2 A (0.2, 0.3] 3 125.0 190.0 196.0 404.0 196.0 406.0
3 A (0.3, 0.4] 4 468.0 342.0 468.0 342.0 398.0 375.0
4 A (0.4, 0.5] 5 171.0 457.0 171.0 457.0 171.0 460.0
5 A (0.5, 0.6] 6 1.0 343.0 1.0 343.0 1.0 345.0
6 A (0.6, 0.7] 7 139.0 328.0 139.0 328.0 139.0 330.0
9 B (0.0, 0.1] 1 212.0 259.5 215.0 271.0 199.5 312.5
10 B (0.1, 0.2] 2 173.0 213.0 199.0 377.0 186.5 412.0
11 B (0.2, 0.3] 3 117.0 99.0 117.0 104.0 116.0 105.0
12 B (0.3, 0.4] 4 156.0 125.0 156.0 131.0 71.0 209.0
13 B (0.4, 0.5] 5 187.0 147.0 189.0 155.0 187.0 157.0
20 C (0.2, 0.3] 3 279.0 347.0 279.0 347.0 266.0 384.0
21 C (0.3, 0.4] 4 271.0 337.0 283.0 348.0 258.0 377.0
As you can see, with this output format you do not need to put the time bins in the columns.
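If you do want the wide layout hinted at by the column names in the question (1X1, 1Y1, ..., 9X3, 9Y3, with one row per Id), the following is a minimal sketch of one way to get there. It is not part of the steps above. The names `raw`, `edges`, `bin_no`, `means`, `wide` and `value_cols` are made up for illustration, and it assumes that bin 1 should include Time == 0 (hence `include_lowest=True`) and that a value sitting exactly on an edge belongs to the lower bin.

```python
import io

import pandas as pd

# Input rows copied from the question.
raw = """Id Time X1 Y1 X2 Y2 X3 Y3
A 0.08 427 351 427 351 427 353
A 0.15 384 365 384 365 384 367
A 0.24 125 190 196 404 196 406
A 0.39 468 342 468 342 398 375
A 0.47 171 457 171 457 171 460
A 0.53 1 343 1 343 1 345
A 0.66 139 328 139 328 139 330
B 0.04 152 179 152 181 150 183
B 0.19 74 75 123 400 123 404
B 0.26 117 99 117 104 116 105
B 0.39 156 125 156 131 71 209
B 0.47 187 147 189 155 187 157
B 0.03 272 340 278 361 249 442
B 0.14 272 351 275 354 250 420
C 0.26 279 347 279 347 266 384
C 0.37 271 337 283 348 258 377
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
value_cols = ["X1", "Y1", "X2", "Y2", "X3", "Y3"]

# Ten edges 0.0, 0.1, ..., 0.9 give nine bins, labelled 1..9 as in the question.
edges = [i / 10 for i in range(10)]
bin_no = pd.cut(df["Time"], bins=edges, labels=list(range(1, 10)),
                include_lowest=True)

# Mean per (Id, bin); observed=False keeps empty bins so they show up as NaN.
means = (
    df.drop(columns="Time")
      .groupby(["Id", bin_no], observed=False)
      .mean()
)

# Move the bin level into the columns, then flatten names to "1X1", "1Y1", ...
wide = means.unstack("Time")
wide.columns = [f"{b}{var}" for var, b in wide.columns]

# Reorder so the columns run bin by bin: 1X1 1Y1 ... 1Y3 2X1 ... 9Y3.
wide = wide[[f"{b}{var}" for b in range(1, 10) for var in value_cols]]
print(wide.iloc[:, :6])
```

This yields one row per Id and 54 columns. Bins with no rows for a given Id stay NaN, which matches the NaN requirement stated in the question.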