Get the group with the highest percentage

Time: 2019-12-06 09:33:41

Tags: python pandas anaconda data-science sensor

Not sure if the title makes sense. I'll try to elaborate.

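I'm just trying to get the sensors with the highest values by percentage. For example, I want the top 10% of sensors ranked by the highest value they measured. There are two attempts in the code: one tries to get this from the measured (cumulative) volume, and the other works per hour (and produces an error message).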
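One possible reading of "top 10% of sensors" is: reduce each sensor to a single number (here its maximum Volume), then keep the highest tenth of those numbers. The sketch below follows that reading. The helper name `top_fraction_of_sensors`, the demo data, and the "keep at least one sensor" guard are illustrative assumptions, not part of the original code.

```python
import pandas as pd


def top_fraction_of_sensors(df: pd.DataFrame, fraction: float = 0.1) -> pd.Series:
    """Return the per-sensor maximum Volume for the top `fraction` of sensors."""
    # One value per sensor: the highest Volume it reported.
    per_sensor_max = df.groupby("DeviceEui")["Volume"].max()
    # Number of sensors to keep; the max(1, ...) guard is an assumption so that
    # small inputs do not produce an empty result.
    n_keep = max(1, int(len(per_sensor_max) * fraction))
    # Series.nlargest takes only n (no column name) and sorts descending.
    return per_sensor_max.nlargest(n_keep)


# Made-up data with ten hypothetical sensors (not the asker's file).
demo = pd.DataFrame({
    "DeviceEui": ["A", "A", "B", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Volume":    [1.0, 2.0, 5.0, 9.0, 3.0, 4.0, 0.5, 7.0, 6.0, 8.0, 2.5, 1.5],
})
top = top_fraction_of_sensors(demo, fraction=0.1)
print(top)
```

With ten sensors and a fraction of 0.1, exactly one sensor is kept. Whether the ranking should use the maximum, the last reading, or the sum of Total depends on what "highest value" is meant to capture.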

The Total column is the amount measured between timestamps. I would be forever grateful for any reply.
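If Total is the difference between consecutive cumulative Volume readings of the same sensor, it can be recomputed as a sanity check. That relationship is an assumption: the sample rows in the question are consistent with it, but the question does not say how the column was produced. The column name `Total_check` and the readings below are made up for illustration.

```python
import pandas as pd

# Illustrative readings for two hypothetical sensors (not the asker's file).
readings = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-12-04 04:56:49", "2019-12-04 05:56:49", "2019-12-04 06:56:49",
        "2019-12-04 05:00:00", "2019-12-04 06:00:00",
    ]),
    "DeviceEui": ["S1", "S1", "S1", "S2", "S2"],
    "Volume": [0.971, 0.989, 1.005, 10.0, 10.5],
})

# Sort so that diff() compares each reading with the previous reading
# of the same sensor.
readings = readings.sort_values(["DeviceEui", "Time"])

# The first reading of each sensor has no predecessor, so its difference
# is filled with 0. Rounding hides floating-point noise.
readings["Total_check"] = (
    readings.groupby("DeviceEui")["Volume"].diff().fillna(0).round(3)
)
print(readings)
```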

The dataset/dataframe currently looks like this:

                   Time      DeviceEui  Volume  Total  Day_Of_Week  Day_Of_Month  Year  Month  Day  Hour
0   2019-11-12 09:50:22  0007090000CA1   3.822  0.013            1            30  2019     11   12     9
1   2019-11-12 09:51:35  000709000099F  16.473  0.008            1            30  2019     11   12     9
2   2019-11-12 09:51:41  0007090000CCE  33.170  0.000            1            30  2019     11   12     9
3   2019-11-12 09:51:48  00070900009A4  31.163  0.016            1            30  2019     11   12     9
4   2019-11-12 09:54:10  00070900009C9   7.030  0.026            1            30  2019     11   12     9
5   2019-11-12 09:55:46  0007090000CA6  31.621  0.001            1            30  2019     11   12     9
6   2019-11-12 09:56:53  00070900009CF   9.296  0.000            1            30  2019     11   12     9
7   2019-11-12 09:57:40  00070900009B1  48.864  0.041            1            30  2019     11   12     9
8   2019-11-12 09:58:17  0007090000145  33.384  0.006            1            30  2019     11   12     9
9   2019-11-12 10:00:17  0007090000CAB  12.458  0.003            1            30  2019     11   12    10
10  2019-11-12 10:00:56  0007090000CAE  25.885  0.000            1            30  2019     11   12    10
11  2019-11-12 10:01:54  0007090000983  34.486  0.001            1            30  2019     11   12    10
12  2019-11-12 10:02:10  00070900009D8   2.658  0.000            1            30  2019     11   12    10
13  2019-11-12 10:02:25  0007090000139  12.466  0.002            1            30  2019     11   12    10
14  2019-11-12 10:03:25  0007090000C98   4.062  0.030            1            30  2019     11   12    10
15  2019-11-12 10:08:30  0007090000C85   5.880  0.084            1            30  2019     11   12    10
16  2019-11-12 10:09:40  0007090000CA0  33.731  0.000            1            30  2019     11   12    10
17  2019-11-12 10:13:59  00070900009CB   5.684  0.000            1            30  2019     11   12    10
18  2019-11-12 10:15:02  0007090000151   3.673  0.027            1            30  2019     11   12    10
19  2019-11-12 10:15:32  0007090000CA5   9.718  0.013            1            30  2019     11   12    10

The code looks like this:

import numpy as np
import pandas as pd

df = pd.read_csv('WatersensorWeek4Exported.csv')
pd.set_option('display.max_colwidth', -1)
df['Time'] = pd.to_datetime(df['Time'])
df['Day_Of_Week'] = df['Time'].dt.dayofweek
df['Day_Of_Month'] = df['Time'].dt.daysinmonth  # number of days in the month, e.g. 30
df['Year'] = df['Time'].dt.year
df['Month'] = df['Time'].dt.month
df['Day'] = df['Time'].dt.day
df['Hour'] = df['Time'].dt.hour
t = df['Time']
df.apply(pd.to_numeric, errors=('ignore'))  # result is not assigned back to df
df.fillna(0, inplace = True)
a = 0.1  # fraction to keep (top 10%)
print(df.head(20))

# Attempt 1: by measured (cumulative) volume per sensor
v_group = df.groupby('DeviceEui')
volume_sensor = v_group['Volume'].agg(np.max)
v_group.apply(lambda x: x.nlargest(int(len(x) * a), 'Volume')).agg(np.max)  # result is discarded
print(v_group.describe)
for v, index in v_group:
    print(v, index)

print(v_group)

# Attempt 2: by hour (this is where the error below is raised)
eui_group = df.groupby(['DeviceEui', 'Hour'])['Volume'].mean()
eui_group = eui_group.apply(lambda x: x.nlargest(int(len(x) * a), 'Volume'))
print(eui_group)
print(eui_group.dtypes)
for index, name in eui_group.iteritems():
    print(index,name)

Snippet of the output produced by the code:

[177 rows x 10 columns]
0007090000885
                      Time      DeviceEui  Volume  Total  Day_Of_Week  Day_Of_Month  Year  Month  Day  Hour
44033  2019-11-28 06:55:30  0007090000885   0.000  0.000            3            30  2019     11   28     6
44034  2019-11-28 06:55:41  0007090000885   0.000  0.000            3            30  2019     11   28     6
44141  2019-11-28 07:55:30  0007090000885   0.000  0.000            3            30  2019     11   28     7
44142  2019-11-28 07:55:41  0007090000885   0.000  0.000            3            30  2019     11   28     7
44261  2019-11-28 08:55:30  0007090000885   0.011  0.011            3            30  2019     11   28     8
...                    ...            ...     ...    ...           ..            ..   ...     ..   ..    ..
60887  2019-12-04 03:56:49  0007090000885   0.971  0.000            2            31  2019     12    4     3
61000  2019-12-04 04:56:49  0007090000885   0.971  0.000            2            31  2019     12    4     4
61001  2019-12-04 04:56:49  0007090000885   0.971  0.000            2            31  2019     12    4     4
61108  2019-12-04 05:56:49  0007090000885   0.989  0.018            2            31  2019     12    4     5
61200  2019-12-04 06:56:49  0007090000885   1.005  0.016            2            31  2019     12    4     6

[195 rows x 10 columns]
0007090000FFF
                      Time      DeviceEui  Volume  Total  Day_Of_Week  Day_Of_Month  Year  Month  Day  Hour
58167  2019-12-03 05:15:29  0007090000FFF   0.000  0.000            1            31  2019     12    3     5
58168  2019-12-03 05:15:39  0007090000FFF   0.000  0.000            1            31  2019     12    3     5
58274  2019-12-03 06:15:29  0007090000FFF   0.000  0.000            1            31  2019     12    3     6
58275  2019-12-03 06:15:39  0007090000FFF   0.000  0.000            1            31  2019     12    3     6
58392  2019-12-03 07:15:29  0007090000FFF   0.011  0.011            1            31  2019     12    3     7
58393  2019-12-03 07:15:39  0007090000FFF   0.011  0.011            1            31  2019     12    3     7

Error message:

Traceback (most recent call last):
  File "C:\Users\xxx\source\repos\MLwater\MLwater\ForbruksNivå.py", line 45, in <module>
    eui_group = eui_group.apply(lambda x: x.nlargest(int(len(x) * a), 'Volume'))
  File "C:\Users\xxx\Anaconda3\lib\site-packages\pandas\core\series.py", line 4042, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "C:\Users\xxx\source\repos\MLvannmålere\MLwater\Forbruks.py", line 45, in <lambda>
    eui_group = eui_group.apply(lambda x: x.nlargest(int(len(x) * a), 'Volume'))
AttributeError: 'float' object has no attribute 'nlargest'
Press any key to continue . . .
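The traceback is consistent with how `Series.apply` works: `groupby(...)['Volume'].mean()` returns a Series of floats, and `Series.apply` calls the function once per element, so `x` inside the lambda is a single float rather than a group. The sketch below shows one way to restructure the hourly attempt. It assumes the goal is "within each hour, keep the top fraction of sensors by mean Volume", which the question does not state outright. The data is made up, and the `max(1, ...)` guard is an added assumption.

```python
import pandas as pd

a = 0.1  # fraction of sensors to keep, as in the question

# Made-up data: ten hypothetical sensors with one reading in each of two hours.
rows = []
for i in range(10):
    for hour in (9, 10):
        rows.append({"DeviceEui": f"S{i}", "Hour": hour, "Volume": float(i + hour)})
df = pd.DataFrame(rows)

# Mean Volume per (sensor, hour): a Series of floats with a two-level index.
hourly_mean = df.groupby(["DeviceEui", "Hour"])["Volume"].mean()

# Grouping by the 'Hour' index level hands the function a whole Series per
# hour instead of one float at a time. Series.nlargest takes only n, so the
# 'Volume' argument from the original call is dropped.
top_per_hour = hourly_mean.groupby(level="Hour", group_keys=False).apply(
    lambda s: s.nlargest(max(1, int(len(s) * a)))
)
print(top_per_hour)
```

In the first attempt the same lambda works because `DataFrameGroupBy.apply` passes a DataFrame per group, and `DataFrame.nlargest` does accept a column name.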

0 answers:

No answers