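This seems simple, but I can't figure it out. I have a column "c_id", which is the ID of a device in the database. It is attached to every record: whenever data comes back reporting a possible malware issue, the c_id is added so that we know which device is affected. I have the code shown below, but I want to be able to run these percentages by c_id. In other words, if I have 250 unique c_ids, I want 250 percentage breakdowns, one per device. For some reason I can't work out how to do that. Is it possible?
Currently, I have the following working code: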
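import pandas as pd

# ps stands for pandas_sql; `connection` is an already-open database connection.
# Assumption: date_time is the column to filter on. A bare DATE_SUB(...) in the
# WHERE clause is always true, so it would not restrict rows to the last 30 days.
ps = pd.read_sql('SELECT * FROM plug_results WHERE date_time >= DATE_SUB(CURDATE(), INTERVAL 30 DAY) ORDER BY c_id', con=connection)
s = ps.malware_tests_status
counts = s.value_counts()                   # number of rows per status
percent = s.value_counts(normalize=True)    # fraction of all rows per status
percent100 = percent.mul(100).round(1).astype(str) + '%'   # same fraction as a "73.9%" style string
pr = pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})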
OUTPUT:
               counts  per       per100
ok             137482  0.738618  73.9%
informational  41210   0.221400  22.1%
warning        7442    0.039982  4.0%
To clarify, the current code computes the percentages across all c_ids combined, not for each individual c_id.
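One way this could be approached is to group by c_id before calling value_counts, so that normalize=True divides by each device's own row count instead of the overall total. The sketch below is only an illustration under assumptions: the small `ps` frame is made-up stand-in data, not the real plug_results table, and it keeps the same counts / per / per100 layout as the code above.

```python
import pandas as pd

# Hypothetical stand-in for the `ps` frame; only the two relevant columns.
ps = pd.DataFrame({
    "c_id": ["A", "A", "A", "A", "B", "B"],
    "malware_tests_status": ["ok", "ok", "ok", "warning", "ok", "informational"],
})

# Group the status column by device so every aggregate is computed per c_id.
grouped = ps.groupby("c_id")["malware_tests_status"]

counts = grouped.value_counts()                  # rows per (c_id, status)
percent = grouped.value_counts(normalize=True)   # share within each c_id
percent100 = percent.mul(100).round(1).astype(str) + "%"

# Same three columns as before, now indexed by (c_id, malware_tests_status).
pr = pd.DataFrame({"counts": counts, "per": percent, "per100": percent100})
print(pr)
```

The result is indexed by (c_id, malware_tests_status), so something like pr.loc["A"] should give the breakdown for a single device.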
Here is a sample of the data:
   c_id        date_time   time_log  malware_tests_uid  malware_tests_name  malware_tests_status ...
0  XXXXXXXX-A  2019-12-13  11:20:23  pr_0bdcd74073      Malware Tests       ok
1  XXXXXXXX-A  2019-12-13  11:30:21  pr_0bdcd74073      Malware Tests       ok
2  XXXXXXXX-A  2019-12-13  11:40:21  pr_0bdcd74073      Malware Tests       ok
3  XXXXXXXX-A  2019-12-13  11:50:24  pr_0bdcd74073      Malware Tests       informational
4  XXXXXXXX-A  2019-12-13  12:00:22  pr_0bdcd74073      Malware Tests       ok
5  XXXXXXXX-B  2019-12-13  12:10:22  pr_0bdcd74073      Malware Tests       ok
6  XXXXXXXX-B  2019-12-13  12:20:21  pr_0bdcd74073      Malware Tests       ok
7  XXXXXXXX-B  2019-12-13  12:30:21  pr_0bdcd74073      Malware Tests       warning
8  XXXXXXXX-C  2019-12-13  12:40:05  pr_0bdcd74073      Malware Tests       informational
9  XXXXXXXX-C  2019-12-13  12:40:08  pr_0bdcd74073      Malware Tests       ok
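For reference, here is a rough sketch of the kind of result I am after, built from the ten sample rows above. It uses pd.crosstab with normalize="index", which is one possible way to get one row of percentages per c_id; the variable names `sample` and `per_device` are made up for this illustration, and only the c_id and malware_tests_status columns are reproduced.

```python
import pandas as pd

# The ten sample rows, reduced to the two columns that matter here.
sample = pd.DataFrame({
    "c_id": ["XXXXXXXX-A"] * 5 + ["XXXXXXXX-B"] * 3 + ["XXXXXXXX-C"] * 2,
    "malware_tests_status": [
        "ok", "ok", "ok", "informational", "ok",   # device A
        "ok", "ok", "warning",                     # device B
        "informational", "ok",                     # device C
    ],
})

# normalize="index" divides each row by its own total, so the values
# for every c_id add up to 100 (before rounding).
per_device = (
    pd.crosstab(sample["c_id"], sample["malware_tests_status"], normalize="index")
    .mul(100)
    .round(1)
)
print(per_device)
```

With these rows, device A comes out as 80.0 ok and 20.0 informational, and a status that never occurs for a device shows up as 0.0 rather than being dropped.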