Question

I have the following dataframe:

     IP               Service           Status     CPU        Memory
0   10.58.1.73   service: StorageService   null   cpu: 22%   memory: 11%
0   10.58.1.99   service: StorageService   null   cpu: 25%   memory: 37%
0  10.58.1.114   service: StorageService   null   cpu: 39%    memory: 2%
0   10.58.1.82   service: StorageService   null   cpu: 50%   memory: 96%
0   10.58.1.53   service: StorageService   null   cpu: 29%   memory: 36%
0    10.58.1.1   service: StorageService   null   cpu: 54%    memory: 6%
0   10.58.1.15   service: StorageService   null   cpu: 28%   memory: 30%
0    10.58.1.4   service: StorageService   null    cpu: 5%   memory: 48%
0   10.58.1.69   service: StorageService   null   cpu: 21%   memory: 57%
0    10.58.1.5   service: StorageService   null    cpu: 4%    memory: 2%
0  10.58.1.136   service: StorageService   null   cpu: 98%   memory: 74%
0   10.58.1.43   service: StorageService   null   cpu: 36%   memory: 23%
0    10.58.1.6   service: StorageService   null   cpu: 61%   memory: 25%
0  10.58.1.137   service: StorageService   null   cpu: 76%   memory: 66%
0   10.58.1.83   service: StorageService   null   cpu: 92%   memory: 35%
0   10.58.1.39   service: StorageService   null   cpu: 35%   memory: 17%

I need to extract the number from the CPU column as a string. I tried this command:

cpu = df2.CPU.str.extract(r'([\d]+))', expand=False)

But I think my regex is off. What is the best way to do this?

Answer 1

Since every value shares the same "cpu: " prefix, a simple replace does the job:

df2.CPU.str.replace('cpu: ', '').str[:-1]

Or, even simpler, slice the string:

df2.CPU.str[5:-1]
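A minimal sketch of both approaches, assuming a three-row df2 whose CPU values are copied from the table above (the remaining rows are assumed to look the same). The slice relies on the prefix "cpu: " being exactly 5 characters and on every value ending in "%".

```python
import pandas as pd

# A few rows copied from the question's CPU column (assumption: the rest look the same)
df2 = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# Approach 1: remove the literal prefix, then drop the trailing "%"
# regex=False makes it explicit that "cpu: " is a plain string, not a pattern
replaced = df2.CPU.str.replace("cpu: ", "", regex=False).str[:-1]

# Approach 2: slice; "cpu: " is 5 characters long, so keep characters [5:-1]
sliced = df2.CPU.str[5:-1]

print(replaced.tolist())  # ['22', '5', '98']
print(sliced.tolist())    # ['22', '5', '98']
```

Both results are object-dtype Series of strings, which is what the question asks for.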

Answer 2

The error message tells you exactly where the subtle mistake in your regex is: an extra closing parenthesis.

re.error: unbalanced parenthesis at position 7

df.CPU.str.extract(r'([\d]+))', expand=False)
                            ^

You meant to type:

df.CPU.str.extract(r'([\d]+)', expand=False)

which works fine.
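A small sketch reproducing the error and the fix, assuming a three-row df with CPU values taken from the question's table. Compiling the original pattern directly with re shows the same "unbalanced parenthesis" error; with the balanced pattern, str.extract with expand=False returns the single capture group as a Series of strings.

```python
import re
import pandas as pd

# Sample rows from the question's table (assumption: the rest look the same)
df = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# The pattern from the question has one ")" too many, so compiling it fails
try:
    re.compile(r'([\d]+))')
    message = None
except re.error as exc:
    message = str(exc)
print(message)  # unbalanced parenthesis at position 7

# With balanced parentheses, extract returns the first (and only) group
cpu = df.CPU.str.extract(r'([\d]+)', expand=False)
print(cpu.tolist())  # ['22', '5', '98']
```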

Answer 3

You can avoid the regex entirely here. I'd suggest:

df2.CPU.str.split(' ').str[1]

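This splits each string at the space character and selects the second element, i.e. the percentage (note that it still includes the "%" sign).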
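A minimal sketch of the split approach, assuming a three-row df2 with CPU values from the question's table. The optional rstrip step is an addition, not part of the answer above, for when only the digits are wanted.

```python
import pandas as pd

# Sample rows from the question's table (assumption: the rest look the same)
df2 = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# Split on the single space: "cpu: 22%" -> ["cpu:", "22%"], then take element 1
pct = df2.CPU.str.split(' ').str[1]
print(pct.tolist())  # ['22%', '5%', '98%']

# Optional: strip the trailing percent sign to keep only the digits
digits = pct.str.rstrip('%')
print(digits.tolist())  # ['22', '5', '98']
```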

Pandas: extract data from a column into a string
