Pandas: extracting data from a column into a string

Time: 2019-12-08 11:45:42

Tags: python pandas

I have the following DataFrame:

     IP               Service           Status     CPU        Memory
0   10.58.1.73   service: StorageService   null   cpu: 22%   memory: 11%
0   10.58.1.99   service: StorageService   null   cpu: 25%   memory: 37%
0  10.58.1.114   service: StorageService   null   cpu: 39%    memory: 2%
0   10.58.1.82   service: StorageService   null   cpu: 50%   memory: 96%
0   10.58.1.53   service: StorageService   null   cpu: 29%   memory: 36%
0    10.58.1.1   service: StorageService   null   cpu: 54%    memory: 6%
0   10.58.1.15   service: StorageService   null   cpu: 28%   memory: 30%
0    10.58.1.4   service: StorageService   null    cpu: 5%   memory: 48%
0   10.58.1.69   service: StorageService   null   cpu: 21%   memory: 57%
0    10.58.1.5   service: StorageService   null    cpu: 4%    memory: 2%
0  10.58.1.136   service: StorageService   null   cpu: 98%   memory: 74%
0   10.58.1.43   service: StorageService   null   cpu: 36%   memory: 23%
0    10.58.1.6   service: StorageService   null   cpu: 61%   memory: 25%
0  10.58.1.137   service: StorageService   null   cpu: 76%   memory: 66%
0   10.58.1.83   service: StorageService   null   cpu: 92%   memory: 35%
0   10.58.1.39   service: StorageService   null   cpu: 35%   memory: 17%

I need to extract the values in the CPU column as strings. I tried this command:

cpu = df2.CPU.str.extract(r'([\d]+))', expand=False)

But I think my regex is off. What is the best way to solve this?

3 answers:

Answer 0: (score: 2)

Since every value has the same `cpu: ` prefix, a simple replace is enough:

df2.CPU.str.replace('cpu: ', '').str[:-1]

Or, even simpler, slice the string:

df2.CPU.str[5:-1] 
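
For anyone who wants to try both variants, here is a minimal sketch. The three-row `df2` is a made-up sample in the same format as the question's CPU column, and `regex=False` is my addition so that the prefix is treated as a literal string regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample mirroring the format of the question's CPU column.
df2 = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# Variant 1: remove the literal prefix, then drop the trailing '%'.
replaced = df2.CPU.str.replace("cpu: ", "", regex=False).str[:-1]

# Variant 2: 'cpu: ' is 5 characters long, so slice from index 5
# up to (but not including) the last character.
sliced = df2.CPU.str[5:-1]

print(replaced.tolist())
print(sliced.tolist())
```

Both variants rely on every row starting with exactly `cpu: ` and ending with `%`; if the format can vary, a regex is the safer choice.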

Answer 1: (score: 1)

The error message tells you exactly where the subtle mistake in your regex is: there is one closing parenthesis too many.

re.error: unbalanced parenthesis at position 7

df.CPU.str.extract(r'([\d]+))', expand=False)
                            ^

You meant to type:

df.CPU.str.extract(r'([\d]+)', expand=False)

which works fine.
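
A small sketch to reproduce the error and check the fix. The sample `df` is invented for illustration, `(\d+)` is just an equivalent shorthand for `([\d]+)`, and the `astype(int)` step is an optional extra for when numbers are needed instead of strings.

```python
import re
import pandas as pd

# The pattern from the question has an unmatched ')' and does not compile.
try:
    re.compile(r'([\d]+))')
    error_pos = None
except re.error as exc:
    # exc.pos is the index of the offending character in the pattern.
    error_pos = exc.pos

# Hypothetical sample in the same format as the question's CPU column.
df = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# With balanced parentheses, extract returns the captured digits as strings.
cpu = df.CPU.str.extract(r'(\d+)', expand=False)

# Optional: convert to integers if numeric values are needed.
cpu_int = cpu.astype(int)

print(error_pos)
print(cpu.tolist())
print(cpu_int.tolist())
```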

Answer 2: (score: 0)

You can get around using a regex here. I suggest:

df2.CPU.str.split(' ').str[1]

This splits each string at the space character and selects the second element, i.e. the percentage (including the `%` sign).
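
A quick sketch of the split approach on a made-up sample. The `rstrip('%')` step is not part of the suggestion above; it is only there in case the bare number is wanted.

```python
import pandas as pd

# Hypothetical sample in the same format as the question's CPU column.
df2 = pd.DataFrame({"CPU": ["cpu: 22%", "cpu: 5%", "cpu: 98%"]})

# Split on the space and keep the second piece, e.g. '22%'.
pct = df2.CPU.str.split(' ').str[1]

# Optional extra step: strip the trailing '%' to keep only the digits.
digits = pct.str.rstrip('%')

print(pct.tolist())
print(digits.tolist())
```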