Split a pandas column on %

Time: 2020-10-13 12:08:00

Tags: python-3.x regex

I have a df in which one of the columns looks like this:

**Share**
We are safe 25%
We are always safe 12.50% (India Aus, West)
We are ok (USA, EU)
We are not OK
What is this
Always wise 25.66%

I want to split this column so that the % value, wherever one is present, is moved out of the column into a new one. The output would be

Share                  Percent    LOCATION
We are safe            25%  
We are always safe     12.50%     India Aus, West
We are ok                         USA, EU
We are not OK
What is this
Always wise            25.66%

I thought the following would split it from the right, but it does not work

df['Percent'] = df['Share'].str.rsplit(r' \d',1).str[0]

1 answer:

Answer 0: (score: 3)

You can extract these values:

df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")

Pandas test:

import pandas as pd
df = pd.DataFrame({'Share':['We are safe 25%','We are ok', 'We are always safe 12.50%']})
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
>>> df
                Share Percent
0         We are safe     25%
1           We are ok        
2  We are always safe  12.50%
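
The split above only fires when the percentage is the last thing in the string, so a row such as "We are always safe 12.50% (India Aus, West)" from the question would be left unsplit. As a rough sketch (not part of the original answer), one way to also fill the LOCATION column is Series.str.extract with named groups. The pattern below is an assumption on my part: it expects the percentage, if any, to come before an optional parenthesised location at the end of the string.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({'Share': [
    'We are safe 25%',
    'We are always safe 12.50% (India Aus, West)',
    'We are ok (USA, EU)',
    'We are not OK',
    'What is this',
    'Always wise 25.66%',
]})

# Assumed layout: free text, then an optional percentage,
# then an optional "(location)" at the very end.
pattern = (
    r'^(?P<Share>.*?)'                        # leading text, as short as possible
    r'(?:\s+(?P<Percent>\d+(?:\.\d+)?%))?'    # optional percentage such as 25% or 12.50%
    r'(?:\s*\((?P<LOCATION>[^()]*)\))?'       # optional text inside parentheses
    r'\s*$'                                   # trailing whitespace, end of string
)

# Named groups become the column names; unmatched groups are NaN,
# which fillna replaces with empty strings.
out = df['Share'].str.extract(pattern).fillna('')
print(out)
```

Rows without a percentage or a location simply get empty strings in those columns.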

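See the regex demo. Details:

  • \s+ - one or more whitespace characters
  • (?=\d+(?:\.\d+)?%\s*$) - a positive lookahead that matches a location immediately followed by:
    • \d+ - one or more digits
    • (?:\.\d+)? - an optional sequence of . and one or more digits
    • % - a % symbol
    • \s* - zero or more trailing whitespace characters (followed by $), and
    • $ - the end of the string.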
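
To see the pieces above working together outside pandas, here is a small sketch using only the standard re module. The helper name split_percent and the extra input 'Up 5% today' are made up for illustration. Because the lookahead consumes nothing, the percentage stays intact in the second piece, and a % that is not at the end of the string does not trigger a split.

```python
import re

# Same pattern as in the answer: whitespace that is followed
# by a percentage sitting at the end of the string.
PATTERN = re.compile(r'\s+(?=\d+(?:\.\d+)?%\s*$)')

def split_percent(text):
    """Hypothetical helper: split text into [label, percent] when it ends in a percentage."""
    return PATTERN.split(text)

for text in ['We are safe 25%', 'We are ok', 'We are always safe 12.50%', 'Up 5% today']:
    print(split_percent(text))
```

Only the whitespace matched by \s+ is removed; strings that do not end in a percentage come back as a single-element list.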