I have a df in which one of the columns looks like this:
**Share**
We are safe 25%
We are always safe 12.50% (India Aus, West)
We are ok (USA, EU)
We are not OK
What is this
Always wise 25.66%
I want to split this column so that the % value, wherever one is present, is moved out of it into a new column. So the output would be
Share               Percent  LOCATION
We are safe         25%
We are always safe  12.50%   India Aus, West
We are ok                    USA, EU
We are not OK
What is this
Always wise         25.66%
I thought the following would split it from the right, but it doesn't work:
df['Percent'] = df['Share'].str.rsplit(r' \d',1).str[0]
Answer 0 (score: 3)
You can split the trailing percentage off into its own column:
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
Pandas test:
import pandas as pd
df = pd.DataFrame({'Share':['We are safe 25%','We are ok', 'We are always safe 12.50%']})
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
>>> df
                Share  Percent
0         We are safe      25%
1           We are ok
2  We are always safe   12.50%
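The same pattern can be checked without pandas. The following is a minimal sketch using the standard-library `re` module; the helper name `split_percent` and the sample string `'Top 5 picks'` are illustrative assumptions, not part of the original answer.

```python
import re

# The split pattern from the answer, compiled once for reuse.
SPLIT_RE = re.compile(r'\s+(?=\d+(?:\.\d+)?%\s*$)')

def split_percent(text):
    """Hypothetical helper: return (share, percent).

    percent is '' when the string does not end in a percentage.
    """
    parts = SPLIT_RE.split(text)
    share = parts[0]
    percent = parts[1] if len(parts) > 1 else ''
    return share, percent

# 'Top 5 picks' is an invented sample: a digit that is not part of a
# trailing percentage should not trigger a split.
samples = ['We are safe 25%', 'Always wise 25.66%', 'We are ok', 'Top 5 picks']
results = [split_percent(s) for s in samples]

for sample, result in zip(samples, results):
    print(repr(sample), '->', result)
```

Only whitespace that is immediately followed by a percentage running to the end of the string acts as a split point, so strings without a trailing percentage come back whole.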
See the regex demo. Details of the split pattern:
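- \s+ - one or more whitespace characters
- (?=\d+(?:\.\d+)?%\s*$) - a positive lookahead that matches a position immediately followed by:
  - \d+ - one or more digits
  - (?:\.\d+)? - an optional sequence of a . and one or more digits
  - % - a % symbol
  - \s* - zero or more whitespace characters (trailing, because $ follows)
  - $ - the end of the string.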
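Because the lookahead ends in `$`, a row such as `We are always safe 12.50% (India Aus, West)` is not split by this pattern, and the LOCATION column requested in the question is not produced. One possible way to cover both is `Series.str.extract` with named groups. This is a sketch, not the answer's method: the pattern below is an assumption that each row has at most one percentage, optionally followed by one parenthesised location at the end of the string.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({'Share': [
    'We are safe 25%',
    'We are always safe 12.50% (India Aus, West)',
    'We are ok (USA, EU)',
    'We are not OK',
    'What is this',
    'Always wise 25.66%',
]})

# Assumed layout: text, then an optional percentage, then an optional "(...)".
pattern = (
    r'^(?P<Share>.*?)'                      # leading text, as short as possible
    r'(?:\s+(?P<Percent>\d+(?:\.\d+)?%))?'  # optional percentage such as 12.50%
    r'(?:\s*\((?P<LOCATION>[^()]*)\))?'     # optional location inside parentheses
    r'\s*$'                                 # nothing but whitespace may remain
)

# Named groups become column names; unmatched groups are NaN until filled.
result = df['Share'].str.extract(pattern).fillna('')
print(result)
```

The lazy `.*?` keeps the Share text as short as possible, so the optional groups get the first chance to claim the percentage and the location. Rows with neither fall through to the final `\s*$` and keep their full text in Share.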