Split Python DataFrame strings and save the last split part to a new column

Time: 2019-10-31 14:21:12

Tags: python dataframe split

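I want to split the strings in a particular DataFrame column on " - " and save the last part to a new column. This works outside the DataFrame: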

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

print(s0.split(" - ")[-1])  # works
print(s1.split(" - ")[-1])
print(s2.split(" - ")[-1])

But it does not work with the DataFrame:

import pandas as pd

df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].str.split(' - ')[-1]  # KeyError: -1
print(df['diagnosis'])

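What am I doing wrong?

3 answers:

Answer 0: (score: 3)

An alternative to splitting each string into a list of chunks is to locate the last '-' with rfind and slice from there (shown here with Python's str.rfind via apply; pd.Series.str.rfind is the pandas counterpart):

In [104]: df['title'].apply(lambda s: s[s.rfind('-') + 1:].strip())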
Out[104]: 
0                Pharyngitis
1                Nephropathy
2    Metastatic Liver Cancer
Name: title, dtype: object
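One caveat with the snippet above: it searches for a bare '-', so a title containing a hyphenated word would be cut at that hyphen. A minimal sketch of the same rfind idea using the full ' - ' separator from the question follows; the names SEP and last_part and the second sample title are made up for illustration.

```python
import pandas as pd

SEP = ' - '  # separator from the question (hypothetical constant name)

def last_part(s: str) -> str:
    """Return the text after the last SEP, or the whole string if SEP is absent."""
    idx = s.rfind(SEP)
    return s if idx == -1 else s[idx + len(SEP):].strip()

df = pd.DataFrame({'title': [
    '34 years old woman with pain in her XXX - Pharyngitis',
    'Follow-up visit',  # illustrative title with a hyphen but no ' - '
]})
df['diagnosis'] = df['title'].apply(last_part)
print(df['diagnosis'].tolist())
```

With a bare '-' search, the second title would be reduced to 'up visit'; searching for ' - ' leaves it unchanged.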

Answer 1: (score: 1)

You can use apply with a lambda here:

import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

df = pd.DataFrame([s0, s1, s2], columns=['title'])

df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1]) 

print(df['diagnosis'])

Prints:

0                Pharyngitis
1                Nephropathy
2    Metastatic Liver Cancer
Name: diagnosis, dtype: object

If you would prefer an empty string when the string contains no ' - ', change that line to:

df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1] if ' - ' in x else '')
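The same empty-string behaviour can probably also be had without a Python-level lambda, by combining the .str accessor with Series.where. This is only a sketch; the intermediate names has_sep and last are illustrative.

```python
import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

df = pd.DataFrame([s0, s1, s2], columns=['title'])

# True for rows whose title contains the literal separator
has_sep = df['title'].str.contains(' - ', regex=False)

# Last chunk of every title, taken element-wise with .str[-1]
last = df['title'].str.split(' - ').str[-1]

# Keep the last chunk where a separator exists, otherwise use ''
df['diagnosis'] = last.where(has_sep, '')
print(df['diagnosis'].tolist())
```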

Answer 2: (score: 1)

Write a function that does the work and returns the value, then apply it to the column.

import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

def f(x):
    return x.split(" - ")[-1]

df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(f) 
print(df['diagnosis'])
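As for why the original attempt raises KeyError: df['title'].str.split(' - ') returns a Series whose values are lists, so a plain [-1] on it is a lookup of the index label -1, which does not exist in the default 0, 1, 2 index. Indexing through the .str accessor instead takes the last element of each list. A short sketch (the variable name parts is illustrative):

```python
import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

df = pd.DataFrame([s0, s1, s2], columns=['title'])

# A Series of lists: one list of chunks per row
parts = df['title'].str.split(' - ')

# parts[-1] would look for the index label -1 and raise KeyError;
# parts.str[-1] picks the last element of each row's list instead.
df['diagnosis'] = parts.str[-1]
print(df['diagnosis'].tolist())
```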