I want to split the strings in a particular DataFrame column on "-" and save the last part to a new column. This works outside a DataFrame:
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
print(s0.split(" - ")[-1]) # works
print(s1.split(" - ")[-1])
print(s2.split(" - ")[-1])
But it fails on a DataFrame:
import pandas as pd
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].str.split(' - ')[-1] # KeyError: -1
print(df['diagnosis'])
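What am I doing wrong?
Answer 0: (score: 3)
Instead of splitting the string into a list of chunks, you can locate the last "-" with rfind and slice from there (pd.Series.str.rfind is the vectorized counterpart; the line below calls Python's str.rfind through apply):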
In [104]: df['title'].apply(lambda s: s[s.rfind('-') + 1:].strip())
Out[104]:
0 Pharyngitis
1 Nephropathy
2 Metastatic Liver Cancer
Name: title, dtype: object
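The same idea can also be written with the vectorized .str accessor rather than apply. The following is a minimal sketch, not part of the original answer: it assumes the sample titles from the question, and the variable name diagnosis is only illustrative. It splits once from the right on "-", takes the last piece, and strips the surrounding whitespace.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "title": [
            "34 years old woman with pain in her XXX - Pharyngitis",
            "67 years old man with xxx - yyy zzz - Nephropathy",
            "Metastatic Liver Cancer",
        ]
    }
)

# Split at most once, starting from the right, so only the last "-" matters.
# .str[-1] picks the last element of each resulting list; rows without a "-"
# yield a one-element list, so the whole title is kept.
diagnosis = df["title"].str.rsplit("-", n=1).str[-1].str.strip()
print(diagnosis)
```

Like the rfind version, this splits on a bare "-", so a hyphen inside a word would also count as a separator.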
Answer 1: (score: 1)
You can use apply with a lambda here:
import pandas as pd
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1])
print(df['diagnosis'])
Prints:
0 Pharyngitis
1 Nephropathy
2 Metastatic Liver Cancer
Name: diagnosis, dtype: object
If you would rather get an empty string when the string contains no " - ", change that line to:
df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1] if ' - ' in x else '')
Answer 2: (score: 1)
Write a function that does the work and returns the value, then apply it to the column.
import pandas as pd
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
def f(x):
    return x.split(" - ")[-1]
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(f)
print(df['diagnosis'])
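As for why the line in the question fails: df['title'].str.split(' - ') returns a Series whose values are lists, so a trailing [-1] is looked up as an index label of that Series rather than as the last element of each list. With the default 0, 1, 2 index there is no label -1, hence KeyError: -1. A minimal sketch of one way around it, assuming the same sample data, is to index into each list through the .str accessor (the intermediate name parts is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "title": [
            "34 years old woman with pain in her XXX - Pharyngitis",
            "67 years old man with xxx - yyy zzz - Nephropathy",
            "Metastatic Liver Cancer",
        ]
    }
)

# Each value in `parts` is a Python list of the split pieces.
parts = df["title"].str.split(" - ")

# parts[-1] would look for the row labelled -1 and raise KeyError.
# parts.str[-1] instead takes the last element of every list.
df["diagnosis"] = parts.str[-1]
print(df["diagnosis"])
```

This gives the same result as the apply-based answers while staying within the pandas string methods.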