Question

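I want to split the strings in a particular DataFrame column on " - " and save the last part into a new column. This works outside the DataFrame: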

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

print(s0.split(" - ")[-1])  # works
print(s1.split(" - ")[-1])
print(s2.split(" - ")[-1])

But it does not work on the DataFrame:

df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].str.split(' - ')[-1]  # KeyError: -1
print(df['diagnosis'])

What am I doing wrong?

Answer 1

One way to do this without splitting the string into a list of chunks is to use str.rfind on each row:

In [104]: df['title'].apply(lambda s: s[s.rfind('-') + 1:].strip())                                         
Out[104]: 
0                Pharyngitis
1                Nephropathy
2    Metastatic Liver Cancer
Name: title, dtype: object
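Note that rfind('-') matches any hyphen, not only the " - " separator, so a title whose last part contains a hyphenated word would be cut short. A minimal sketch of a stricter variant, assuming the separator is always " - " with spaces on both sides; the helper name last_part and the fourth sample row are hypothetical and only there to show the difference:

```python
import pandas as pd

titles = pd.Series([
    '34 years old woman with pain in her XXX - Pharyngitis',
    '67 years old man with xxx - yyy zzz - Nephropathy',
    'Metastatic Liver Cancer',
    'Follow-up visit - Non-Hodgkin lymphoma',  # hypothetical row with hyphens inside words
])

def last_part(s, sep=' - '):
    # str.rpartition returns (head, sep, tail); when sep is absent,
    # head and sep are empty and tail is the whole string.
    return s.rpartition(sep)[2].strip()

# Original approach: cut after the last '-' of any kind.
by_hyphen = titles.apply(lambda s: s[s.rfind('-') + 1:].strip())

# Stricter approach: cut only after the last ' - ' separator.
by_separator = titles.apply(last_part)

print(by_hyphen[3])     # cut inside "Non-Hodgkin"
print(by_separator[3])  # keeps the hyphenated word intact
```

Like the rfind version, this builds no intermediate list and returns the full string when the separator is missing.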

Answer 2

You can use apply with a lambda here:

import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

df = pd.DataFrame([s0, s1, s2], columns=['title'])

df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1]) 

print(df['diagnosis'])

Prints:

0                Pharyngitis
1                Nephropathy
2    Metastatic Liver Cancer
Name: diagnosis, dtype: object

If you would prefer an empty string when the string contains no " - ", change that line to:

df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1] if ' - ' in x else '')
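The same empty-string behaviour can also be written without apply. The following is a rough sketch, not part of the answer above, that builds a boolean mask with str.contains and blanks out the rows where the separator is missing; the variable names has_sep and last are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'title': [
    '34 years old woman with pain in her XXX - Pharyngitis',
    '67 years old man with xxx - yyy zzz - Nephropathy',
    'Metastatic Liver Cancer',
]})

# True where the title contains the literal separator ' - '.
has_sep = df['title'].str.contains(' - ', regex=False)

# Last piece of each split; equals the whole title when there is no separator.
last = df['title'].str.split(' - ').str[-1]

# Keep the last piece where the separator exists, otherwise use ''.
df['diagnosis'] = last.where(has_sep, '')

print(df['diagnosis'])
```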

Answer 3

Write a function that does the work and returns the value, then apply it to the column.

import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

def f(x):
    return x.split(" - ")[-1]

df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(f) 
print(df['diagnosis'])
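As for why the original attempt raises KeyError: -1: df['title'].str.split(' - ') returns a Series of lists, and [-1] on that Series looks up the index label -1 rather than the last element of each list. A minimal sketch of the vectorized equivalent, which goes through the .str accessor a second time to index into every list (the intermediate name parts is only for illustration):

```python
import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'

df = pd.DataFrame([s0, s1, s2], columns=['title'])

# Series whose elements are lists of pieces, one list per row.
parts = df['title'].str.split(' - ')

# .str[-1] indexes into each list; a plain [-1] would look up the row label -1.
df['diagnosis'] = parts.str[-1]

print(df['diagnosis'])
```

This gives the same result as the apply-based versions without writing a Python-level function.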

Split a Python DataFrame string and save the last split part to a new column
