Remove part of a string in a pandas column

Time: 2018-03-27 10:30:47

Tags: python pandas

I have a column in my DataFrame that looks like this:

Input

df['location.display_name']

Output

 Kelso, Scottish Borders
 Manchester, Greater Manchester
 Northampton, Northamptonshire
 Reading, Berkshire
 Leicester, Leicestershire
 Newport, Wales
 Swindon, Wiltshire
 Perth, Perth & Kinross
 Manchester, Greater Manchester
 Perth, Perth & Kinross
 Cardiff
 Hull, East Riding Of Yorkshire
 Chester, Cheshire
 Southampton
 Leamington Spa, Warwickshire
 Swindon, Wiltshire
 Slough, Berkshire
 Portsmouth, Hampshire

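I want to create a new column that contains only the first part of each location. For example, for "Swindon, Wiltshire" I want to keep "Swindon" and put it in the new column.

Also, would this affect values that consist of a single word, such as "Cardiff", which I want to keep as they are?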

2 answers:

Answer 0: (score: 1)

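I think you need split, then either select the first element of each list with str[0], or pass expand=True and select the first column with [0]. Values without a comma, such as Cardiff, are returned unchanged: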

df['new'] = df['location.display_name'].str.split(',').str[0]
# alternative
# df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print(df)
              location.display_name              new
0           Kelso, Scottish Borders            Kelso
1    Manchester, Greater Manchester       Manchester
2     Northampton, Northamptonshire      Northampton
3                Reading, Berkshire          Reading
4         Leicester, Leicestershire        Leicester
5                    Newport, Wales          Newport
6                Swindon, Wiltshire          Swindon
7            Perth, Perth & Kinross            Perth
8    Manchester, Greater Manchester       Manchester
9            Perth, Perth & Kinross            Perth
10                          Cardiff          Cardiff
11   Hull, East Riding Of Yorkshire             Hull
12                Chester, Cheshire          Chester
13                      Southampton      Southampton
14     Leamington Spa, Warwickshire   Leamington Spa
15               Swindon, Wiltshire          Swindon
16                Slough, Berkshire           Slough
17            Portsmouth, Hampshire       Portsmouth
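For a self-contained check, here is a minimal sketch that builds a small sample frame and runs both variants. The rows are a subset copied from the question; the variable names first_by_list and first_by_column are only for illustration and do not come from the original answer.

```python
import pandas as pd

# Sample rows taken from the question; a subset is enough for illustration
df = pd.DataFrame({
    'location.display_name': [
        'Kelso, Scottish Borders',
        'Perth, Perth & Kinross',
        'Cardiff',
        'Leamington Spa, Warwickshire',
    ]
})

# Variant 1: split each string into a list, then take the first element
first_by_list = df['location.display_name'].str.split(',').str[0]

# Variant 2: expand the pieces into separate columns, then take column 0
first_by_column = df['location.display_name'].str.split(',', expand=True)[0]

df['new'] = first_by_list
print(df)
```

Both variants should give the same result here, and a single-word value such as Cardiff has no comma to split on, so it is kept whole.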

If the data contains no NaN or None values, you can also use a list comprehension:

df['new'] = [x.split(',')[0] for x in df['location.display_name']]
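The restriction on missing values matters because None and float NaN have no split method, so the plain list comprehension would raise an AttributeError on them. The following is a rough sketch of the difference, assuming a small made-up Series; the isinstance guard is one possible workaround, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two kinds of missing values
s = pd.Series(['Swindon, Wiltshire', None, np.nan, 'Cardiff'])

# The .str accessor skips missing values and leaves them missing
via_str = s.str.split(',').str[0]

# A list comprehension needs an explicit guard: only strings are split,
# anything else (None, NaN) is passed through unchanged
via_listcomp = [x.split(',')[0] if isinstance(x, str) else x for x in s]

print(via_str)
print(via_listcomp)
```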

Answer 1: (score: 1)

To run a custom function on each element of a column, you can use the pandas apply function. In your case, the following code should do the job:

import pandas

def get_first_substring(x):
    # Split only real values; None and NaN are treated as missing
    if pandas.notna(x):
        return x.split(',')[0]
    return None

dataframe['new'] = dataframe['location.display_name'].apply(get_first_substring)

The output looks like this:

                   old          new
substring1, substring2   substring1
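To try the apply approach end to end, here is a minimal sketch with made-up sample data. The helper name first_part and the three sample rows are assumptions for illustration; pd.isna is used for the missing-value check because a comparison with NaN using != is always true and so cannot detect it.

```python
import pandas as pd

def first_part(value):
    """Return the text before the first comma, or None for missing values."""
    if pd.isna(value):
        return None
    return value.split(',')[0]

# Hypothetical sample: a two-part value, a single word, and a missing value
dataframe = pd.DataFrame(
    {'location.display_name': ['Swindon, Wiltshire', 'Cardiff', None]}
)
dataframe['new'] = dataframe['location.display_name'].apply(first_part)
print(dataframe)
```

For plain comma splitting the vectorised .str.split is usually the simpler choice; apply is mainly useful when the per-element logic grows beyond what the .str methods offer.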