I have a column in my DataFrame that looks like this:
Input
df['location.display_name']
Output
Kelso, Scottish Borders
Manchester, Greater Manchester
Northampton, Northamptonshire
Reading, Berkshire
Leicester, Leicestershire
Newport, Wales
Swindon, Wiltshire
Perth, Perth & Kinross
Manchester, Greater Manchester
Perth, Perth & Kinross
Cardiff
Hull, East Riding Of Yorkshire
Chester, Cheshire
Southampton
Leamington Spa, Warwickshire
Swindon, Wiltshire
Slough, Berkshire
Portsmouth, Hampshire
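I want to create a new column that contains only the first part of the location. For example, from Swindon, Wiltshire I want to keep Swindon and add it to the new column.
Also, will this affect single-word values such as Cardiff, which I want to keep as they are?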
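Answer 0 (score: 1)
I think you need split, then either select the first element of each list with str[0], or pass expand=True and select the first column with [0]: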
df['new'] = df['location.display_name'].str.split(',').str[0]
# Alternative: expand=True returns a DataFrame, so [0] selects its first column
#df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print(df)
             location.display_name             new
0          Kelso, Scottish Borders           Kelso
1   Manchester, Greater Manchester      Manchester
2    Northampton, Northamptonshire     Northampton
3               Reading, Berkshire         Reading
4        Leicester, Leicestershire       Leicester
5                   Newport, Wales         Newport
6               Swindon, Wiltshire         Swindon
7           Perth, Perth & Kinross           Perth
8   Manchester, Greater Manchester      Manchester
9           Perth, Perth & Kinross           Perth
10                         Cardiff         Cardiff
11  Hull, East Riding Of Yorkshire            Hull
12               Chester, Cheshire         Chester
13                     Southampton     Southampton
14    Leamington Spa, Warwickshire  Leamington Spa
15              Swindon, Wiltshire         Swindon
16               Slough, Berkshire          Slough
17           Portsmouth, Hampshire      Portsmouth
If the data contains no NaN or None values, you can also use a list comprehension:
df['new'] = [x.split(',')[0] for x in df['location.display_name']]
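To illustrate why missing values matter here, the following is a minimal sketch using made-up sample data (the Series `s` and the name `first_part` are hypothetical, not from the question). It suggests that the `.str` accessor leaves missing entries as missing and leaves single-word values such as Cardiff unchanged, whereas the list comprehension would fail on a float NaN because floats have no `split` method. The `n=1` argument is optional and only limits the split to the first comma.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one two-part value, one single word, two missing values
s = pd.Series(['Swindon, Wiltshire', 'Cardiff', np.nan, None],
              name='location.display_name')

# n=1 stops splitting after the first comma; .str[0] takes the first piece
first_part = s.str.split(',', n=1).str[0]
print(first_part)

# The list comprehension calls .split on every element, including the float NaN
try:
    [x.split(',')[0] for x in s]
except AttributeError as exc:
    print('list comprehension failed:', exc)
```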
Answer 1 (score: 1)
To apply a custom function to each element of a column, you can use the pandas apply function. In your case, the following code should do the job:
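import pandas

def get_first_substring(x):
    # Split only real strings; None and NaN are returned unchanged.
    # (x != numpy.nan cannot detect NaN, because that comparison is always True.)
    if isinstance(x, str):
        return x.split(',')[0]
    return x

dataframe['new'] = dataframe['location.display_name'].apply(get_first_substring)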
The output looks like this:
old                     new
substring1, substring2  substring1
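One caveat with the apply approach is worth sketching: NaN compares unequal to everything, including itself, so a check written as `x != numpy.nan` is always true and does not filter out missing values. The example below is a small self-contained sketch under that observation; the sample DataFrame and the helper name `first_part_or_missing` are hypothetical, and checking `isinstance(value, str)` is just one possible way to guard the split.

```python
import numpy as np
import pandas as pd

def first_part_or_missing(value):
    # Only strings are split; None and NaN are passed through unchanged
    if isinstance(value, str):
        return value.split(',')[0]
    return value

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'location.display_name': ['Leamington Spa, Warwickshire', 'Southampton', np.nan]
})
df['new'] = df['location.display_name'].apply(first_part_or_missing)

# NaN is never equal to itself, so this comparison is True
nan_is_unequal = np.nan != np.nan
print(nan_is_unequal)
print(df)
```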