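I have a DataFrame with around 20 columns. One of them is called 'director_name' and holds values such as 'John Doe' or 'Jane Doe'. I want to split it into two columns, 'First_Name' and 'Last_Name'. When I run the following, it works as expected and splits the string into two columns: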
data[['First_Name', 'Last_Name']] = data.director_name.str.split(' ', expand=True)
data
First_Name Last_Name
John Doe
This works well, but it fails when 'director_name' contains NULL (NaN) values. It raises the following error:
'Columns must be same length as key'
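I want to add a function that checks whether the value is != null and then runs the command above; otherwise it should enter 'NA' for 'First_Name' and 'Last_Name'.
Any ideas on how I would do this?
Edit:
I just checked the file, and I'm not sure NULL is the problem. I have some names that are 3-4 strings long, i.e.: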
John Allen Doe
John Allen Doe Jr
So maybe I can't split these into First_Name and Last_Name.
Hmmmm
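The suspicion in the edit can be checked with a small sketch. The sample names below are made up for illustration. As far as I can tell, NaN alone does not trigger the error; it appears once some name splits into more than two pieces, because expand=True then yields more columns than the two keys on the left-hand side.

```python
import numpy as np
import pandas as pd

# Only two-word names plus NaN: the assignment goes through.
ok = pd.DataFrame({"director_name": ["John Doe", np.nan]})
ok[["First_Name", "Last_Name"]] = ok["director_name"].str.split(" ", expand=True)

# Longer names produce more than two columns.
data = pd.DataFrame({
    "director_name": ["John Doe", np.nan, "John Allen Doe", "John Allen Doe Jr"],
})
parts = data["director_name"].str.split(" ", expand=True)
print(parts.shape[1])  # number of columns the split produced

# Rows with more than two tokens (NaN rows are counted as 0 tokens here).
token_counts = data["director_name"].str.split(" ").str.len().fillna(0)
long_names = data[token_counts > 2]
print(long_names)

# Assigning more than two columns to two keys raises a ValueError.
try:
    data[["First_Name", "Last_Name"]] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Listing the rows in long_names is a quick way to see which names need special handling before deciding how to split them.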
Answer 0 (score: 7)
One way is to split and pick, say, the first two values as the first name and last name.
Id name
0 1 James Cameron
1 2 Martin Sheen
2 3 John Allen Doe
3 4 NaN
df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]
You get
Id name First_Name Last_Name
0 1 James Cameron James Cameron
1 2 Martin Sheen Martin Sheen
2 3 John Allen Doe John Allen
3 4 NaN NaN None
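A variation on the same idea, as a minimal sketch: split once, reuse the result for both columns, and replace missing values with the string 'NA' that the question asks for. The sample frame mirrors the one above; the variable name parts is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "name": ["James Cameron", "Martin Sheen", "John Allen Doe", np.nan],
})

# Split once and reuse the result instead of splitting twice.
parts = df["name"].str.split(" ", expand=True)

# Keep only the first two tokens; missing values become the string 'NA'.
df["First_Name"] = parts[0].fillna("NA")
df["Last_Name"] = parts[1].fillna("NA")

print(df)
```

Note that 'NA' here is an ordinary string, so pd.isnull will no longer flag those rows; leave out the fillna calls if real missing values are preferred.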
Answer 1 (score: 2)
Use str.split (no arguments needed, because the default separator is whitespace) and indexing with str to select list elements by position:
print (df.name.str.split())
0 [James, Cameron]
1 [Martin, Sheen]
2 [John, Allen, Doe]
3 NaN
Name: name, dtype: object
df['First_Name'] = df.name.str.split().str[0]
df['Last_Name'] = df.name.str.split().str[1]
# data borrowed from A-Za-z's answer
print (df)
Id name First_Name Last_Name
0 1 James Cameron James Cameron
1 2 Martin Sheen Martin Sheen
2 3 John Allen Doe John Allen
3 4 NaN NaN NaN
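You can also use the parameter n to limit the number of splits, so that everything after the first name goes into Last_Name: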
df['First_Name'] = df.name.str.split().str[0]
df['Last_Name'] = df.name.str.split(n=1).str[1]
print (df)
Id name First_Name Last_Name
0 1 James Cameron James Cameron
1 2 Martin Sheen Martin Sheen
2 3 John Allen Doe John Allen Doe
3 4 NaN NaN NaN
A solution with str.rsplit, which splits from the right:
df['First_Name'] = df.name.str.rsplit(n=1).str[0]
df['Last_Name'] = df.name.str.rsplit().str[-1]
print (df)
Id name First_Name Last_Name
0 1 James Cameron James Cameron
1 2 Martin Sheen Martin Sheen
2 3 John Allen Doe John Allen Doe
3 4 NaN NaN NaN
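Since n=1 allows at most one split, it can also be combined with expand=True so that the original two-column assignment from the question works even with longer names. This is a sketch on made-up sample data, not a tested drop-in for the real file; it assumes at least one name contains a space, otherwise only one column comes back.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "director_name": ["John Doe", "John Allen Doe", "John Allen Doe Jr", np.nan],
})

# n=1 performs at most one split, so expand=True yields at most two columns.
data[["First_Name", "Last_Name"]] = data["director_name"].str.split(n=1, expand=True)
print(data)

# rsplit does the same from the right: everything before the last token, then the last token.
right = data["director_name"].str.rsplit(n=1, expand=True)
print(right)
```

With split, the extra words end up in Last_Name; with rsplit, they end up in the first column. Which one is right depends on how the names in the file are structured.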
Answer 2 (score: 1)
This should solve your problem.
Setup
import numpy as np
import pandas as pd
data = pd.DataFrame({'director_name': {0: 'John Doe', 1: np.nan, 2: 'Alan Smith'}})
data
Out[457]:
director_name
0 John Doe
1 NaN
2 Alan Smith
Solution
#use a lambda function to check nan before splitting the column.
data[['First_Name', 'Last_Name']] = data.apply(lambda x: pd.Series([np.nan,np.nan] if pd.isnull(x.director_name) else x.director_name.split()), axis=1)
data
Out[446]:
director_name First_Name Last_Name
0 John Doe John Doe
1 NaN NaN NaN
2 Alan Smith Alan Smith
If you only need the first two names, you can do this:
data[['First_Name', 'Last_Name']] = data.apply(lambda x: pd.Series([np.nan,np.nan] if pd.isnull(x.director_name) else x.director_name.split()).iloc[:2], axis=1)
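The same null check can be written as a named function, which is closer to what the question literally asks for. This is only a sketch: split_name is a hypothetical helper, the sample names are made up, and treating the last token as the last name is an assumption (it turns 'Jr' into a last name, for instance).

```python
import numpy as np
import pandas as pd


def split_name(value):
    """Hypothetical helper: return first and last name, using 'NA' for missing parts."""
    if pd.isnull(value):
        return pd.Series(["NA", "NA"])
    tokens = str(value).split()
    if not tokens:  # empty or whitespace-only string
        return pd.Series(["NA", "NA"])
    first = tokens[0]
    # Assumption: the last token is the last name; single-word names get 'NA'.
    last = tokens[-1] if len(tokens) > 1 else "NA"
    return pd.Series([first, last])


data = pd.DataFrame({
    "director_name": ["John Doe", np.nan, "John Allen Doe Jr", "Cher"],
})

# Series.apply with a function that returns a Series produces a two-column DataFrame.
data[["First_Name", "Last_Name"]] = data["director_name"].apply(split_name)
print(data)
```

Because the function always returns exactly two values, the number of columns always matches the two keys, whatever the length of the name. It runs row by row, so on a large frame the vectorized str.split approaches are likely faster.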
Answer 3 (score: 1)
df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]
This should