如何使用熊猫分割数据框?

时间:2018-08-12 17:20:51

标签: regex python-3.x pandas dataframe split

我要处理以下数据框, DF

Name                City
Hat, Richards       Paris
Adams               New york
Tim, Mathews        Sanfrancisco
chris, Moya De      Las Vegas
kate, Moris         Atlanta
Grisham HA          Middleton
James, Tom, greval  Rome

我期望的数据框应为DF

Name         Last_name           City
Hat          Richards            Paris
             Adams               New york
Tim          Mathews             Sanfrancisco
chris        Moya De             Las Vegas
kate         Moris               Atlanta
             Grisham HA          Middleton
James, Tom   greval              Rome

应该在最后一个','上进行拆分,如果没有,则所有其他单词或短语应落在'Last_name'列中,并且'Name'列应保持空白。

3 个答案:

答案 0 :(得分:4)

str.splitn=-1一起使用(默认情况下,您可以更改所需的内容)

newdf=df.Name.str.split(', ',expand=True,n=1).ffill(1)
newdf.loc[newdf[0]==newdf[1],0]=''
newdf
Out[923]: 
       0          1
0    Hat   Richards
1             Adams
2    Tim    Mathews
3  chris     MoyaDe
4   kate      Moris
5         GrishamHA
df[['Name','LastName']]=newdf
df
Out[925]: 
    Name          City   LastName
0    Hat         Paris   Richards
1              Newyork      Adams
2    Tim  Sanfrancisco    Mathews
3  chris      LasVegas     MoyaDe
4   kate       Atlanta      Moris
5            Middleton  GrishamHA

答案 1 :(得分:4)

使用str.splitradd来添加,,最后添加str.lstrip

df[['first','last']] = df['Name'].radd(', ').str.rsplit(', ', n=1, expand=True)
df['first'] = df['first'].str.lstrip(', ')
print (df)
                 Name          City       first        last
0       Hat, Richards         Paris         Hat    Richards
1               Adams      New york                   Adams
2        Tim, Mathews  Sanfrancisco         Tim     Mathews
3      chris, Moya De     Las Vegas       chris     Moya De
4         kate, Moris       Atlanta        kate       Moris
5          Grisham HA     Middleton              Grisham HA
6  James, Tom, greval          Rome  James, Tom      greval

答案 2 :(得分:4)

快速和肮脏

pandas.str.splitstr[::-1]一起使用可以颠倒顺序

df[['Last_name', 'Name']] = df.Name.str.split(', ').str[::-1].apply(pd.Series)

df

    Name          City   Last_name
0    Hat         Paris    Richards
1    NaN      New york       Adams
2    Tim  Sanfrancisco     Mathews
3  chris     Las Vegas     Moya De
4   kate       Atlanta       Moris
5    NaN     Middleton  Grisham HA