Splitting only the necessary rows in a DataFrame

Time: 2018-08-17 14:06:55

Tags: pandas dataframe split

I have a table:

                                   Name1 Name2 Name3
0                                    ABC   FGD   NNY
1  111S  PC  1T  Trees are always yellow   NaN   NaN
2                                      P   FGD   NNY
3                                    JJJ   FGD   NNY
4  111S  PC  1T  Trees are always yellow   NaN   NaN
5                                    ABC   FGD   NNY
6                                    UIK    GJ    DE

I want to get this:

  Name1 Name2 Name3                    Name4
0   ABC   FGD   NNY                      NaN
1  111S    PC    1T  Trees are always yellow
2     P   FGD   NNY                      NaN
3   JJJ   FGD   NNY                      NaN
4  111S    PC    1T  Trees are always yellow
5   ABC   FGD   NNY                      NaN
6   UIK    GJ    DE                      NaN

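I only need to split some of the rows; the other rows should stay unchanged. I was able to identify the rows whose data needs to be split: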

# isnull must be called; the bare method object is always truthy
if df[colname1].isnull().any():
    df_index = df[df[colname1].isnull()].index
    print(df_index)

Now I need to separate the values inside the string. This is what I have so far:

if df[colname1].isnull().any():
    df_index = df[df[colname1].isnull()].index
    print(df_index)

for i in df_index:
    print(i)
    df1 = df[colname][i].split('     ')

df1 holds the information I need, but I don't know how to put it back into the DataFrame df at the right index. Could you help me?

2 answers:

Answer 0: (score: 1)

Use str.split with n

s=df.fillna('').apply('  '.join,1)
s.str.split('  ',n=3)
Out[189]: 
0                                [ABC, FGD, NNY]
1    [111S, PC, 1T, Trees are always yellow    ]
2                                  [P, FGD, NNY]
3                                [JJJ, FGD, NNY]
4    [111S, PC, 1T, Trees are always yellow    ]
5                                [ABC, FGD, NNY]
6                                  [UIK, GJ, DE]
dtype: object
pd.DataFrame(s.str.split('  ',n=3).tolist())
Out[190]: 
      0    1    2                            3
0   ABC  FGD  NNY                         None
1  111S   PC   1T  Trees are always yellow    
2     P  FGD  NNY                         None
3   JJJ  FGD  NNY                         None
4  111S   PC   1T  Trees are always yellow    
5   ABC  FGD  NNY                         None
6   UIK   GJ   DE                         None
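
The result above still has integer column labels, and the last column keeps trailing spaces (left over from joining the empty strings that replaced NaN). Here is a minimal sketch of a cleanup step. It assumes the sample data from the question, and that `Name4` is the name wanted for the new column:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real input).
long_row = "111S  PC  1T  Trees are always yellow"
df = pd.DataFrame({
    "Name1": ["ABC", long_row, "P", "JJJ", long_row, "ABC", "UIK"],
    "Name2": ["FGD", np.nan, "FGD", "FGD", np.nan, "FGD", "GJ"],
    "Name3": ["NNY", np.nan, "NNY", "NNY", np.nan, "NNY", "DE"],
})

# Join each row into one string; NaN cells become empty strings first.
s = df.fillna("").apply("  ".join, axis=1)

# Strip the trailing separators left by the empty cells, then split on the
# double space at most 3 times, which yields at most 4 fields per row.
parts = s.str.strip().str.split("  ", n=3)

# Rows with only 3 fields are padded with a missing value in Name4.
result = pd.DataFrame(parts.tolist(), index=df.index,
                      columns=["Name1", "Name2", "Name3", "Name4"])
print(result)
```

Stripping before the split is what keeps `Name4` free of trailing whitespace. `n=3` leaves any double space inside the sentence itself untouched.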

Answer 1: (score: 0)

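IIUC, the columns are separated by a double space, while the words inside the sentence are separated by a single space. You can use that to do the split.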

import numpy as np

idx = df.loc[df.Name2.isnull()].index  # rows whose fields are all packed into Name1
df['Name4'] = np.nan
df.loc[idx] = df.loc[idx].Name1.str.split('  ', expand=True).values

    Name1   Name2   Name3   Name4
0   ABC     FGD     NNY     NaN
1   111S    PC      1T      Trees are always yellow
2   P       FGD     NNY     NaN
3   JJJ     FGD     NNY     NaN
4   111S    PC      1T      Trees are always yellow
5   ABC     FGD     NNY     NaN
6   UIK     GJ      DE      NaN
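
For reference, here is a self-contained sketch of the same idea using a boolean mask. The sample data is rebuilt from the question. It assumes every packed row contains exactly four fields separated by a double space. Creating `Name4` with object dtype is my own choice here, so that strings are not assigned into a float column:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real input).
long_row = "111S  PC  1T  Trees are always yellow"
df = pd.DataFrame({
    "Name1": ["ABC", long_row, "P", "JJJ", long_row, "ABC", "UIK"],
    "Name2": ["FGD", np.nan, "FGD", "FGD", np.nan, "FGD", "GJ"],
    "Name3": ["NNY", np.nan, "NNY", "NNY", np.nan, "NNY", "DE"],
})

cols = ["Name1", "Name2", "Name3", "Name4"]

# Rows where Name2 is missing hold all of their fields packed into Name1.
mask = df["Name2"].isna()

# Split only those rows; expand=True returns one column per field.
parts = df.loc[mask, "Name1"].str.split("  ", n=3, expand=True)

# Add an all-missing Name4 column, then overwrite just the packed rows.
df["Name4"] = pd.Series(index=df.index, dtype=object)
df.loc[mask, cols] = parts.to_numpy()
print(df)
```

Rows outside `mask` are never touched, so they keep their original values and a missing `Name4`.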