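Splitting a pandas column

Time: 2021-04-04 10:31:00

Tags: python pandas

I have a string column that I want to split into three columns based on its contents. The column looks like this: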

full_string
x a b c
d e
m n o
y m n
y d e f
d e f

x and y are prefixes. I want to convert this column into three columns:

prefix_string  first_string last_string
x              a            c
               d            e
               m            o
y              m            n
y              d            f
               d            f

I have this code:

df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0] 
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]

df['last_string'] = df['full_string'].str.split().str[-1]

prefix_string = ['x', 'y'] 
df['prefix_string'] = df[df['full_string'].str.split().str[0].isin(prefix_string)]['full_string'].str.split().str[0]

This code does not work for first_string. Is there a way to extract the first string regardless of prefix_string and the number of tokens in the string?

2 answers:

Answer 0: (score: 1)

Try using numpy.where and pandas.Series.str.split:

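import numpy as np

prefix_str = ["x", "y"]

# One column per token; ffill(axis=1) copies the last real token into the padded columns
res = df["full_string"].str.split(" ", expand=True).ffill(axis=1)
# After the forward fill, the rightmost column always holds the last token
res["last_string"] = res.iloc[:, -1]
# Keep token 0 as the prefix only if it is a known prefix
res["prefix_string"] = np.where(res[0].isin(prefix_str), res[0], "")
# With a prefix, the first string is token 1; otherwise it is token 0
res["first_string"] = np.where(res["prefix_string"].ne(""), res[1], res[0])

res = res[["prefix_string", "first_string", "last_string"]]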

Output:

  prefix_string first_string last_string
0             x            a           c
1                          d           e
2                          m           o
3             y            m           n
4             y            d           f
5                          d           f
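For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample DataFrame is rebuilt from the question; the names `tokens`, `has_prefix` and `result` are my own and not part of the answer above. It computes the prefix mask once and reuses it for both derived columns.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"full_string": ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]}
)
prefixes = ["x", "y"]

# One column per token; shorter rows are padded with missing values on the right.
tokens = df["full_string"].str.split(expand=True)

# Forward-fill along each row so the rightmost column holds the last real token.
last = tokens.ffill(axis=1).iloc[:, -1]

# True where the first token is one of the known prefixes.
has_prefix = tokens[0].isin(prefixes)

result = pd.DataFrame(
    {
        "prefix_string": np.where(has_prefix, tokens[0], ""),
        # With a prefix the first string is token 1, otherwise token 0.
        "first_string": np.where(has_prefix, tokens[1], tokens[0]),
        "last_string": last,
    }
)
print(result)
```

The membership test uses `isin`, so only an exact match on the first token counts as a prefix.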

Answer 1: (score: 0)

Instead of these lines in the code above:

df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0] 
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]

use the split(), contains() and fillna() methods:

df['first_string']=df['full_string'].str.split(expand=True).loc[~df['full_string'].str.split(expand=True)[0].str.contains('x|y'),0]
df['first_string']=df['first_string'].fillna(df['full_string'].str.split(expand=True)[1])

Output of df:

    full_string     first_string    last_string     prefix_string
0   x a b c             a               c                   x
1   d e                 d               e                   NaN
2   m n o               m               o                   NaN
3   y m n               m               n                   y
4   y d e f             d               f                   y
5   d e f               d               f                   NaN
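A possible variation on the two lines above, offered as a sketch rather than a drop-in replacement: it splits only once and swaps `str.contains('x|y')` for `isin`. That swap is my own change. `str.contains` does a regex search, so it would also treat a first token such as "box" as a prefix. The names `tokens` and `has_prefix` are hypothetical helpers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"full_string": ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]}
)
prefixes = ["x", "y"]

# Split once and reuse the result instead of re-splitting in every expression.
tokens = df["full_string"].str.split(expand=True)
has_prefix = tokens[0].isin(prefixes)

# Blank out token 0 where it is a prefix, then fall back to token 1 for those rows.
df["first_string"] = tokens[0].mask(has_prefix).fillna(tokens[1])

# The last token, taken directly from the list produced by split().
df["last_string"] = df["full_string"].str.split().str[-1]

# Keep token 0 only where it is a prefix; other rows become NaN.
df["prefix_string"] = tokens[0].where(has_prefix)
print(df)
```

Rows without a prefix end up with NaN in `prefix_string`, matching the output shown above.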