Regex to match multiple prefixes and unpack them into columns

Date: 2018-09-10 14:10:33

Tags: python regex pandas

Any suggestions for a regex that can be applied to this Series?

import pandas as pd
import numpy as np

data = [
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
]

ser = pd.Series(data=data)


How can I turn it into the following DataFrame?

pd.DataFrame(data=[
    ['very tasty', np.nan],
    [np.nan, 'Unpleasant'],
    ['quite nice', 'not bad either'],
    [np.nan, np.nan],
], columns = ['Apple', 'Banana'])


When Apple and Banana are both present, they always appear in the order Apple, Banana, separated by a double space.
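Because the order is fixed and the separator is a double space, a single regex with two optional named groups can be passed to `Series.str.extract`, which turns each named group into a column. This is a different technique from the split-based answer, sketched under the assumption that Apple and Banana are the only two keys; a group that does not participate in the match comes back as NaN.

```python
import pandas as pd

ser = pd.Series([
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
])

# Assumes exactly two keys, always in the order Apple, Banana.
# Each "key: value" part is optional; the Apple value is matched lazily
# so it stops before the optional double-space separator.
pattern = r'^(?:Apple: (?P<Apple>.*?))?(?:\s\s)?(?:Banana: (?P<Banana>.*))?$'

# Named groups become the column names; unmatched groups become NaN
result = ser.str.extract(pattern)
print(result)
```

The empty string still matches (every part is optional), so it yields a row of NaN rather than being dropped.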

1 answer:

Answer 0 (score: 1)

You can do the following:

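# Split on ": " (between a key and its value) or on the double space between
# pairs; expand=True pads shorter rows with missing values
df = ser.str.split(r':\s|\s\s', expand=True)

# Stack each (key, value) column pair; every piece keeps the default 0..n-1
# index, which is the original row number
df_out = pd.concat([pd.DataFrame(df.iloc[:, i:i + 2].values)
                    for i in range(0, df.shape[1], 2)])

Or, equivalently, without the loop:

df_out = pd.DataFrame(df.values.reshape(-1, 2),
                      index=np.repeat(np.arange(df.shape[0]), df.shape[1] // 2))

Then move the keys into the index, pivot them into columns, and drop the
columns that are entirely NaN (left over from padding and empty keys):

df_out.set_index(0, append=True)[1].unstack().dropna(axis=1, how='all')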
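A self-contained sketch of the same split-and-reshape idea, handy for inspecting the intermediate steps. It deviates from the answer in one detail: it drops padding rows (missing key or value) before pivoting and then reindexes to the original rows, instead of dropping all-NaN columns afterwards. The names `parts`, `pairs` and `result` are illustrative, and it assumes the split yields an even number of columns, as it does for this data.

```python
import numpy as np
import pandas as pd

ser = pd.Series([
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
])

# Step 1: split on ": " or the double-space separator; shorter rows are padded
parts = ser.str.split(r':\s|\s\s', expand=True)

# Step 2: reshape into one (key, value) row per pair, tagged with its source row.
# Assumes an even number of split columns, i.e. complete key/value pairs.
n_rows, n_cols = parts.shape
pairs = pd.DataFrame(parts.to_numpy().reshape(-1, 2),
                     columns=['key', 'value'],
                     index=np.repeat(np.arange(n_rows), n_cols // 2))

# Step 3: drop padding rows (missing key or value), pivot the keys into columns,
# and reindex so rows without any pair (like the empty string) come back as NaN
pairs = pairs.dropna()
result = (pairs.set_index('key', append=True)['value']
               .unstack()
               .reindex(index=ser.index))
print(result)
```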

Output:

         Apple           Banana
0   very tasty              NaN
1          NaN       Unpleasant
2   quite nice   not bad either
3          NaN              NaN