Any suggestions for a regular expression that can be applied to this Series
import pandas as pd
import numpy as np
data = [
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
]
ser = pd.Series(data=data)
to turn it into this resulting DataFrame?
pd.DataFrame(data=[
    ['very tasty', np.nan],
    [np.nan, 'Unpleasant'],
    ['quite nice', 'not bad either'],
    [np.nan, np.nan],
], columns=['Apple', 'Banana'])
When Apple and Banana are both present, they always appear in the order Apple, Banana and are separated by a double space.
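The double-space convention is what makes a single regular expression enough: a colon (plus any spaces after it) ends a name, and two or more spaces end a value. A minimal sketch in plain Python, assuming names are followed by a colon and values never contain two consecutive spaces; `parse_entry` is a hypothetical helper, not part of pandas:

```python
import re

# Assumed pattern: ":" plus any following spaces separates a name from its value;
# a run of two or more whitespace characters separates one entry from the next.
PATTERN = re.compile(r':\s*|\s{2,}')


def parse_entry(text):
    """Hypothetical helper: turn 'Apple: x  Banana: y' into {'Apple': 'x', 'Banana': 'y'}."""
    if not text:
        return {}
    parts = PATTERN.split(text)
    # parts alternates name, value, name, value, ...
    return dict(zip(parts[0::2], parts[1::2]))


print(parse_entry('Apple: quite nice  Banana: not bad either'))
# {'Apple': 'quite nice', 'Banana': 'not bad either'}
print(parse_entry(''))
# {}
```

Because single spaces inside a value (as in "quite nice") do not match `\s{2,}`, they stay part of the value.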
Answer (score: 1)
You can do the following:
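# Split on ": " (name/value) and on the double space between entries;
# str.split treats a multi-character pattern as a regular expression
df = ser.str.split(r':\s*|\s{2,}', expand=True)

# Stack the (name, value) column pairs into two columns, repeating each original
# row label once per pair. This reshape replaces looping over
# df.groupby(df.columns // 2, axis=1), which is deprecated since pandas 2.1.
pairs = pd.DataFrame(df.to_numpy().reshape(-1, 2), columns=['name', 'value'],
                     index=np.repeat(df.index, df.shape[1] // 2))

# Drop empty slots, pivot the names into columns, restore rows with no entries
pairs = pairs[pairs['name'].notna() & pairs['name'].ne('')]
df_out = (pairs.set_index('name', append=True)['value'].unstack()
          .reindex(ser.index).rename_axis(columns=None))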
Output:
Apple Banana
0 very tasty NaN
1 NaN Unpleasant
2 quite nice not bad either
3 NaN NaN
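Another option, not taken from the answer above: `Series.str.extractall` with named groups returns one row per `Name: value` match, so there are no column pairs to stack. A sketch under the same assumptions as the question (names are single `\w+` words; a value ends at a double space or at the end of the string):

```python
import numpy as np
import pandas as pd

data = [
    'Apple: very tasty',
    'Banana: Unpleasant',
    'Apple: quite nice  Banana: not bad either',
    '',
]
ser = pd.Series(data=data)

# Assumed format: "Name: value", where the value runs lazily up to the next
# double space or the end of the string
pattern = r'(?P<name>\w+):\s*(?P<value>.*?)(?=\s{2,}|$)'

# One row per match; extractall adds a "match" level (0, 1, ...) to the index
matches = ser.str.extractall(pattern)

# Drop the match counter, pivot the names into columns,
# and bring back rows that had no match at all
df_out = (matches.droplevel('match')
          .set_index('name', append=True)['value']
          .unstack()
          .reindex(ser.index)
          .rename_axis(columns=None))
print(df_out)
```

Rows with no match (the empty string) are absent from the `extractall` result, which is why `reindex(ser.index)` is needed to restore them as all-NaN rows.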