I have a dataframe like this:
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
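I am trying to parse the text in the Series column into separate columns named Series1, Series2, and so on, with as many columns as the row with the most parsed phrases. My attempt: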
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
I want something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
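For comparison with the attempted `str.split(', ', expand=True)`: if Series holds the string form of each list (an assumption; the question doesn't say), splitting on `', '` also breaks at the commas inside every tuple, so the numbers and `'NP'` tags end up in their own columns. A minimal sketch that instead pulls out only the first quoted item of each tuple with a regex. It assumes no phrase contains a single quote, because Python's repr would then switch to double quotes and the pattern would miss it.

```python
import pandas as pd

# Assumed input: Series stored as strings, e.g. after reading from a CSV
df = pd.DataFrame({
    'ID': [1102, 1500, 649],
    'Series': [
        "[('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]",
        "[('forgot data pages info', 0, 22, 'NP')]",
        "[('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]",
    ],
})

# Capture the quoted text that directly follows each "(", i.e. the phrase
phrases = df['Series'].str.findall(r"\('([^']*)'")

# Expand the ragged lists into columns; shorter rows get missing values
expanded = pd.DataFrame(phrases.tolist(), index=df.index)
expanded.columns = [f'Series{i}' for i in range(1, expanded.shape[1] + 1)]
df = df.join(expanded)
print(df[['ID', 'Series1', 'Series2', 'Series3']])
```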
Answer (score: 0)
The format of the final result is hard to make out, but perhaps you can follow this idea to create a new column:
# Join the phrase text (first element of each tuple) into one string
def process(ls):
    return ' '.join(x[0] for x in ls)

df['Series_new'] = df['Series'].apply(process)
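`process` assumes each cell is an actual list of tuples. If the column instead holds the string form of the list (an assumption, suggested by the `str[1:-1]` in the question), `process` would iterate over characters and produce spaced-out letters. A minimal sketch that first converts the strings back into lists with `ast.literal_eval`, then applies the same join:

```python
import ast

import pandas as pd


def process(ls):
    # Join the phrase text (first tuple element) of every tuple in the list
    return ' '.join(t[0] for t in ls)


# Assumed input: Series stored as the string repr of each list
df = pd.DataFrame({
    'ID': [1500, 649],
    'Series': [
        "[('forgot data pages info', 0, 22, 'NP')]",
        "[('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]",
    ],
})

# literal_eval safely parses each string back into a list of tuples
df['Series'] = df['Series'].apply(ast.literal_eval)
df['Series_new'] = df['Series'].apply(process)
print(df[['ID', 'Series_new']])
```

If the column already contains lists, skip the `literal_eval` line.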
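If you want N new columns, where N is the length of the longest list in Series, compute N first. Then apply the same idea to build the N columns, filling in NaN for rows that have fewer entries.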
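The N-column approach described above can be sketched as follows, assuming Series contains real lists of tuples. It computes N as the longest list, pads shorter lists with NaN, and names the columns Series1 through SeriesN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1102, 1500, 649],
    'Series': [
        [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'),
         ('the atc taxi clearance', 89, 111, 'NP')],
        [('forgot data pages info', 0, 22, 'NP')],
        [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')],
    ],
})

# Keep only the phrase text (first element) of each tuple
phrases = df['Series'].apply(lambda ls: [t[0] for t in ls])

# N = length of the longest list (cast to a plain int)
n = int(phrases.apply(len).max())

# Pad shorter lists with NaN so every row has exactly N entries
padded = phrases.apply(lambda ls: ls + [np.nan] * (n - len(ls)))

new_cols = pd.DataFrame(padded.tolist(), index=df.index,
                        columns=[f'Series{i}' for i in range(1, n + 1)])
df = df.join(new_cols)
print(df[['ID', 'Series1', 'Series2', 'Series3']])
```

Building the new columns in one `DataFrame` and joining once avoids adding columns one at a time in a loop.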