Hi, I have a DataFrame like this:
   A  B
0  a  [[L1, L2]]
1  b  [[L1, L2, L3]]
I want to turn it into this, with one row per element of B:
   A  B               C
0  a  [[L1, L2]]      L1
1  a  [[L1, L2]]      L2
2  b  [[L1, L2, L3]]  L1
3  b  [[L1, L2, L3]]  L2
4  b  [[L1, L2, L3]]  L3
How can I do this?
Answer 0 (score: 0)
Try this:
import pandas as pd
from io import StringIO
data = """
A B
a [[L1,L2]]
b [[L1,L2,L3]]
"""
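df = pd.read_csv(StringIO(data), sep=' ')
# Copy B into C, stripping brackets, quotes and whitespace: "[[L1,L2]]" -> "L1,L2"
df['C'] = df['B'].astype(str).replace([r'\[', r'\]', "'", r'\s+'], '', regex=True)
# Split C on commas into columns, stack the pieces into rows,
# then restore A and B from the index and put the columns back in order
print(df.set_index(df.columns.drop('C').tolist())['C']
        .str.split(',', expand=True)
        .stack()
        .reset_index()
        .rename(columns={0: 'C'})
        .loc[:, df.columns])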
Result:
A B C
0 a [[L1,L2]] L1
1 a [[L1,L2]] L2
2 b [[L1,L2,L3]] L1
3 b [[L1,L2,L3]] L2
4 b [[L1,L2,L3]] L3
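If you are on pandas 1.1 or later (an assumption: DataFrame.explode was added in 0.25 and its ignore_index argument in 1.1), the set_index/stack/reset_index chain can be replaced by explode. A minimal sketch, assuming B is read as plain strings such as "[[L1,L2]]" with no spaces inside the brackets, as in the data above:

```python
import pandas as pd
from io import StringIO

data = """
A B
a [[L1,L2]]
b [[L1,L2,L3]]
"""
df = pd.read_csv(StringIO(data), sep=' ')

# Strip the outer brackets ("[[L1,L2]]" -> "L1,L2") and split into a real list
df['C'] = df['B'].str.strip('[]').str.split(',')

# explode gives every list element of C its own row, repeating A and B;
# ignore_index=True renumbers the result 0..n-1
out = df.explode('C', ignore_index=True)
print(out)
```

Because explode repeats the other columns for you, there is no need to move A and B into the index and back.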
Answer 1 (score: 0)
A solution using itertools.chain:
import pandas as pd
from itertools import chain
# old dataframe:
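df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['L1', 'L2']], [['L1', 'L2', 'L3']]]})
# Build the output column by column: one output row per flattened element of B
d = {'A': [], 'B': [], 'C': []}
for a, b in zip(df['A'], df['B']):
    # chain.from_iterable flattens [['L1', 'L2']] into 'L1', 'L2'
    for c in chain.from_iterable(b):
        d['A'].append(a)
        d['B'].append(b)
        d['C'].append(c)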
# new dataframe:
df = pd.DataFrame(d)
print(df)
Prints:
A B C
0 a [[L1, L2]] L1
1 a [[L1, L2]] L2
2 b [[L1, L2, L3]] L1
3 b [[L1, L2, L3]] L2
4 b [[L1, L2, L3]] L3
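The same result can be sketched without the explicit loop using DataFrame.explode, assuming pandas 1.1 or later for the ignore_index argument. Each nested list in B is first flattened into a new column C with chain.from_iterable, then explode turns every element of C into its own row:

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['L1', 'L2']], [['L1', 'L2', 'L3']]]})

# Flatten each nested list, e.g. [['L1', 'L2']] -> ['L1', 'L2']
df['C'] = df['B'].map(lambda b: list(chain.from_iterable(b)))

# One row per element of C; A and B are repeated, the index is renumbered
out = df.explode('C', ignore_index=True)
print(out)
```

Flattening with chain.from_iterable (rather than taking only the first inner list) keeps the behaviour of the loop above when B holds more than one inner list.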