I have a DataFrame df:
import pandas as pd
df = pd.DataFrame([
    [[[3, 0.5, 0.4, 0.7, 5], [2, 0.5, 1, 0.8, 2], [1, 0.5, 1, 1, 2]], 'b'],
    [[[1, 0.5, 0.6, 0.01, 1], [2, 0.5, 0.3, 0.2, 3], [1, 0.8, 1.0, 0.04, 3]], 'd']],
    index=['row1', 'row2'],
    columns=['col1', 'col2'])
I want to split col1 (which holds a list of lists) into multiple rows, like this:
      col1                     col2
row1  [3, 0.5, 0.4, 0.7, 5]    b
row1  [2, 0.5, 1, 0.8, 2]      b
row1  [1, 0.5, 1, 1, 2]        b
row2  [1, 0.5, 0.6, 0.01, 1]   d
row2  [2, 0.5, 0.3, 0.2, 3]    d
row2  [1, 0.8, 1.0, 0.04, 3]   d
Then I want to split col1 into two columns, keeping only the second and third elements of each list:
      new_col1  new_col2  col2
row1  0.5       0.4       b
row1  0.5       1         b
row1  0.5       1         b
row2  0.5       0.6       d
row2  0.5       0.3       d
row2  0.8       1.0       d
How can I do this with pandas?
Answer 0 (score: 0)
For the first step, there may be nothing better than a loop:
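# .loc replaces the removed .ix indexer; pd.concat replaces the removed DataFrame.append
frames = []
for row in df.index:
    col = df.loc[row, 'col1']
    # one output row per inner list, repeating the index label and col2
    frames.append(pd.DataFrame(
        [[c, df.loc[row, 'col2']] for c in col],
        index=[row] * len(col),
        columns=['col1', 'col2']))
df2 = pd.concat(frames)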
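If you are on pandas 0.25 or newer, `DataFrame.explode` can probably replace the loop for this first step. The sketch below rebuilds the question's `df` so it runs on its own; it assumes col1 always holds a list of lists, so that exploding one level leaves each inner list intact in its own row.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[[3, 0.5, 0.4, 0.7, 5], [2, 0.5, 1, 0.8, 2], [1, 0.5, 1, 1, 2]], 'b'],
        [[[1, 0.5, 0.6, 0.01, 1], [2, 0.5, 0.3, 0.2, 3], [1, 0.8, 1.0, 0.04, 3]], 'd'],
    ],
    index=['row1', 'row2'],
    columns=['col1', 'col2'],
)

# explode unpacks only the outer list: each inner list becomes one row,
# and the index label and col2 value are repeated for every new row.
df2 = df.explode('col1')
print(df2)
```

Only one level of nesting is unpacked, so the inner lists stay as Python lists in col1, ready for the second step.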
For the second step, just add the new columns and delete the original one:
df3 = df2.copy()
df3['new_col1'] = [c[1] for c in df3['col1']]
df3['new_col2'] = [c[2] for c in df3['col1']]
del df3['col1']
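Note that adding the new columns this way puts them after col2, whereas the question shows col2 last. If the column order matters, one option is to build the result frame directly. This is only a sketch: the small `df2` here is a hypothetical stand-in for the output of the first step (one inner list per row, repeated index labels).

```python
import pandas as pd

# Hypothetical stand-in for the first-step result.
df2 = pd.DataFrame(
    {
        'col1': [[3, 0.5, 0.4, 0.7, 5], [2, 0.5, 1, 0.8, 2], [1, 0.8, 1.0, 0.04, 3]],
        'col2': ['b', 'b', 'd'],
    },
    index=['row1', 'row1', 'row2'],
)

# Build the columns in the order the question asks for.
# to_numpy() passes col2 positionally, which avoids index alignment
# on the duplicated row labels.
df3 = pd.DataFrame(
    {
        'new_col1': [c[1] for c in df2['col1']],
        'new_col2': [c[2] for c in df2['col1']],
        'col2': df2['col2'].to_numpy(),
    },
    index=df2.index,
)
print(df3)
```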