Pandas - unflatten a DataFrame with a column containing arrays

Time: 2016-07-25 21:29:03

Tags: python pandas

I have a DataFrame that has been flattened on a particular property:

id      property_a    properties_b
id_1    property_a_1  [property_b_11, property_b_12]
id_2    property_a_2  [property_b_21, property_b_22, property_b_23]

..................

I would like to expand the column properties_b to get back to a DataFrame like this:

id      property_a    property_b
id_1    property_a_1  property_b_11
id_1    property_a_1  property_b_12
id_2    property_a_2  property_b_21
id_2    property_a_2  property_b_22
id_2    property_a_2  property_b_23

..................

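I suspect this is very simple in Pandas, but being new to Python I am struggling to find an elegant way to do it.

2 Answers:

Answer 0: (score: 3)

Here is another approach, using to_records, some tuple mapping, and from_records.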

import pandas as pd
import itertools

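def expand_column(df, col_id):
    # col_id is the position inside each record tuple. to_records() puts the
    # index first, so the first column is 1, the second is 2, and so on.
    records = map(lambda r: [r[1:col_id] + (l,) + r[col_id + 1:] for l in r[col_id]], map(tuple, df.to_records()))
    return pd.DataFrame.from_records(itertools.chain.from_iterable(records), columns=df.columns)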

df = pd.DataFrame([['a', [1,2,3], 'a'],['b', [4,5], 'b']], columns=['C1', 'L', 'C2'])

print(df)
print(expand_column(df, 2))

#   C1          L C2
# 0  a  [1, 2, 3]  a
# 1  b     [4, 5]  b
#
#   C1  L C2
# 0  a  1  a
# 1  a  2  a
# 2  a  3  a
# 3  b  4  b
# 4  b  5  b
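
Because df.to_records() places the index first in each record, col_id in expand_column counts the first column as 1, which is why 'L' is passed as 2. A minimal sketch of a label-based variant follows. It is not the answer's code: the helper name expand_column_by_name is hypothetical, and it swaps to_records for itertuples(index=False, name=None) so that tuple positions line up with column positions.

```python
import pandas as pd

def expand_column_by_name(df, col_name):
    # Hypothetical helper: find the column by label instead of by position.
    pos = df.columns.get_loc(col_name)
    # index=False drops the index, so tuple positions match column positions.
    records = [
        r[:pos] + (item,) + r[pos + 1:]
        for r in df.itertuples(index=False, name=None)
        for item in r[pos]
    ]
    return pd.DataFrame.from_records(records, columns=df.columns)

df = pd.DataFrame(
    [['a', [1, 2, 3], 'a'], ['b', [4, 5], 'b']],
    columns=['C1', 'L', 'C2'],
)
expanded = expand_column_by_name(df, 'L')
print(expanded)
```

This sketch assumes the column labels are unique, so that get_loc returns a single integer.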

Answer 1: (score: 2)

This question has been addressed here and here. If you find those questions and answers useful, feel free to upvote them.

Setup

import pandas as pd

df = pd.DataFrame([
        ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
        ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
    ], columns=['id', 'property_a', 'properties_b'])

df


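rows = []
for i, row in df.iterrows():
    for a in row.properties_b:
        # Copy the row for each element. Appending the same Series object
        # repeatedly would leave every appended row holding the last value.
        new = row.copy()
        new['properties_b'] = a
        rows.append(new)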

pd.DataFrame(rows, columns=df.columns)

Convenience functions

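def loc_expand(df, loc):
    # Expand the list-valued column identified by label `loc`.
    rows = []
    for i, row in df.iterrows():
        vs = row.at[loc]
        for v in vs:
            new = row.copy()  # fresh copy per element, so rows stay distinct
            new.at[loc] = v
            rows.append(new)

    return pd.DataFrame(rows)

def iloc_expand(df, iloc):
    # Expand the list-valued column identified by integer position `iloc`.
    rows = []
    for i, row in df.iterrows():
        vs = row.iat[iloc]
        for v in vs:
            new = row.copy()  # fresh copy per element, so rows stay distinct
            new.iat[iloc] = v
            rows.append(new)

    return pd.DataFrame(rows)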

Both of these should return the same result as above.

loc_expand(df, 'properties_b')
iloc_expand(df, 2)
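
For readers on pandas 0.25 or later, the built-in DataFrame.explode performs the same expansion without an explicit loop. This was not part of either answer; it is a minimal sketch, assuming the setup frame above. The rename to property_b and the index reset are added only to match the layout requested in the question.

```python
import pandas as pd

df = pd.DataFrame([
    ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
    ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
], columns=['id', 'property_a', 'properties_b'])

# explode emits one row per list element and repeats the other columns
# (requires pandas >= 0.25).
result = (
    df.explode('properties_b')
      .rename(columns={'properties_b': 'property_b'})
      .reset_index(drop=True)
)
print(result)
```

Without reset_index, the exploded rows keep the index label of the row they came from, much like the loop-based versions above.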