Exploding a DataFrame with string values in pandas without the explode function

Date: 2020-03-23 14:56:12

Tags: python pandas pandas-groupby

I have a DataFrame like this (pandas version 0.23.4):

Emp   Factors         Comments        Action     ActionText
1   "['1','1']"  "["not","some"]"    "['1']"    "['good','as']"
2   "['1']"      "['textB']"          "[]"       "['da']"

I can't use

df.set_index('Emp').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

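because it is not always unique. PS: the columns Factors, Comments, Action and ActionText in df are of type String. For any value in one of these columns that has multiple entries ([,]), I want a new row in the output DF. I want the output df to look like: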

Emp Factors Comments Action      ActionText
1    1        not     1           good
1    1        some    1           as
2    1        textB   "" or NaN   da

2 Answers

Answer 0 (score: 2)

zip_longest

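from itertools import zip_longest

import pandas as pd

# One output row per position in the longest list of each input row;
# zip_longest pads the shorter lists with None
data = [
    (emp, *tup)
    for emp, *other in df.itertuples(index=False)
    for tup in zip_longest(*other)
]

pd.DataFrame(data, columns=df.columns)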

   Emp Factors Comments Action ActionText
0    1       1      not      1       good
1    1       1     some   None         as
2    2       1    textB   None         da
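The None values above come from zip_longest padding the shorter lists. Because the question also accepts "" or NaN for a missing entry, the padding value can be chosen with the fillvalue keyword. A small check on the lists from the Emp 1 row (pass fillvalue=float('nan') for NaN instead):

```python
from itertools import zip_longest

# Lists from the Emp 1 row: Factors, Comments, Action, ActionText
row1 = [['1', '1'], ['not', 'some'], ['1'], ['good', 'as']]

# Default padding is None
padded_none = list(zip_longest(*row1))
# fillvalue='' gives the empty-string variant the question also allows
padded_empty = list(zip_longest(*row1, fillvalue=''))

print(padded_none)   # [('1', 'not', '1', 'good'), ('1', 'some', None, 'as')]
print(padded_empty)  # [('1', 'not', '1', 'good'), ('1', 'some', '', 'as')]
```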

Setup

I assume df holds actual lists:

df = pd.DataFrame([
    [1, ['1', '1'], ['not', 'some'], ['1'], ['good', 'as']],
    [2, ['1'], ['textB'], [], ['da']]
], columns=['Emp', 'Factors', 'Comments', 'Action', 'ActionText'])
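The setup above uses real lists, but the question says the four columns hold strings. Below is a minimal sketch that parses each string with ast.literal_eval before running the same zip_longest comprehension. It assumes every cell is a valid Python list literal; the Comments value for Emp 1 is written with single quotes, as the other answer also does, so that it parses. The helper name parse_list is hypothetical. It loops while the value is still a string, so cells wrapped in an extra pair of quotes also work.

```python
import ast
from itertools import zip_longest

import pandas as pd


def parse_list(cell):
    """Hypothetical helper: literal_eval until the value is no longer a str.

    Assumes each string cell is a (possibly quoted) Python list literal;
    non-string cells are returned unchanged.
    """
    while isinstance(cell, str):
        cell = ast.literal_eval(cell)
    return cell


# String version of the question's data (assumed to be parseable)
df = pd.DataFrame({
    'Emp': [1, 2],
    'Factors': ["['1','1']", "['1']"],
    'Comments': ["['not','some']", "['textB']"],
    'Action': ["['1']", "[]"],
    'ActionText': ["['good','as']", "['da']"],
})

parsed = df.copy()
for col in ['Factors', 'Comments', 'Action', 'ActionText']:
    parsed[col] = parsed[col].map(parse_list)  # str -> list

# Same expansion as in this answer, now on real lists
data = [
    (emp, *tup)
    for emp, *other in parsed.itertuples(index=False)
    for tup in zip_longest(*other)
]
result = pd.DataFrame(data, columns=df.columns)
print(result)
```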

Answer 1 (score: 1)

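I had to change Comments for Emp 1 to "['not','some']" so that it could be parsed. Then I used 2 utility functions: the first converts a string into a list (a one-element list becomes its single value and an empty list becomes None), and the second turns one row of the original dataframe into a small dataframe.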

After the fix, the dataframe is:

df = pd.DataFrame({'Emp': [1, 2], 'Factors': ['"[\'1\',\'1\']"', '"[\'1\']"'],
                   'Comments': ['"[\'not\',\'some\']"', '"[\'textB\']"'],
                   'Action': ['"[\'1\']"', '"[]"'],
                   'ActionText': ['"[\'good\',\'as\']"', '"[\'da\']"']})

   Emp      Factors          Comments   Action       ActionText
0    1  "['1','1']"  "['not','some']"  "['1']"  "['good','as']"
1    2      "['1']"       "['textB']"     "[]"         "['da']"
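Each cell here wraps the list literal in an extra pair of double quotes, so a single ast.literal_eval call only removes that outer layer and returns another string. That is why do_eval loops while the value is still a str. A quick check on one cell:

```python
import ast

# Cell text is "['1','1']" including the outer double quotes
cell = '"[\'1\',\'1\']"'

first = ast.literal_eval(cell)    # outer quotes removed, still a str
second = ast.literal_eval(first)  # now an actual list

print(type(first).__name__, first)    # str ['1','1']
print(type(second).__name__, second)  # list ['1', '1']
```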

My code:

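import ast

import pandas as pd

def do_eval(x):
    # Non-string cells (e.g. Emp) are returned unchanged
    if not isinstance(x, str):
        return x
    # Keep evaluating until the outer quotes are gone and a list remains
    while isinstance(x, str):
        x = ast.literal_eval(x)
    # Several items -> list, one item -> scalar, no items -> None
    return x if len(x) > 1 else x[0] if len(x) == 1 else None

def make_df(row):
    d = row.apply(do_eval).to_dict()
    for v in d.values():
        if isinstance(v, list):
            # List entries become rows; scalars are repeated on every row
            return pd.DataFrame(d)
    # Only scalars: build a single-row frame
    return pd.DataFrame(d, index=[0])

result = pd.concat([make_df(row) for _, row in df.iterrows()])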

It gives:

   Emp Factors Comments Action ActionText
0    1       1      not      1       good
1    1       1     some      1         as
0    2       1    textB   None         da
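The two outputs differ for Action on the second Emp 1 row: here it is 1, while zip_longest gives None. do_eval collapses the one-element list ['1'] to the scalar '1', and pd.DataFrame repeats a scalar on every row. The same construction raises ValueError when a row holds lists of different lengths greater than one, because pd.DataFrame cannot align them. A small check using hypothetical row dicts shaped like the d built in make_df:

```python
import pandas as pd

# Hypothetical dict like make_df's d for Emp 1: do_eval has already
# collapsed Action's one-element list to the scalar '1'
d = {'Emp': 1, 'Factors': ['1', '1'], 'Comments': ['not', 'some'],
     'Action': '1', 'ActionText': ['good', 'as']}
expanded = pd.DataFrame(d)  # scalar values are repeated on every row
print(expanded['Action'].tolist())  # ['1', '1']

# Hypothetical row whose lists have lengths 2 and 3: cannot be aligned
bad = {'Emp': 3, 'Factors': ['1', '2'], 'Comments': ['a', 'b', 'c']}
raised = False
try:
    pd.DataFrame(bad)
except ValueError:
    raised = True
print(raised)  # True
```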