Question

I have a DataFrame like this (pandas version 0.23.4):

Emp   Factors         Comments        Action     ActionText
1   "['1','1']"  "["not","some"]"    "['1']"    "['good','as']"
2   "['1']"      "['textB']"          "[]"       "['da']"

I cannot use

df.set_index('Emp').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

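because it is not always unique. PS: the columns Factors, Comments, Action, and ActionText hold strings in df. For every value in a column that has multiple entries ([ , ]), I want a new row in the output DataFrame. I expect the output df to look like this: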

Emp  Factors  Comments  Action     ActionText
1    1        not       1          good
1    1        some      1          as
2    1        textB     "" or NaN  da

Answer 1

`zip_longest`

from itertools import zip_longest

data = [
    (emp, *tup)
    for emp, *other in df.itertuples(index=False)
    for tup in zip_longest(*other)
]

pd.DataFrame(data, columns=df.columns)

   Emp Factors Comments Action ActionText
0    1       1      not      1       good
1    1       1     some   None         as
2    2       1    textB   None         da
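
The question asks for `""` or NaN in the missing positions, whereas `zip_longest` pads with `None` by default. As a minimal sketch, assuming the list-valued `df` that this answer works with, the standard `fillvalue` argument of `itertools.zip_longest` can pad with an empty string instead (the name `padded` is only illustrative):

```python
from itertools import zip_longest

import pandas as pd

# Same list-valued frame that this answer assumes.
df = pd.DataFrame([
    [1, ['1', '1'], ['not', 'some'], ['1'], ['good', 'as']],
    [2, ['1'], ['textB'], [], ['da']]
], columns=['Emp', 'Factors', 'Comments', 'Action', 'ActionText'])

# fillvalue replaces the default None padding for the shorter lists.
data = [
    (emp, *tup)
    for emp, *other in df.itertuples(index=False)
    for tup in zip_longest(*other, fillvalue='')
]

padded = pd.DataFrame(data, columns=df.columns)
print(padded)
```

Passing `fillvalue=float('nan')` instead should give NaN in those cells, if that is the preferred marker.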

Setup

I assume df is:

df = pd.DataFrame([
    [1, ['1', '1'], ['not', 'some'], ['1'], ['good', 'as']],
    [2, ['1'], ['textB'], [], ['da']]
], columns=['Emp', 'Factors', 'Comments', 'Action', 'ActionText'])
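
The question states that the columns hold strings rather than lists. One possible way to get from those strings to the list-valued frame assumed here is `ast.literal_eval`. The sketch below assumes each cell is a plain list literal such as `"['1','1']"` with no extra layer of quotes; the frame `raw` and the names `parsed` and `exploded` are made up for illustration.

```python
import ast
from itertools import zip_longest

import pandas as pd

# Hypothetical string-valued input: each cell is the text of a list literal.
raw = pd.DataFrame({
    'Emp': [1, 2],
    'Factors': ["['1','1']", "['1']"],
    'Comments': ["['not','some']", "['textB']"],
    'Action': ["['1']", "[]"],
    'ActionText': ["['good','as']", "['da']"],
})

# Parse every column except Emp from text into real Python lists.
list_cols = [c for c in raw.columns if c != 'Emp']
parsed = raw.copy()
for col in list_cols:
    parsed[col] = parsed[col].apply(ast.literal_eval)

# The zip_longest comprehension from this answer then applies unchanged.
data = [
    (emp, *tup)
    for emp, *other in parsed.itertuples(index=False)
    for tup in zip_longest(*other)
]
exploded = pd.DataFrame(data, columns=parsed.columns)
print(exploded)
```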

Answer 2

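I had to fix the Comments value for Emp 1 to "['not','some']" so that it could be parsed. I then used two utility functions: the first converts a string to a list, and the second processes one row of the original DataFrame.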
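
To illustrate why the repair was needed and why the conversion can take more than one pass, here is a small sketch that uses only `ast.literal_eval`. It assumes each cell carries an extra pair of double quotes around the list literal, as in the question's sample; the variable names are illustrative.

```python
import ast

broken = '"["not","some"]"'      # cell as posted in the question
fixed = '"[\'not\',\'some\']"'   # cell after the repair

# The posted cell is not a valid Python literal, so parsing fails.
try:
    ast.literal_eval(broken)
    broken_ok = True
except (SyntaxError, ValueError):
    broken_ok = False

# The repaired cell needs two passes: the first strips the outer quotes
# and yields a string, the second turns that string into a list.
once = ast.literal_eval(fixed)
twice = ast.literal_eval(once)

print(broken_ok, type(once).__name__, twice)
```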

After the fix, the DataFrame is:

df = pd.DataFrame({'Emp': [1, 2], 'Factors': ['"[\'1\',\'1\']"', '"[\'1\']"'],
                   'Comments': ['"[\'not\',\'some\']"', '"[\'textB\']"'],
                   'Action': ['"[\'1\']"', '"[]"'],
                   'ActionText': ['"[\'good\',\'as\']"', '"[\'da\']"']})

or

   Emp      Factors          Comments   Action       ActionText
0    1  "['1','1']"  "['not','some']"  "['1']"  "['good','as']"
1    2      "['1']"       "['textB']"     "[]"         "['da']"

My code is:

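import ast
import pandas as pd

def do_eval(x):
    # Non-string cells (such as Emp) pass through unchanged.
    if not isinstance(x, str):
        return x
    # Keep evaluating until the nested quoting is gone and a list remains.
    while isinstance(x, str):
        x = ast.literal_eval(x)
    # Several items: keep the list; one item: unwrap it; empty: None.
    return x if len(x) > 1 else x[0] if len(x) == 1 else None

def make_df(row):
    d = row.apply(do_eval).to_dict()
    # If any value is a list, pandas broadcasts the scalars against it.
    for v in d.values():
        if isinstance(v, list):
            return pd.DataFrame(d)
    # All scalars: an explicit index is needed to build a one-row frame.
    return pd.DataFrame(d, index=[0])

resul = pd.concat(df.apply(make_df, axis=1).values)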

It gives:

   Emp Factors Comments Action ActionText
0    1       1      not      1       good
1    1       1     some      1         as
0    2       1    textB   None         da
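
The result above keeps the index of each intermediate frame, hence the repeated `0`. If a clean 0..n-1 index like the one in Answer 1 is preferred, `pd.concat` accepts `ignore_index=True`. A minimal sketch, with two hand-built frames (`part1`, `part2`, both hypothetical) standing in for the pieces that `make_df` returns:

```python
import pandas as pd

# Stand-ins for the per-row frames that make_df would return.
part1 = pd.DataFrame({'Emp': 1, 'Factors': ['1', '1'],
                      'Comments': ['not', 'some'], 'Action': '1',
                      'ActionText': ['good', 'as']})
part2 = pd.DataFrame({'Emp': 2, 'Factors': '1', 'Comments': 'textB',
                      'Action': None, 'ActionText': 'da'}, index=[0])

# ignore_index=True discards the repeated 0, 1, 0 labels.
combined = pd.concat([part1, part2], ignore_index=True)
print(combined)
```

Calling `.reset_index(drop=True)` on the concatenated frame would be an equivalent alternative.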

Exploding a DataFrame with string values in pandas without the explode function
