I have a DataFrame like this (pandas version 0.23.4):
Emp Factors Comments Action ActionText
1 "['1','1']" "["not","some"]" "['1']" "['good','as']"
2 "['1']" "['textB']" "[]" "['da']"
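I cannot use
df.set_index('Emp').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)
because Emp is not always unique. PS: the Factors, Comments, Action and ActionText columns are stored as strings in df. For any value in those columns that holds multiple entries ([,]), I want a new row in the output DataFrame. I would like the output df to look like: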
Emp Factors Comments Action ActionText
1 1 not 1 good
1 1 some 1 as
2 1 textB "" or NaN da
Answer 0 (score: 2)
zip_longest
from itertools import zip_longest
import pandas as pd

# Pair up the list columns of each row; shorter lists are padded with None.
data = [
    (emp, *tup)
    for emp, *other in df.itertuples(index=False)
    for tup in zip_longest(*other)
]
pd.DataFrame(data, columns=df.columns)
Emp Factors Comments Action ActionText
0 1 1 not 1 good
1 1 1 some None as
2 2 1 textB None da
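The None values in the Action column come from how zip_longest pads the shorter lists. A minimal sketch of that behaviour, using made-up lists that stand in for one row's cells (the variable names here are only illustrative):

```python
from itertools import zip_longest

# Hypothetical lists standing in for one row's Factors, Comments and Action cells.
factors = ['1', '1']
comments = ['not', 'some']
action = ['1']

# zip_longest runs until the longest list is exhausted and
# fills the gaps with None, or with fillvalue if one is given.
rows = list(zip_longest(factors, comments, action))
rows_blank = list(zip_longest(factors, comments, action, fillvalue=''))
```

Passing fillvalue='' is one way to get the empty string the question mentions instead of None.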
I am assuming df is:
df = pd.DataFrame([
    [1, ['1', '1'], ['not', 'some'], ['1'], ['good', 'as']],
    [2, ['1'], ['textB'], [], ['da']]
], columns=['Emp', 'Factors', 'Comments', 'Action', 'ActionText'])
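Since the question says the columns actually hold strings, the lists would have to be parsed first. Below is a rough sketch of one way to do that with ast.literal_eval before applying the same zip_longest comprehension. The names df_str, to_list and list_cols are hypothetical, and the sample strings assume each cell is valid Python list syntax:

```python
import ast
from itertools import zip_longest

import pandas as pd

# Hypothetical input: the list columns are stored as strings, as in the question.
df_str = pd.DataFrame({
    'Emp': [1, 2],
    'Factors': ["['1','1']", "['1']"],
    'Comments': ["['not','some']", "['textB']"],
    'Action': ["['1']", "[]"],
    'ActionText': ["['good','as']", "['da']"],
})

def to_list(value):
    """Parse a string such as "['1','1']" into a real Python list."""
    # Loop in case the text is wrapped in an extra layer of quotes.
    while isinstance(value, str):
        value = ast.literal_eval(value)
    return value

list_cols = ['Factors', 'Comments', 'Action', 'ActionText']
df_lists = df_str.copy()
for col in list_cols:
    df_lists[col] = df_lists[col].apply(to_list)

# Same comprehension as above, now running on real lists.
data = [
    (emp, *tup)
    for emp, *other in df_lists.itertuples(index=False)
    for tup in zip_longest(*other)
]
result = pd.DataFrame(data, columns=df_lists.columns)
```

ast.literal_eval raises on text that is not a valid Python literal, so a malformed cell would need to be repaired before parsing.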
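Answer 1 (score: 1)
I had to change the Comments value for Emp 1 to "['not','some']" to make it parseable. I then used two utility functions: the first converts a string to a list, and the second processes one row of the original dataframe.
After the fix, the dataframe is: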
df = pd.DataFrame({'Emp': [1, 2], 'Factors': ['"[\'1\',\'1\']"', '"[\'1\']"'],
'Comments': ['"[\'not\',\'some\']"', '"[\'textB\']"'],
'Action': ['"[\'1\']"', '"[]"'],
'ActionText': ['"[\'good\',\'as\']"', '"[\'da\']"']})
or
Emp Factors Comments Action ActionText
0 1 "['1','1']" "['not','some']" "['1']" "['good','as']"
1 2 "['1']" "['textB']" "[]" "['da']"
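My code is:
import ast
import pandas as pd

def do_eval(x):
    # Non-string cells (such as Emp) pass through unchanged.
    if not isinstance(x, str):
        return x
    # The text may be wrapped in extra quotes, so evaluate until it is no longer a string.
    while isinstance(x, str):
        x = ast.literal_eval(x)
    # Keep multi-item lists, unwrap single-item lists, map empty lists to None.
    return x if len(x) > 1 else x[0] if len(x) == 1 else None

def make_df(row):
    d = row.apply(do_eval).to_dict()
    # If any value is still a list, pandas broadcasts the scalars against it.
    for v in d.values():
        if isinstance(v, list):
            return pd.DataFrame(d)
    # All scalars: an explicit index is needed to build a one-row frame.
    return pd.DataFrame(d, index=[0])

resul = pd.concat(df.apply(make_df, axis=1).values)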
This gives:
Emp Factors Comments Action ActionText
0 1 1 not 1 good
1 1 1 some 1 as
0 2 1 textB None da
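The loop in do_eval is needed because each cell in this frame carries its own double quotes, so a single ast.literal_eval call only strips the outer layer. A small sketch of the two passes on one cell (the variable names are just for illustration):

```python
import ast

# One cell from the repaired frame: the list text carries its own double quotes.
cell = '"[\'1\',\'1\']"'

first = ast.literal_eval(cell)    # strips the outer quotes, still a str
second = ast.literal_eval(first)  # now a real list
```

Note that the concatenated result keeps each per-row index (0, 1, 0 in the output above); resul.reset_index(drop=True) would renumber it.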