I have the following DataFrame:
import pandas as pd
a = pd.DataFrame([
    {"name": "John",
     "item": "item1||item2||item3",
     "itemVal": "item1Val||item2Val||item3Val"},
    {"name": "Tom",
     "item": "item4",
     "itemVal": "item4Val"},
])
The DataFrame looks like this:
name  item                 itemVal
John  item1||item2||item3  item1Val||item2Val||item3Val
Tom   item4                item4Val
I want to explode each row into multiple rows so that it looks like this (note that each item and its itemVal must stay matched):
name  item   itemVal
John  item1  item1Val
John  item2  item2Val
John  item3  item3Val
Tom   item4  item4Val
I have also looked at other answers here:
Split (explode) pandas dataframe string entry to separate rows
pandas: How do I split text in a column into multiple rows?
But those only work on a single column. How can I make this work across multiple columns? I am using Pandas 1.0.1 and Python 3.8.
Answer 0 (score: 3)
a = a.apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print(a)
Prints:
   name  item   itemVal
0  John  item1  item1Val
0  John  item2  item2Val
0  John  item3  item3Val
1  Tom   item4  item4Val
Edit: if you only want to split selected columns, you can do this:
exploded = a[['item', 'itemVal']].apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print( pd.concat([a['name'], exploded], axis=1) )
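For readers on a newer pandas than the 1.0.1 mentioned in the question: since pandas 1.3, DataFrame.explode accepts a list of columns, so the same split-then-explode idea can be sketched as below. The names cols, split, and out are only illustrative, and the sketch assumes that item and itemVal split into the same number of pieces in every row.

```python
import pandas as pd

a = pd.DataFrame([
    {"name": "John",
     "item": "item1||item2||item3",
     "itemVal": "item1Val||item2Val||item3Val"},
    {"name": "Tom", "item": "item4", "itemVal": "item4Val"},
])

cols = ["item", "itemVal"]  # columns to split (illustrative name)

# Split with plain str.split so "||" is treated literally, not as a regex.
split = a.assign(**{c: a[c].map(lambda s: s.split("||")) for c in cols})

# Multi-column explode needs pandas >= 1.3; ignore_index renumbers the rows.
out = split.explode(cols, ignore_index=True)
print(out)
```

Using map with str.split sidesteps the fact that Series.str.split may interpret a multi-character pattern such as "||" as a regular expression.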
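Answer 1 (score: 1)
A combination of zip, product, and chain can split the rows. Since this involves strings and, more importantly, no numerical computation, you should see better speed doing it in plain Python than in Pandas.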
import pandas as pd
from itertools import product, chain

# df is the DataFrame from the question (named `a` there)
combine = chain.from_iterable
# pair the item and itemVal columns row by row
merge = zip(df["item"], df["itemVal"])
# pair the entries produced by splitting item and itemVal
merge = [zip(first.split("||"), last.split("||")) for first, last in merge]
# take the Cartesian product of each name with its (item, itemVal) pairs
merger = [product([ent], cont) for ent, cont in zip(df["name"], merge)]
# build the exploded rows
res = [(ent, *cont) for ent, cont in combine(merger)]
pd.DataFrame(res, columns=['name', 'item', 'itemVal'])
   name  item   itemVal
0  John  item1  item1Val
1  John  item2  item2Val
2  John  item3  item3Val
3  Tom   item4  item4Val
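One caveat with pairing the splits through zip: if item and itemVal split into different numbers of pieces, zip silently drops the extras. A minimal pure-Python sketch that checks for this is shown below. The helper explode_records is hypothetical (it is not part of pandas or itertools), and it works on a list of dicts such as the one df.to_dict("records") would give.

```python
def explode_records(records, cols, sep="||"):
    """Yield one dict per split element, splitting `cols` in lockstep.

    Hypothetical helper for illustration only.
    """
    for rec in records:
        parts = [rec[c].split(sep) for c in cols]
        # All split columns must yield the same number of pieces.
        if len({len(p) for p in parts}) != 1:
            raise ValueError(f"columns {cols} split to different lengths in {rec!r}")
        for values in zip(*parts):
            # Copy the record and overwrite the split columns with one piece each.
            yield {**rec, **dict(zip(cols, values))}


records = [
    {"name": "John", "item": "item1||item2||item3",
     "itemVal": "item1Val||item2Val||item3Val"},
    {"name": "Tom", "item": "item4", "itemVal": "item4Val"},
]
rows = list(explode_records(records, ["item", "itemVal"]))
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to pd.DataFrame if a DataFrame is needed at the end.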
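Answer 2 (score: 0)
This may not be as fast as the answer proposed by Sammywemmy, but it is a generic approach that uses Pandas functions. Note that explode only works on one column at a time. So: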
df = pd.DataFrame({'A': [1, 2], 'B': [['a','b'], ['c','d']], 'C': [['z','y'], ['x','w']]})
A  B       C
-----------------
1  [a, b]  [z, y]
2  [c, d]  [x, w]
# Logic for multi-column explode
list_cols = ['B', 'C']  # a list (not a set) keeps the column order deterministic
other_cols = [col for col in df.columns if col not in list_cols]
exploded = [df[col].explode() for col in list_cols]
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)
A  B  C
-------
1  a  z
1  b  y
2  c  x
2  d  w
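The logic above can be wrapped into a reusable function. This is only a sketch: explode_columns is a hypothetical name, and it assumes the input has a unique index and that the lists in each row have the same length across the listed columns.

```python
import pandas as pd


def explode_columns(df, list_cols):
    """Explode several list-valued columns in lockstep (illustrative helper)."""
    other_cols = [c for c in df.columns if c not in list_cols]
    # Explode each column on its own; the results share the same repeated index.
    exploded = pd.DataFrame({c: df[c].explode() for c in list_cols})
    # Joining on the index repeats the other columns once per exploded element.
    out = df[other_cols].join(exploded)
    # Restore the original column order and renumber the rows.
    return out[list(df.columns)].reset_index(drop=True)


df = pd.DataFrame({'A': [1, 2],
                   'B': [['a', 'b'], ['c', 'd']],
                   'C': [['z', 'y'], ['x', 'w']]})
result = explode_columns(df, ['B', 'C'])
print(result)
```

Because it relies only on Series.explode and an index join, this should also work on pandas versions where DataFrame.explode takes a single column.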