Python熊猫:爆炸多行

时间:2020-05-16 21:47:24

标签: python python-3.x pandas

我必须在数据框下方:

import pandas as pd

a = pd.DataFrame([{"name": "John", 
                   "item" : "item1||item2||item3", 
                   "itemVal" : "item1Val||item2Val||item3Val"}, 
                  {"name" : "Tom", 
                   "item":"item4", 
                   "itemVal" : "item4Val"
                  }
                 ])

数据框是这样的:

   name                 item                       itemVal
   John  item1||item2||item3  item1Val||item2Val||item3Val
    Tom                item4                      item4Val

我想将该行爆炸成多行,以使它像这样(注意item及其itemVal必须匹配)。

   name                 item                       itemVal
   John                item1                      item1Val
   John                item2                      item2Val
   John                item3                      item3Val
    Tom                item4                      item4Val

我在这里还查看了其他答案:

Split (explode) pandas dataframe string entry to separate rows

pandas: How do I split text in a column into multiple rows?

但是作品只能在一栏上发表。如何使它在多列上工作?我正在使用Pandas 1.0.1和Python 3.8

3 个答案:

答案 0 :(得分:3)

a = a.apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print(a)

打印:

   name   item   itemVal
0  John  item1  item1Val
0  John  item2  item2Val
0  John  item3  item3Val
1   Tom  item4  item4Val

编辑:如果只想拆分选定的列,则可以执行以下操作:

exploded = a[['item', 'itemVal']].apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print( pd.concat([a['name'], exploded], axis=1) )

答案 1 :(得分:1)

zipproductchain的组合可以实现行的拆分。由于这涉及字符串,更重要的是没有数字计算,因此与在Pandas中运行相比,在Python中您应该获得更快的速度

from itertools import product,chain
combine = chain.from_iterable

#pair item and itemval columns
merge = zip(df.item,df.itemVal) 

#pair the entires from the splits of item and itemval
merge = [zip(first.split("||"),last.split("||")) for first, last in merge]

#create a cartesian product with the name column
merger = [product([ent],cont) for ent, cont in zip(df.name,merge)]

#create ur exploded values
res = [(ent,*cont) for ent, cont in combine(merger)]
pd.DataFrame(res,columns=['name','item','itemVal'])

    name    item    itemVal
0   John    item1   item1Val
1   John    item2   item2Val
2   John    item3   item3Val
3   Tom     item4   item4Val

答案 2 :(得分:0)

这可能不如Sammywemmy所提出的答案那么快,尽管如此,这是一个使用Pandas函数的通用函数。请注意,爆炸功能一次仅对一列起作用。所以:

df = pd.DataFrame({'A': [1, 2], 'B': [['a','b'], ['c','d']], 'C': [['z','y'], ['x','w']]})

A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

##Logic for multi-col explode
list_cols = {'B','C'}
other_cols = list(set(df.columns) - set(list_cols))
exploded = [df[col].explode() for col in list_cols]
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

A B C
------
1 a z
1 b y
2 c x
2 d w