How to combine similarly named columns into separate rows in pandas

Time: 2019-11-25 00:02:45

Tags: python pandas

When I read in the data below, pandas appends .1 or .2 to the names of the repeated columns. The data is as follows:

import io
import pandas as pd

dfff=io.StringIO("""address,phone,name,website,type,address,phone,name,website,type,address,phone,name,type
123 APPLE STREET,555-5555,APPLE STORE,APPLE.COM,BUSINESS,456 peach ave,777-7777,PEACH STORE,PEACH.COM,BUSINESS,789 banana rd,999-9999,banana store,BUSINESS""")
newdf2=pd.read_csv(dfff)

This is the output; pandas has renamed the duplicated columns by appending .1 or .2.

newdf2
#            address     phone         name    website      type      address.1   phone.1       name.1  website.1    type.1      address.2   phone.2        name.2    type.2
#0  123 APPLE STREET  555-5555  APPLE STORE  APPLE.COM  BUSINESS  456 peach ave  777-7777  PEACH STORE  PEACH.COM  BUSINESS  789 banana rd  999-9999  banana store  BUSINESS

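How can I combine each repeated group of columns into its own row to get this output (since there is no website.2, that value would be NaN, 0, or blank):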

#            address     phone         name    website      type      
#0  123 APPLE STREET  555-5555  APPLE STORE  APPLE.COM  BUSINESS
#1     456 peach ave  777-7777  PEACH STORE  PEACH.COM  BUSINESS
#2     789 banana rd  999-9999  banana store       NaN  BUSINESS

I don't really know where to start. I tried stacking the data, which works as expected, but unstacking just reverts to the original data:

newdf2.stack().to_frame()
#                            0
#0 address    123 APPLE STREET
#  phone              555-5555
#  name            APPLE STORE
#  website           APPLE.COM
#  type               BUSINESS
#  address.1     456 peach ave
#  phone.1            777-7777
#  name.1          PEACH STORE
#  website.1         PEACH.COM
#  type.1             BUSINESS
#  address.2     789 banana rd
#  phone.2            999-9999
#  name.2         banana store
#  type.2             BUSINESS

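I'm thinking there must be a way to stack, remove the . suffix from the column names, and then unstack into the format I want? Or maybe there is another way?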

1 answer:

Answer 0 (score: 1)

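You can use pd.wide_to_long. It expects every column name to have the form stub, separator, numeric suffix, so first give the unsuffixed columns a .0 suffix and add an id column that identifies each original row.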

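# Assumption: df is the DataFrame read in the question (newdf2).
df = newdf2.copy()
# Give the first, unsuffixed group an explicit ".0" suffix.
df.columns = [f'{x}.0' if '.' not in x else x for x in df.columns]
# wide_to_long needs a column that uniquely identifies each original row.
df['id'] = df.index

df = pd.wide_to_long(df, stubnames=['address', 'phone', 'name', 'website', 'type'], i='id', j='row', sep='.')

df.reset_index(drop=True)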

Out[1]: 
            address     phone          name    website      type
0  123 APPLE STREET  555-5555   APPLE STORE  APPLE.COM  BUSINESS
1     456 peach ave  777-7777   PEACH STORE  PEACH.COM  BUSINESS
2     789 banana rd  999-9999  banana store        NaN  BUSINESS
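
The idea raised in the question (strip the suffix from the column names, then pile the groups on top of each other) can also be sketched without wide_to_long. The version below is only an illustration, not part of the original answer: it uses pd.concat rather than stack/unstack, the names csv_text, wide, groups, frames and tidy are made up for this example, and it assumes that none of the original column names contains a "." of its own.

```python
import io

import pandas as pd

# Same data as in the question, rebuilt here so the example is self-contained.
csv_text = (
    "address,phone,name,website,type,"
    "address,phone,name,website,type,"
    "address,phone,name,type\n"
    "123 APPLE STREET,555-5555,APPLE STORE,APPLE.COM,BUSINESS,"
    "456 peach ave,777-7777,PEACH STORE,PEACH.COM,BUSINESS,"
    "789 banana rd,999-9999,banana store,BUSINESS"
)
wide = pd.read_csv(io.StringIO(csv_text))

# Map each group number to {mangled column name: plain field name}.
# A column without a suffix belongs to group 0.
groups = {}
for col in wide.columns:
    field, _, suffix = col.partition(".")
    groups.setdefault(int(suffix) if suffix else 0, {})[col] = field

# One small frame per group, with the suffixes stripped from the column names.
frames = [
    wide[list(mapping)].rename(columns=mapping)
    for _, mapping in sorted(groups.items())
]

# Stack the frames vertically; a field missing from a group
# (website in the last group) is filled with NaN.
tidy = pd.concat(frames, ignore_index=True)
print(tidy)
```

If the input has more than one row, this puts all rows of group 0 first, then all rows of group 1, and so on, so a sort on an identifier column may be needed afterwards.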