Merge two columns in a pandas DataFrame

Time: 2018-12-06 05:01:13

Tags: python pandas

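I have a DataFrame with multiple columns, some of which are equivalent: their names end with the same key (for example, column1 = 'a/first', column2 = 'b/first'). I want to merge each such pair of columns into one. Please help me solve this.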

My DataFrame looks like this:

name   g1/column1  g1/column2 g1/g2/column1  g2/column2
AAAA   10             20          nan           nan
AAAA   nan            nan         30            40

The result I want looks like this:

name   g1/column1  g1/column2
AAAA   10             20          
AAAA   30             40      

Thanks in advance.

4 answers:

Answer 0: (score: 3)

You need df.combine_first, which keeps the values of the first frame and fills its NaNs from the second:

col1=['g1/column1', 'g1/column2']
col2=['g1/g2/column1', 'g2/column2']

df[col1]=df[col1].combine_first(pd.DataFrame(df[col2].values,columns=col1))

df=df.drop(col2,axis=1)

print(df)
#    name  g1/column1  g1/column2
# 0  AAAA        10.0        20.0
# 1  AAAA        30.0        40.0
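
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same combine_first idea. The sample frame is reconstructed from the question, and the names `other`, `merged` and `result` are only illustrative. One difference from the snippet above: instead of rebuilding the second block from `.values` (which assumes a default 0..n-1 index), this version just relabels the columns, so the original row index is kept.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "name": ["AAAA", "AAAA"],
    "g1/column1": [10, np.nan],
    "g1/column2": [20, np.nan],
    "g1/g2/column1": [np.nan, 30],
    "g2/column2": [np.nan, 40],
})

col1 = ["g1/column1", "g1/column2"]      # columns to keep
col2 = ["g1/g2/column1", "g2/column2"]   # columns to fold in, in matching order

# Relabel the second block so its columns line up with the first block
other = df[col2].copy()
other.columns = col1

# combine_first keeps the values of df[col1] and fills its NaNs from `other`
merged = df[col1].combine_first(other)

# Put the untouched 'name' column back in front of the merged columns
result = pd.concat([df[["name"]], merged], axis=1)
print(result)
```

Note that combine_first only fills gaps: if both columns of a pair hold a value in the same row, the one from `col1` wins.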

Answer 1: (score: 2)

Use:

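# Move 'name' into the index so that only the columns to merge remain
df = df.set_index('name')
# Split each column name at its last '/' to build a two-level MultiIndex
df.columns = df.columns.str.rsplit('/', n=1, expand=True)
# Per second-level name, take the first non-NaN value in each row.
# groupby(level=1, axis=1) is deprecated in recent pandas, so group on the transpose.
df = df.T.groupby(level=1).first().T
print(df)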
      column1  column2
name                  
AAAA     10.0     20.0
AAAA     30.0     40.0
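
To make the grouping step more concrete, here is a rough sketch of the same "first non-NaN value per suffix" logic written without a MultiIndex. It assumes the sample frame from the question; `groups` and `merged` are illustrative names, and the row-wise `bfill` is a swapped-in technique, not what the answer above uses.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "name": ["AAAA", "AAAA"],
    "g1/column1": [10, np.nan],
    "g1/column2": [20, np.nan],
    "g1/g2/column1": [np.nan, 30],
    "g2/column2": [np.nan, 40],
})

# Group the value columns by the text after their last '/'
groups = {}
for col in df.columns.drop("name"):
    groups.setdefault(col.rsplit("/", 1)[-1], []).append(col)

merged = df[["name"]].copy()
for key, cols in groups.items():
    # Back-filling along each row pulls the first non-NaN value into the
    # leftmost column of the group, which is then kept
    merged[key] = df[cols].bfill(axis=1).iloc[:, 0]

print(merged)
```

As in the answer above, the merged columns are named after the shared suffix (`column1`, `column2`) rather than after the first full column name.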

Answer 2: (score: 0)

One possible solution:

import numpy as np
import pandas as pd

df = pd.DataFrame([[10, 20, np.nan, np.nan],
                  [np.nan, np.nan, 30, 40]],
                 columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df

   g1/column1   g1/column2  g1/g2/column1   g2/column2
0   10.0        20.0        NaN             NaN
1   NaN         NaN         30.0            40.0

df = df.fillna(0)  # <- replacing all NaN with 0

ndf = pd.DataFrame() 

unique_cols = ['column1', 'column2']

for col in unique_cols:
    # Columns whose name contains this suffix
    val = df.columns[df.columns.str.contains(col)]
    # Sum the matching columns row by row (axis=1); the NaNs are already 0
    ndf[val[0]] = df.loc[:, val].sum(axis=1)

ndf  # <- You can add index if you need (AAAA, AAAA)

    g1/column1  g1/column2
0   10.0        20.0
1   30.0        40.0
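
One caveat with filling NaN with 0 before summing: a row where the value is missing in every source column ends up as 0 instead of NaN. A possible variation, sketched below, skips the fillna step and passes `min_count=1` to `sum`, so a row stays NaN unless at least one value is present. The third row is a hypothetical addition to the sample data, included only to show that case, and `endswith` is used in place of `contains` as a slightly stricter match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 20, np.nan, np.nan],
     [np.nan, np.nan, 30, 40],
     [np.nan, 5, np.nan, np.nan]],  # hypothetical row: column1 missing everywhere
    columns=["g1/column1", "g1/column2", "g1/g2/column1", "g2/column2"],
)

unique_cols = ["column1", "column2"]

ndf = pd.DataFrame(index=df.index)
for suffix in unique_cols:
    # Columns whose name ends with this suffix
    matching = df.columns[df.columns.str.endswith(suffix)]
    # Row-wise sum that skips NaN; min_count=1 yields NaN when every value is missing
    ndf[matching[0]] = df[matching].sum(axis=1, min_count=1)

print(ndf)
```

Summing is only equivalent to merging when at most one column of each group holds a value in a given row; if two sources overlap, their values are added together.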

Answer 3: (score: 0)

import pandas as pd
import numpy as np

g1 = [20, np.nan, 30, np.nan]
g1_2 = [10, np.nan, 20, np.nan]
g2 = [np.nan, 30, np.nan, 40]
g2_2 = [np.nan, 10, np.nan, 30]

dataList = list(zip(g1, g1_2, g2, g2_2))
df = pd.DataFrame(data = dataList, columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])

df.fillna(0, inplace=True)

# After fillna(0), adding each pair keeps whichever value was present
df['g1Combined'] = df['g1/column1'] + df['g1/g2/column1']  # merged column1
df['g2Combined'] = df['g1/column2'] + df['g2/column2']     # merged column2
# Drop the four source columns in a single call
df = df.drop(columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df
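
If you would rather not turn the NaNs into zeros first, a similar result can be had by filling each kept column from its counterpart with `Series.fillna`. This is only a sketch using the same sample values as above; the `pairs` mapping is an illustrative name, and it assumes you already know which columns belong together.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "g1/column1": [20, np.nan, 30, np.nan],
    "g1/column2": [10, np.nan, 20, np.nan],
    "g1/g2/column1": [np.nan, 30, np.nan, 40],
    "g2/column2": [np.nan, 10, np.nan, 30],
})

# Column to keep -> column whose values fill its gaps
pairs = {"g1/column1": "g1/g2/column1", "g1/column2": "g2/column2"}

for keep, extra in pairs.items():
    # fillna with a Series fills each NaN from the same row of the other column
    df[keep] = df[keep].fillna(df[extra])

# The source columns are no longer needed
df = df.drop(columns=list(pairs.values()))
print(df)
```

Unlike the addition approach, this leaves a value as NaN when it is missing in both columns, and it never adds two values together.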