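I have a DataFrame with multiple columns, some of which are equivalent (they share the same trailing key, e.g. column1 = 'a/first', column2 = 'b/first'). I want to merge such columns into one. Please help me solve this.
My DataFrame looks like this: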
name g1/column1 g1/column2 g1/g2/column1 g2/column2
AAAA 10 20 nan nan
AAAA nan nan 30 40
My result should look like this:
name g1/column1 g1/column2
AAAA 10 20
AAAA 30 40
Thanks in advance.
Answer 0 (score: 3)
You need df.combine_first:
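# columns to keep, and the columns whose values should fill their gaps (same order)
col1 = ['g1/column1', 'g1/column2']
col2 = ['g1/g2/column1', 'g2/column2']
# relabel col2 as col1 (keeping df's index) so combine_first aligns them, then fill the NaNs in col1
df[col1] = df[col1].combine_first(pd.DataFrame(df[col2].values, columns=col1, index=df.index))
df = df.drop(col2, axis=1)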
print(df)
# name g1/column1 g1/column2
#0 AAAA 10.0 20.0
#1 AAAA 30.0 40.0
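The column lists above are written by hand. As a minimal sketch, assuming the columns to merge always share the text after the last '/', the groups can be derived from the column names instead. The helper name `merge_by_suffix` and its `keep` parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def merge_by_suffix(df, keep=("name",)):
    """Hypothetical helper: coalesce columns that share the text after the last '/'."""
    out = df[list(keep)].copy()
    # group the remaining column names by their last path segment
    groups = {}
    for col in df.columns:
        if col not in keep:
            groups.setdefault(col.rsplit("/", 1)[-1], []).append(col)
    for cols in groups.values():
        # bfill along each row pulls the first non-NaN value into the first column
        out[cols[0]] = df[cols].bfill(axis=1).iloc[:, 0]
    return out

df = pd.DataFrame({
    "name": ["AAAA", "AAAA"],
    "g1/column1": [10, np.nan],
    "g1/column2": [20, np.nan],
    "g1/g2/column1": [np.nan, 30],
    "g2/column2": [np.nan, 40],
})
result = merge_by_suffix(df)
print(result)
```

Each merged column keeps the name of the first column in its group, which matches the output requested in the question.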
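Answer 1 (score: 2)
Use:
# move every column that should not be merged into the index
df = df.set_index('name')
# split each column name at the last '/' to build a two-level MultiIndex
df.columns = df.columns.str.rsplit('/', n=1, expand=True)
# take the first non-NaN value per second level of the MultiIndex
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
df = df.groupby(level=1, axis=1).first()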
print (df)
column1 column2
name
AAAA 10.0 20.0
AAAA 30.0 40.0
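Since `groupby(..., axis=1)` is deprecated as of pandas 2.1, the same idea can be written with a transpose. This is only a sketch, assuming 'name' is the single column to leave unmerged; it sets 'name' aside rather than moving it into the index, so the row labels stay unique while the frame is transposed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['AAAA', 'AAAA'],
    'g1/column1': [10, np.nan],
    'g1/column2': [20, np.nan],
    'g1/g2/column1': [np.nan, 30],
    'g2/column2': [np.nan, 40],
})

# work on the value columns only; 'name' is reattached at the end
values = df.drop(columns='name')
# keep only the text after the last '/', so matching columns share a label
values.columns = [c.rsplit('/', 1)[-1] for c in values.columns]
# transpose, take the first non-NaN value per label, transpose back
merged = values.T.groupby(level=0).first().T
result = pd.concat([df[['name']], merged], axis=1)
print(result)
```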
Answer 2 (score: 0)
One possible solution:
import numpy as np
import pandas as pd
df = pd.DataFrame([[10, 20, np.nan, np.nan],
                   [np.nan, np.nan, 30, 40]],
                  columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df
g1/column1 g1/column2 g1/g2/column1 g2/column2
0 10.0 20.0 NaN NaN
1 NaN NaN 30.0 40.0
df = df.fillna(0)  # replace every NaN with 0
ndf = pd.DataFrame()
unique_cols = ['column1', 'column2']
for key in unique_cols:
    # all columns whose name contains this key
    val = df.columns[df.columns.str.contains(key)]
    # row-wise sum of the matching columns, stored under the first column's name
    ndf[val[0]] = df.loc[:, val].sum(axis=1)
ndf  # add an index (AAAA, AAAA) if you need one
g1/column1 g1/column2
0 10.0 20.0
1 30.0 40.0
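Two caveats with this approach: `str.contains('column1')` would also match a name such as 'column10', and filling with 0 turns a row with no data at all into 0. A minimal sketch that sidesteps both, assuming each key is exactly the text after the last '/'; the column 'g1/column10' and the all-NaN third row are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 20, np.nan, np.nan, 1],
     [np.nan, np.nan, 30, 40, 2],
     [np.nan, np.nan, np.nan, np.nan, 3]],
    columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2', 'g1/column10'],
)

ndf = pd.DataFrame(index=df.index)
for key in ['column1', 'column2']:
    # match the full last segment, so 'column1' does not pick up 'column10'
    cols = df.columns[df.columns.str.endswith('/' + key)]
    # min_count=1 keeps NaN when every value in the row is missing
    ndf[cols[0]] = df[cols].sum(axis=1, min_count=1)
print(ndf)
```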
Answer 3 (score: 0)
import pandas as pd
import numpy as np
g1 = [20, np.nan, 30, np.nan]
g1_2 = [10, np.nan, 20, np.nan]
g2 = [np.nan, 30, np.nan, 40]
g2_2 = [np.nan, 10, np.nan, 30]
dataList = list(zip(g1, g1_2, g2, g2_2))
df = pd.DataFrame(data = dataList, columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
df.fillna(0, inplace=True)
df['g1Combined'] = df['g1/column1'] + df['g1/g2/column1']
df['g2Combined'] = df['g1/column2'] + df['g2/column2']
# drop the four source columns in one call
df.drop(columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'], inplace=True)
df
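Adding the columns works here because each row holds a value in at most one column of a pair. If both columns could hold a value in the same row, the sum would no longer be a merge. A small sketch of the difference, assuming the left column should win in that case; the overlapping last row is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'g1/column1':    [20, np.nan, 30, 5],
    'g1/g2/column1': [np.nan, 30, np.nan, 7],  # last row overlaps on purpose
})

# fill-with-zero then add: overlapping values are summed
added = df['g1/column1'].fillna(0) + df['g1/g2/column1'].fillna(0)
# fillna with the other column: the left value is kept where both exist
coalesced = df['g1/column1'].fillna(df['g1/g2/column1'])

print(added.tolist())
print(coalesced.tolist())
```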