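I'm trying to figure out how to merge all columns of a dataframe into a new column while prioritizing the columns by highest count. In the example below, GRR's data would be loaded into the new column in preference to GRD's. My example has only a few columns, but the solution needs to handle a variable number of columns.
Example: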
print(df2[Matches].describe())
GR2 GRD GRR
count 200 9106.000000 18894.000000
DEPT GRR GRD GR2 MERGED
0 400.0 NaN 45.007000 60 45.007000
1 400.5 35 42.575001 42.575001 35
2 401.0 NaN 43.755001 40 43.755001
3 401.5 40 NaN 45.417000 40
4 402.0 45 NaN NaN 45
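Answer 0 (score: 2)
This reorders your dataframe so that the column with the highest count is on the left, followed by the remaining columns in descending order of their count values.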
df[df.describe().loc['count'].sort_values(ascending=False).index]
import pandas as pd

# Create sample data.
df = pd.DataFrame({c: range(5) for c in 'ABC'})
df.loc[:2, 'A'] = None
df.loc[0, 'B'] = None
>>> df
A B C
0 NaN NaN 0
1 NaN 1 1
2 NaN 2 2
3 3 3 3
4 4 4 4
# Sort columns by count.
>>> df[df.describe().loc['count'].sort_values(ascending=False).index]
C B A
0 0 NaN NaN
1 1 1 NaN
2 2 2 NaN
3 3 3 3
4 4 4 4
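As a possible extension of the reordering above, the sketch below goes one step further and builds the merged column the question asks for. It derives the priority from `df.count()` rather than `describe()`, since `count()` also covers non-numeric columns, and then takes the first non-null value in each row. The helper name `merge_by_count` and its parameters are assumptions made for this example, not part of the answer.

```python
import numpy as np
import pandas as pd


def merge_by_count(frame, columns=None, new_col="MERGED"):
    """Coalesce `columns` into `new_col`, preferring columns with more non-null values.

    `merge_by_count` is a hypothetical helper written for this example.
    """
    columns = list(frame.columns) if columns is None else list(columns)
    # Order the candidate columns from most to fewest non-null entries.
    priority = frame[columns].count().sort_values(ascending=False).index
    # Back-fill across each row so the leftmost column holds the first
    # non-null value in priority order, then keep that column.
    merged = frame[priority].bfill(axis=1).iloc[:, 0]
    return frame.assign(**{new_col: merged})


# Same sample data as above, built with floats so NaN fits the dtype.
df = pd.DataFrame({c: np.arange(5, dtype=float) for c in "ABC"})
df.loc[:2, "A"] = np.nan
df.loc[0, "B"] = np.nan

result = merge_by_count(df)
print(result)
```

Because the function takes an optional list of column names, it works for any number of candidate columns. Columns with equal counts have no guaranteed relative order here, so a tie-breaking rule would need to be added if that matters.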
Answer 1 (score: 1)
Sort the columns by count, then take the first non-null value in each row.
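# Column names ordered from highest to lowest count.
priority = df1.loc['count'].sort_values(ascending=False).index
# For each row, idxmax(1) gives the first column (in priority order) that is
# non-null; lookup then fetches the value at that row/column pair.
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
df.assign(MERGED=df.lookup(
    df.index,
    df[priority].notnull().idxmax(1)
))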
DEPT GRR GRD GR2 MERGED
0 400.0 NaN 45.007000 60.000000 45.007000
1 400.5 35.0 42.575001 42.575001 35.000000
2 401.0 NaN 43.755001 40.000000 43.755001
3 401.5 40.0 NaN 45.417000 40.000000
4 402.0 45.0 NaN NaN 45.000000
where
df
DEPT GRR GRD GR2
0 400.0 NaN 45.007000 60.000000
1 400.5 35.0 42.575001 42.575001
2 401.0 NaN 43.755001 40.000000
3 401.5 40.0 NaN 45.417000
4 402.0 45.0 NaN NaN
and
df1
GR2 GRD GRR
count 200 9106.0 18894.0
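`DataFrame.lookup` is no longer available in pandas 2.0 and later, so the following is a minimal sketch of the same idea using a row-wise back-fill instead. It rebuilds the five sample rows shown above, and the counts in `df1` are copied from the question's `describe()` output. The variable names `merged` and `result` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# The five sample rows shown above.
df = pd.DataFrame({
    "DEPT": [400.0, 400.5, 401.0, 401.5, 402.0],
    "GRR": [np.nan, 35.0, np.nan, 40.0, 45.0],
    "GRD": [45.007, 42.575001, 43.755001, np.nan, np.nan],
    "GR2": [60.0, 42.575001, 40.0, 45.417, np.nan],
})

# Counts from the full dataset, as in df1 above.
df1 = pd.DataFrame(
    {"GR2": [200.0], "GRD": [9106.0], "GRR": [18894.0]}, index=["count"]
)

# Column names ordered from highest to lowest count: GRR, GRD, GR2.
priority = df1.loc["count"].sort_values(ascending=False).index

# Back-fill across each row in priority order; the first column then
# holds the first non-null value for that row.
merged = df[priority].bfill(axis=1).iloc[:, 0]

result = df.assign(MERGED=merged)
print(result)
```

A row in which every candidate column is null simply yields NaN in MERGED.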