Merge all columns, prioritizing by column count

Time: 2017-08-28 18:59:19

Tags: python pandas

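I'm trying to figure out how to merge all the columns of a DataFrame into a new column, prioritizing the columns by highest count. So in the example below, data from GRR should be loaded into the new column in preference to GRD. My example only has a few columns, but the solution also needs to work for a variable number of columns.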

Example:

    print(df2[Matches].describe())
         GR2         GRD           GRR
count    200   9106.000000  18894.000000

    DEPT  GRR        GRD        GR2     MERGED
0  400.0  NaN  45.007000         60  45.007000
1  400.5   35  42.575001  42.575001         35
2  401.0  NaN  43.755001         40  43.755001
3  401.5   40        NaN  45.417000         40
4  402.0   45        NaN        NaN         45
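A minimal sketch of the desired result, assuming the per-column counts come from `describe()` on the full dataset (the five sample rows alone would rank GR2 first, since it has the most non-null values among them). It orders the columns by count and back-fills across each row, so the first non-null value in priority order lands in the first column. `bfill(axis=1)` is a substituted technique, not taken from either answer below.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "DEPT": [400.0, 400.5, 401.0, 401.5, 402.0],
    "GRR": [np.nan, 35.0, np.nan, 40.0, 45.0],
    "GRD": [45.007, 42.575001, 43.755001, np.nan, np.nan],
    "GR2": [60.0, 42.575001, 40.0, 45.417, np.nan],
})

# Non-null counts copied from describe() on the full dataset (assumed here);
# on the full frame, df[["GR2", "GRD", "GRR"]].count() gives the same numbers.
counts = pd.Series({"GR2": 200, "GRD": 9106, "GRR": 18894})
priority = counts.sort_values(ascending=False).index  # GRR, GRD, GR2

# Back-fill along each row so column 0 holds the first non-null value
# in priority order; this works for any number of columns.
df["MERGED"] = df[priority].bfill(axis=1).iloc[:, 0]
print(df)
```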

2 answers:

Answer 0 (score: 2)

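This rearranges your DataFrame so that the column with the highest count is on the left, and the remaining columns follow in descending order of count.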

df[df.describe().loc['count'].sort_values(ascending=False).index]

import pandas as pd

# Create sample data.
df = pd.DataFrame({c: range(5) for c in 'ABC'})
df.loc[:2, 'A'] = None
df.loc[0, 'B'] = None
>>> df
    A   B  C
0 NaN NaN  0
1 NaN   1  1
2 NaN   2  2
3   3   3  3
4   4   4  4

# Sort columns by count.
>>> df[df.describe().loc['count'].sort_values(ascending=False).index]
   C   B   A
0  0 NaN NaN
1  1   1 NaN
2  2   2 NaN
3  3   3   3
4  4   4   4
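One caveat with this one-liner: `describe()` summarises only numeric columns by default, so a non-numeric column silently drops out of the ordering. `DataFrame.count()` returns the non-null count for every column. A small check of the difference, where the `label` column is hypothetical and added only for illustration:

```python
import numpy as np
import pandas as pd

# Same shape of sample data as in the answer.
df = pd.DataFrame({c: range(5) for c in "ABC"}, dtype=float)
df.loc[:2, "A"] = np.nan
df.loc[0, "B"] = np.nan
df["label"] = ["a", None, None, "d", "e"]  # hypothetical non-numeric column

# describe() covers numeric columns only, so "label" is missing here.
by_describe = df.describe().loc["count"].sort_values(ascending=False).index
# count() covers every column, numeric or not.
by_count = df.count().sort_values(ascending=False).index

print(list(by_describe))
print(list(by_count))
print(df[by_count])  # columns reordered by non-null count
```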

Answer 1 (score: 1)

Sort the columns by count, then take the first non-null value in each row.

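import numpy as np

priority = df1.loc['count'].sort_values(ascending=False).index

# For each row, the name of the first non-null column in priority order.
first_valid = df[priority].notnull().idxmax(axis=1)

# DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
# so pick the values by integer position instead.
rows = np.arange(len(df))
cols = df.columns.get_indexer(first_valid)
df.assign(MERGED=df.to_numpy()[rows, cols])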

    DEPT   GRR        GRD        GR2     MERGED
0  400.0   NaN  45.007000  60.000000  45.007000
1  400.5  35.0  42.575001  42.575001  35.000000
2  401.0   NaN  43.755001  40.000000  43.755001
3  401.5  40.0        NaN  45.417000  40.000000
4  402.0  45.0        NaN        NaN  45.000000

where

df

    DEPT   GRR        GRD        GR2
0  400.0   NaN  45.007000  60.000000
1  400.5  35.0  42.575001  42.575001
2  401.0   NaN  43.755001  40.000000
3  401.5  40.0        NaN  45.417000
4  402.0  45.0        NaN        NaN

df1

       GR2     GRD      GRR
count  200  9106.0  18894.0
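An alternative that avoids row-wise label lookups entirely is to fold `Series.combine_first` over the columns in priority order: each later column only fills the NaNs left by the columns before it. This is a different technique from the answer above, sketched on the `df` and `df1` shown, and it handles any number of columns.

```python
from functools import reduce

import numpy as np
import pandas as pd

# The df and df1 from the answer, rebuilt here.
df = pd.DataFrame({
    "DEPT": [400.0, 400.5, 401.0, 401.5, 402.0],
    "GRR": [np.nan, 35.0, np.nan, 40.0, 45.0],
    "GRD": [45.007, 42.575001, 43.755001, np.nan, np.nan],
    "GR2": [60.0, 42.575001, 40.0, 45.417, np.nan],
})
df1 = pd.DataFrame({"GR2": [200], "GRD": [9106.0], "GRR": [18894.0]}, index=["count"])

priority = df1.loc["count"].sort_values(ascending=False).index

# Start from the highest-count column; each subsequent column only
# fills positions that are still NaN.
merged = reduce(
    lambda acc, col: acc.combine_first(df[col]),
    priority[1:],
    df[priority[0]],
)
result = df.assign(MERGED=merged)
print(result)
```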