Question

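I have a DataFrame with an ID column and several feature columns. For each feature column, I want summary statistics of how many unique IDs there are per value of that column.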

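The code below works, but I'm wondering whether there is a better way than the to_frame().unstack().unstack() line to turn the Series returned by .describe() into a DataFrame whose columns are the percentiles, max, min, and so on.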

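import pandas as pd

def unique_ids(df, id_col='ID'):
    rows = []
    # Every column except the ID column, in alphabetical order
    for col in sorted(c for c in df.columns if c != id_col):
        # Number of distinct IDs per value of `col`, summarized by describe()
        v = df.groupby(col)[id_col].nunique().describe()
        v = v.to_frame().unstack().unstack()  # Transpose
        v.index = [col]
        rows.append(v)

    return pd.concat(rows)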
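To see why the double unstack behaves like a transpose, here is a minimal sketch on a made-up describe() result (the data and the name "E" are assumptions for illustration only). It traces the shape of each intermediate object.

```python
import pandas as pd

# A describe() result: a Series indexed by statistic names (count, mean, ..., max).
# The data and the name "E" are made up purely for illustration.
stats = pd.Series([1, 2, 2, 3]).describe()
stats.name = "E"

# Step 1: to_frame() gives an 8x1 DataFrame (rows = statistics, one column "E").
frame = stats.to_frame()

# Step 2: unstack() on a DataFrame whose index is not a MultiIndex returns a
# Series with a two-level index: (column name, statistic).
stacked = frame.unstack()

# Step 3: unstack() on that Series moves the inner level (statistic) to the
# columns, leaving a 1x8 DataFrame whose only row is "E".
wide = stacked.unstack()

print(frame.shape, stacked.index.nlevels, wide.shape)
```

So the first unstack flattens everything into a long Series and the second one pivots it back out with the roles of rows and columns swapped.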

Answer 1

It looks like you need to change:

v = v.to_frame().unstack().unstack()

to

v = v.to_frame().T
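A quick way to convince yourself the two forms agree is to build both from the same toy describe() Series and compare them. This is a sketch with made-up data; check_like=True is used so the comparison ignores row/column ordering, in case the unstack route orders the statistics differently in some pandas versions.

```python
import pandas as pd

# Toy describe() output; the values and the name "E" are made up for illustration.
v = pd.Series([1, 2]).describe().rename("E")

via_unstack = v.to_frame().unstack().unstack()  # the question's version
via_T = v.to_frame().T                          # the suggested replacement

# Same single row "E" and same statistics; check_like=True ignores the
# ordering of rows/columns, so only labels and values are compared.
pd.testing.assert_frame_equal(via_unstack, via_T, check_like=True)
print(via_T)
```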

Alternatively, give each Series the name col with rename, and transpose the final DataFrame once:

import pandas as pd

df = pd.DataFrame({'ID':[1,1,3],
                   'E':[4,5,5],
                   'C':[7,8,9]})

print(df)
   C  E  ID
0  7  4   1
1  8  5   1
2  9  5   3

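def unique_ids(df):
    rows = []
    id_col = 'ID'
    for col in sorted(c for c in df.columns if c != id_col):
        # rename(col) names the describe() Series after its feature column
        v = df.groupby(col)[id_col].nunique().describe().rename(col)
        rows.append(v)
    # axis=1 makes one column per feature; .T turns those columns into rows
    return pd.concat(rows, axis=1).T

print(unique_ids(df))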
   count  mean       std  min   25%  50%   75%  max
C    3.0   1.0  0.000000  1.0  1.00  1.0  1.00  1.0
E    2.0   1.5  0.707107  1.0  1.25  1.5  1.75  2.0
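If you want to avoid the Python-level loop entirely, one possible alternative (not part of the original answer) is to reshape to long form with melt and let groupby do the per-column describe(). This is a sketch under stated assumptions: the helper name unique_ids_melt is hypothetical, no feature column may be named "column" or "value", and melting different feature dtypes into one column may upcast them to object.

```python
import pandas as pd

def unique_ids_melt(df, id_col="ID"):
    # Hypothetical helper; assumes no feature column is named "column" or "value".
    # Long form: one row per (id, feature column name, feature value).
    long_df = df.melt(id_vars=id_col, var_name="column", value_name="value")
    # Number of distinct IDs for every (column, value) pair ...
    counts = long_df.groupby(["column", "value"])[id_col].nunique()
    # ... then describe() those counts separately for each original column.
    return counts.groupby(level="column").describe()

df = pd.DataFrame({'ID': [1, 1, 3], 'E': [4, 5, 5], 'C': [7, 8, 9]})
print(unique_ids_melt(df))
```

The numbers should match the loop-based version above; the main visible difference is that the resulting index carries the name "column".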
