
Question

Is there a faster way to drop columns that contain only one distinct value than the code below?

cols=df.columns.tolist()
for col in cols:
    if len(set(df[col].tolist()))<2:
        df=df.drop(col, axis=1)

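For large DataFrames this is far too slow. Logically, it counts the number of distinct values in each column, when it could stop counting as soon as it reaches 2 distinct values.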
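The early-stopping idea described in the question can be sketched in plain Python. The helper names `has_second_value` and `drop_constant_columns` are hypothetical, and whether a Python-level loop beats vectorized pandas calls depends on the data, so this is an illustration of the logic rather than a guaranteed speed-up:

```python
import pandas as pd

def has_second_value(series):
    """Return True as soon as a value different from the first one is seen."""
    values = iter(series.tolist())
    first = next(values, None)  # an empty column has no second value
    for value in values:
        # Caveat: NaN != NaN is True, so a column containing NaN counts as varying.
        if value != first:
            return True  # stop scanning: two distinct values found
    return False

def drop_constant_columns(df):
    """Keep only the columns that contain at least two distinct values."""
    keep = [col for col in df.columns if has_second_value(df[col])]
    return df[keep]

df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3], "c": [0, 0, 2]})
result = drop_constant_columns(df)
print(list(result.columns))
```

The scan over a column ends at the first mismatch, so columns that vary early cost almost nothing; only truly constant columns are read to the end.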

Answer 1

You can use the Series.unique() method to find all unique elements in a column, and drop every column for which .unique() returns only 1 element. Example -

for col in df.columns:
    if len(df[col].unique()) == 1:
        df.drop(col,inplace=True,axis=1)

A way that does not drop in place -

res = df
for col in df.columns:
    if len(df[col].unique()) == 1:
        res = res.drop(col,axis=1)

Demo -

In [154]: df = pd.DataFrame([[1,2,3],[1,3,3],[1,2,3]])

In [155]: for col in df.columns:
   .....:     if len(df[col].unique()) == 1:
   .....:         df.drop(col,inplace=True,axis=1)
   .....:

In [156]: df
Out[156]:
   1
0  2
1  3
2  2

Timing results -

In [166]: %paste
def func1(df):
        res = df
        for col in df.columns:
                if len(df[col].unique()) == 1:
                        res = res.drop(col,axis=1)
        return res

## -- End pasted text --

In [172]: df = pd.DataFrame({'a':1, 'b':np.arange(5), 'c':[0,0,2,2,2]})

In [178]: %timeit func1(df)
1000 loops, best of 3: 1.05 ms per loop

In [180]: %timeit df[df.apply(pd.Series.value_counts).dropna(thresh=2, axis=1).columns]
100 loops, best of 3: 8.81 ms per loop

In [181]: %timeit df.apply(pd.Series.value_counts).dropna(thresh=2, axis=1)
100 loops, best of 3: 5.81 ms per loop

The fastest approach still seems to be the one that uses unique and loops over the columns.
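The timings above were taken on a 5-row frame, so they may not carry over to large data. A minimal sketch for repeating the comparison on a bigger frame follows; the frame size, the random data, and the function names `drop_with_unique` and `drop_with_nunique` are assumptions made for illustration, and no particular result is implied:

```python
import timeit

import numpy as np
import pandas as pd

def drop_with_unique(df):
    """Loop over the columns and keep those with more than one unique value."""
    keep = [col for col in df.columns if len(df[col].unique()) > 1]
    return df[keep]

def drop_with_nunique(df):
    """Vectorized variant based on DataFrame.nunique()."""
    return df.loc[:, df.nunique() != 1]

# Assumed test data: 20 varying integer columns plus one constant column.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 10, size=(50_000, 20)))
big["constant"] = 1

for func in (drop_with_unique, drop_with_nunique):
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call")
```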

Answer 2

One step:

df = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]]

Two steps:

Create a list of the names of the columns that have more than 1 distinct value.

keep = [c for c
        in list(df)
        if len(df[c].unique()) > 1]

Drop the columns that are not in 'keep'

df = df[keep]

Answer 3

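You can create a mask of your df by calling apply with value_counts. For a constant column this produces NaN in every row except one. You can then call dropna column-wise and pass thresh=2, so that a column must have 2 or more non-NaN values to be kept: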

In [329]:   
df = pd.DataFrame({'a':1, 'b':np.arange(5), 'c':[0,0,2,2,2]})
df

Out[329]:
   a  b  c
0  1  0  0
1  1  1  0
2  1  2  2
3  1  3  2
4  1  4  2

In [342]:
df[df.apply(pd.Series.value_counts).dropna(thresh=2, axis=1).columns]

Out[342]:
   b  c
0  0  0
1  1  0
2  2  2
3  3  2
4  4  2

Output of the intermediate steps:

In [344]:
df.apply(pd.Series.value_counts)

Out[344]:
    a  b   c
0 NaN  1   2
1   5  1 NaN
2 NaN  1   3
3 NaN  1 NaN
4 NaN  1 NaN

In [345]:
df.apply(pd.Series.value_counts).dropna(thresh=2, axis=1)

Out[345]:
   b   c
0  1   2
1  1 NaN
2  1   3
3  1 NaN
4  1 NaN

Answer 4

Two simple one-liners, either returning the result without modifying df (a shorter version of jz0410's answer)

df.loc[:,df.nunique()!=1]

or dropping in place (via drop())

df.drop(columns=df.columns[df.nunique()==1], inplace=True)
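One detail worth checking before relying on these one-liners is how missing values are counted: `DataFrame.nunique()` ignores NaN by default, and its `dropna` parameter changes that. The small frame below is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "one_and_nan": [1.0, np.nan, 1.0],
    "varied": [1, 2, 3],
})

# Default: NaN is ignored, so the distinct counts are 0, 1 and 3.
ignoring_nan = df.loc[:, df.nunique() != 1]

# dropna=False counts NaN as its own value: the counts become 1, 2 and 3.
counting_nan = df.loc[:, df.nunique(dropna=False) != 1]

print(list(ignoring_nan.columns))
print(list(counting_nan.columns))
```

With the default, an all-NaN column survives because its count is 0 rather than 1, while a column holding one value plus NaN is dropped. With `dropna=False` the outcome is reversed for those two columns.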

Answer 5

Another one-liner (inspired by jz0410's answer):

df.loc[:,df.nunique()!=1]

or in place (via drop()):

df.drop(columns=df.columns[df.nunique()==1], inplace=True)

Answer 6

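In my use case none of the solutions worked, because my DataFrame contains list items and I got the following error:

TypeError: unhashable type: 'list'

The solution that worked for me is: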

ndf = df.describe(include="all").T
new_cols = set(df.columns) - set(ndf[ndf.unique == 1].index)
df = df[list(new_cols)]
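An alternative sketch for list-valued cells, which also preserves the column order, is to convert each cell to a hashable string before counting. Using `repr` for this is an assumption made here, not part of the answer above, and it treats values as equal only when their printed form is identical (so `1` and `1.0` count as different):

```python
import pandas as pd

df = pd.DataFrame({
    "tags": [["a", "b"], ["a", "b"], ["a", "b"]],  # constant list column
    "scores": [[1], [2], [3]],                     # varying list column
    "n": [7, 7, 7],                                # constant scalar column
})

# repr() turns every cell into a hashable string before counting.
distinct_counts = df.apply(lambda col: col.map(repr).nunique())
result = df.loc[:, distinct_counts > 1]

print(list(result.columns))
```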

Answer 7

The most "pythonic" way I could find:

df = df.loc[:, (df != df.iloc[0]).any()]
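The expression compares every row with the first row and keeps a column if any cell differs. The step-by-step sketch below uses made-up data and hypothetical variable names; it also shows that an all-NaN column is kept, because NaN never compares equal to NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "const": [1, 1, 1],
    "varied": [1, 2, 1],
    "all_nan": [np.nan, np.nan, np.nan],
})

differs_from_first = df != df.iloc[0]  # True where a cell differs from row 0
mask = differs_from_first.any()        # one boolean per column
result = df.loc[:, mask]

print(mask.to_dict())
print(list(result.columns))
```

Note that `df.iloc[0]` raises an IndexError on an empty DataFrame, so that case needs a separate check.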

Answer 8

Many of the examples in this thread and in another thread (both linked in the comments below) did not work for my df. These did:

# from: https://stackoverflow.com/questions/33144813/quickly-drop-dataframe-columns-with-only-one-distinct-value
# from: https://stackoverflow.com/questions/20209600/pandas-dataframe-remove-constant-column

import pandas as pd
import numpy as np


data = {'var1': [1,2,3,4,5,np.nan,7,8,9],
       'var2':['Order',np.nan,'Inv','Order','Order','Shp','Order', 'Order','Inv'],
       'var3':[101,101,101,102,102,102,103,103,np.nan], 
       'var4':[np.nan,1,1,1,1,1,1,1,1],
       'var5':[1,1,1,1,1,1,1,1,1],
       'var6':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
       'var7':["a","a","a","a","a","a","a","a","a"],
       'var8': [1,2,3,4,5,6,7,8,9]}


df = pd.DataFrame(data)
df_original = df.copy()



#-------------------------------------------------------------------------------------------------


df2 = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]]


#-------------------------------------------------------------------------------------------------


keep = [c for c
        in list(df)
        if len(df[c].unique()) > 1]

df3 = df[keep]



#-------------------------------------------------------------------------------------------------



keep_columns = [col for col in df.columns if len(df[col].unique()) > 1]

df5 = df[keep_columns].copy()



#-------------------------------------------------------------------------------------------------



for col in df.columns:
     if len(df[col].unique()) == 1:
         df.drop(col,inplace=True,axis=1)

Answer 9

I'd like to throw this in: pandas 1.0.3

ids = df.nunique().values>1
df.loc[:,ids]

It is not that slow:

2.81 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Answer 10

One line:


Answer 11

df=df.loc[:,df.nunique()!=Numberofvalues]
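`Numberofvalues` is not defined in the snippet above; it presumably stands for the distinct-value count that marks a column for removal. Under that assumption, the idea can be wrapped in a small function (the name `drop_columns_with_n_values` and the sample data are hypothetical):

```python
import pandas as pd

def drop_columns_with_n_values(df, n_values=1):
    """Drop every column whose number of distinct non-NaN values equals n_values."""
    return df.loc[:, df.nunique() != n_values]

df = pd.DataFrame({"a": [1, 1, 1, 1], "b": [0, 1, 0, 1], "c": [1, 2, 3, 4]})

without_constant = drop_columns_with_n_values(df)   # removes "a"
without_binary = drop_columns_with_n_values(df, 2)  # removes "b"

print(list(without_constant.columns))
print(list(without_binary.columns))
```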

Answer 12

One solution using pipe (convenient if you use it often):

def drop_unique_value_col(df):
    return df.loc[:,df.apply(pd.Series.nunique) != 1]

df.pipe(drop_unique_value_col)

Answer 13

This will drop all columns that have only one distinct value.

for col in Dataframe.columns:
    if len(Dataframe[col].value_counts()) == 1:
        Dataframe.drop([col], axis=1, inplace=True)

Quickly drop DataFrame columns with only one distinct value
