Is there an efficient way to remove columns that have at least 20% missing values?
Suppose my dataframe looks like this:
    A    B    C    D
0   sg   hh   1    7
1   gf             9
2   hh             10
3   dd             8
4                  6
5   y              8
After removing the columns, the dataframe would look like this:
    A    D
0   sg   7
1   gf   9
2   hh   10
3   dd   8
4        6
5   y    8
Answer 0 (score: 10)
You can use boolean indexing on the columns, keeping those where the count of notnull values is larger than 80% of the number of rows:

df.loc[:, pd.notnull(df).sum() > len(df) * .8]

The same pattern is useful in many other cases, for example keeping only the columns in which more than 80% of the values are greater than 1:

df.loc[:, (df > 1).sum() > len(df) * .8]

Alternatively, for the missing-values case, you can also specify the thresh keyword of .dropna(), as illustrated by @EdChum:

df.dropna(thresh=0.8*len(df), axis=1)

The latter is slightly faster:

df = pd.DataFrame(np.random.random((100, 5)), columns=list('ABCDE'))
for col in df:
    df.loc[np.random.choice(list(range(100)), np.random.randint(10, 30)), col] = np.nan

%timeit df.loc[:, pd.notnull(df).sum()>len(df)*.8]
1000 loops, best of 3: 716 µs per loop

%timeit df.dropna(thresh=0.8*len(df), axis=1)
1000 loops, best of 3: 537 µs per loop
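To check that the two approaches agree on the question's data, here is a minimal sketch. It assumes the blank cells in the question are NaN, and it rounds the threshold up to a whole number with np.ceil (my own choice, not part of the answer) so that thresh is passed as an integer count.

```python
import numpy as np
import pandas as pd

# The question's example frame; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "A": ["sg", "gf", "hh", "dd", np.nan, "y"],
    "B": ["hh", np.nan, np.nan, np.nan, np.nan, np.nan],
    "C": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
    "D": [7, 9, 10, 8, 6, 8],
})

min_non_null = 0.8 * len(df)  # 4.8 for six rows

# Boolean indexing: keep columns with more than 80% non-null values.
by_mask = df.loc[:, df.notnull().sum() > min_non_null]

# dropna: keep columns with at least ceil(4.8) = 5 non-null values.
by_dropna = df.dropna(thresh=int(np.ceil(min_non_null)), axis=1)

print(list(by_mask.columns))
print(list(by_dropna.columns))
```

Column A has one missing value out of six (about 17%), so both calls keep it along with D and drop B and C.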
Answer 1 (score: 3)
You can call dropna and pass a thresh value to drop the columns that don't meet your threshold criteria:
In [10]:
frac = len(df) * 0.8
df.dropna(thresh=frac, axis=1)
Out[10]:
A D
0 sg 7
1 gf 9
2 hh 10
3 dd 8
4 NaN 6
5 y 8
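If you would rather state the rule in terms of the missing fraction itself, the following is a rough sketch of an alternative. The helper name drop_sparse_columns, its max_missing parameter, and the demo frame are made up for illustration and are not from either answer. Note that dropna keeps a column whose non-null count is equal to thresh, so a column sitting exactly on the 80% line may be handled differently from the strict comparison used here.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame, max_missing=0.2):
    """Keep only columns whose share of missing values is below max_missing.

    Hypothetical helper for illustration; not part of pandas.
    """
    missing_share = frame.isna().mean()  # per-column fraction of NaN
    return frame.loc[:, missing_share < max_missing]

# Hypothetical demo data: 0%, 10% and 30% missing values.
demo = pd.DataFrame({
    "full": np.arange(10.0),
    "one_gap": [np.nan] + [1.0] * 9,
    "three_gaps": [np.nan] * 3 + [1.0] * 7,
})

result = drop_sparse_columns(demo)
print(list(result.columns))
```

Only the column with 30% missing values is dropped here; the other two stay.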