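I have a pandas DataFrame whose columns have the dtypes object, int64, and float64. I want to get the names of the int64 and float64 columns. I am using the following command in pandas, but it does not seem to work: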
cat_num_prv_app = [num for num in list(df.columns) if isinstance(num, (np.int64,np.float64))]
Here are my dtypes:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1670214 entries, 0 to 1670213
Data columns (total 37 columns):
ID 1670214 non-null int64
NAME 1670214 non-null object
ANNUITY 1297979 non-null float64
AMOUNT 1670214 non-null float64
CREDIT 1670213 non-null float64
I want to store the column names ID, ANNUITY, AMOUNT, and CREDIT in a variable that I can later use to subset the DataFrame.
Answer 0 (score: 8)
Use select_dtypes with np.number to select all numeric columns:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7.4, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})
print(df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b
print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object
cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')
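Specific dtypes such as float64 and int64 can also be passed here. In the example below, column D is cast to int32 and is therefore not selected: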
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})
df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object
cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')
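For the DataFrame in the question, the same call might look like the sketch below. The row values are hypothetical; only the column names and dtypes follow the df.info() output. It also shows why the original list comprehension comes back empty: iterating over df.columns yields the column labels, which are strings, so the isinstance check never matches.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names and dtypes follow the question.
df = pd.DataFrame({
    "ID": np.array([1, 2, 3], dtype=np.int64),
    "NAME": ["x", "y", "z"],
    "ANNUITY": [100.5, np.nan, 300.0],
    "AMOUNT": [1000.0, 2000.0, 3000.0],
    "CREDIT": [1500.0, 2500.0, 3500.0],
})

# The comprehension from the question tests the column labels (strings),
# not the column dtypes, so nothing passes the isinstance check.
labels_only = [c for c in df.columns if isinstance(c, (np.int64, np.float64))]

# select_dtypes inspects each column's dtype instead.
num_cols = df.select_dtypes(include=[np.int64, np.float64]).columns.tolist()

# The stored list can be reused later to subset the DataFrame.
numeric_df = df[num_cols]

print(labels_only)
print(num_cols)
```

Calling .tolist() is optional; the Index returned by .columns can be used for subsetting directly.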
Answer 1 (score: 0)
An alternative solution using np.where (though it is inferior to the accepted answer):
df.iloc[:, (np.where((df.dtypes == np.int64) | (df.dtypes == np.float64)))[0]].columns
Example code:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "C": ["a", "b", "c"]})
print(df.iloc[:, (np.where((df.dtypes == np.int64) |
(df.dtypes == np.float64)))[0]].columns)
> Index(['A', 'B'], dtype='object')
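If the dtype comparison is preferred over select_dtypes, the np.where and iloc steps can probably be dropped, since the boolean mask can index df.columns directly. A minimal sketch using the same example data (the names mask and cols are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "C": ["a", "b", "c"]})
# Make the integer width explicit so the comparison below does not depend on platform defaults.
df["A"] = df["A"].astype(np.int64)

# Boolean Series with one entry per column: True where the dtype is int64 or float64.
mask = (df.dtypes == np.int64) | (df.dtypes == np.float64)

# Index the column labels with the mask; no positional lookup is needed.
cols = df.columns[mask.to_numpy()]
print(list(cols))
```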