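How to get numeric column names in a pandas DataFrame

Time: 2018-08-04 10:15:48

Tags: python pandas

I have a pandas DataFrame whose columns have the dtypes object, int64, and float64. I want to get the names of the int64 and float64 columns. I am using the following command in pandas, but it doesn't seem to work: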

cat_num_prv_app = [num for num in list(df.columns) if isinstance(num, (np.int64,np.float64))]

Here are my data types:

 df.info()
 <class 'pandas.core.frame.DataFrame'>
 RangeIndex: 1670214 entries, 0 to 1670213
 Data columns (total 37 columns):
 ID               1670214 non-null int64
 NAME             1670214 non-null object
 ANNUITY          1297979 non-null float64
 AMOUNT           1670214 non-null float64
 CREDIT           1670213 non-null float64

I want to store the column names ID, ANNUITY, AMOUNT, and CREDIT in a variable that I can use later to subset the DataFrame.

2 Answers:

Answer 0 (score: 8)

Use select_dtypes with np.number to select all numeric columns:

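import numpy as np
import pandas as pd

# Sample frame: A and E are object, B and C are float64, D is int64
df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7.4,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})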

print (df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b

print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object

cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')
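
Since the question asks for a variable that can be reused to subset the DataFrame, here is a minimal sketch of that step using the same sample frame. The names num_cols and numeric_df are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7.4, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})

# Keep the numeric column names as a plain Python list (illustrative name)
num_cols = df.select_dtypes(include=[np.number]).columns.tolist()

# Reuse the list later to subset the frame
numeric_df = df[num_cols]

print(num_cols)
print(numeric_df.shape)
```

Calling .tolist() is optional; the Index returned by .columns can also be passed directly to df[...].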

You can also specify float64 and int64 explicitly here:

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})

df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object

cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')
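
Note that column D is missing from the result above because it was cast to int32, and an explicit int64/float64 filter matches only those exact widths. The following sketch compares the two filters side by side on a small frame; the names strict and broad are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [4.5, 5.0], 'C': [7, 8], 'D': [1, 3]})
# Force the integer widths so the example does not depend on the platform default
df['C'] = df['C'].astype(np.int64)
df['D'] = df['D'].astype(np.int32)

# Exact dtypes only: the int32 column D is not selected
strict = df.select_dtypes(include=[np.int64, np.float64]).columns.tolist()

# Any numeric dtype: D is selected as well
broad = df.select_dtypes(include=[np.number]).columns.tolist()

print(strict)
print(broad)
```

If the goal is "every numeric column" rather than "exactly int64 and float64", np.number is probably the safer choice.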

Answer 1 (score: 0)

An alternative solution using np.where
(though it is worse than the accepted answer):

df.iloc[:, (np.where((df.dtypes == np.int64) | (df.dtypes == np.float64)))[0]].columns

Example code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "C": ["a", "b", "c"]})

print(df.iloc[:, (np.where((df.dtypes == np.int64) | 
                 (df.dtypes == np.float64)))[0]].columns)

> Index(['A', 'B'], dtype='object')
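
For comparison, the same dtype test can be written as a plain list comprehension, which is closer to what the question attempted. The original attempt iterates over df.columns, which yields the column labels (strings here), so isinstance(num, (np.int64, np.float64)) is never true. A sketch that iterates over df.dtypes instead, with num_cols as an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "C": ["a", "b", "c"]})
# Pin column A to int64 so the result does not depend on the platform default
df["A"] = df["A"].astype(np.int64)

# df.dtypes maps each column name to its dtype; compare the dtype, not the label
num_cols = [col for col, dtype in df.dtypes.items()
            if dtype == np.int64 or dtype == np.float64]

print(num_cols)
```

This is only meant to show where the original attempt went wrong; select_dtypes remains the more idiomatic option.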