I have the following table (a small sample, so anyone who wants to try this can):
serial;spectra;name;UKST;ra;dec;ra2000;dec2000;BJG;BJSEL;BJG_OLD;BJSELOLD;GALEXT;SB_BJ;SR_R;z;z_helio;obsrun;quality;abemma;Z_ABS;KBESTR;R_CRCOR;Z_EMI;NMBEST;SNR;ETA_TYPE
1;2;TGS436Z001;349;00:11:55.72;-32:32:55.2;00:14:27.05;-32:16:14.6;19.424;19.362;19.430;19.390;0.062;19.368;18.286;0.2981;0.2981;01SEP;4;1;0.2981;5;4.5700;0.2984;1;3.8;-99.90000
2;1;TGS496Z001;349;00:11:59.29;-33:14:41.3;00:14:30.55;-32:58:00.7;18.842;18.789;18.870;18.840;0.053;18.688;17.291;0.1229;0.1228;01OCT;5;1;0.1229;1;14.3800;-9.9990;0;47.6;-2.58920
3;1;TGS435Z001;349;00:11:49.37;-32:39:57.4;00:14:20.71;-32:23:16.8;18.320;18.265;18.350;18.310;0.055;18.336;17.138;0.1038;0.1038;01SEP;4;1;0.1038;1;9.3800;0.1032;1;28.4;-2.46500
Sidenote: you can build a pandas DataFrame from the '''data-sample''' above as follows:
>>> import StringIO
>>> _tmp = StringIO.StringIO()
>>> _tmp.write('''data-sample''')
>>> _tmp.seek(0)
>>> import pandas
>>> df = pandas.read_csv(_tmp,delimiter=';')
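(For what it's worth, under Python 3 the same setup would look roughly like this, using io.StringIO instead of the StringIO module; this is just a sketch, with '''data-sample''' again standing in for the raw text above:)
>>> import io
>>> import pandas
>>> # read_csv accepts any file-like object; StringIO wraps the raw sample text
>>> df = pandas.read_csv(io.StringIO('''data-sample'''), delimiter=';')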
The corresponding df we get has the following dtypes:
>>> df.dtypes
serial int64
spectra int64
name object
UKST int64
ra object
dec object
ra2000 object
dec2000 object
BJG float64
BJSEL float64
BJG_OLD float64
BJSELOLD float64
GALEXT float64
SB_BJ float64
SR_R float64
z float64
z_helio float64
obsrun object
quality int64
abemma int64
Z_ABS float64
KBESTR int64
R_CRCOR float64
Z_EMI float64
NMBEST int64
SNR float64
ETA_TYPE float64
dtype: object
All I want to do is filter the column names by their data type; in particular, I want to keep the numeric columns. So I thought all I had to do was check whether their dtype was numpy.number:
>>> filter(lambda c:df[c].dtypes == numpy.number,df.columns)
['BJG',
'BJSEL',
'BJG_OLD',
'BJSELOLD',
'GALEXT',
'SB_BJ',
'SR_R',
'z',
'z_helio',
'Z_ABS',
'R_CRCOR',
'Z_EMI',
'SNR',
'ETA_TYPE']
but as we can see, all I get are the float columns; the int ones are left behind.
I do get the result I want by doing:
>>> filter(lambda c:df[c].dtypes == numpy.floating or df[c].dtypes == numpy.integer, df.columns)
['serial',
'spectra',
'UKST',
'BJG',
'BJSEL',
'BJG_OLD',
'BJSELOLD',
'GALEXT',
'SB_BJ',
'SR_R',
'z',
'z_helio',
'quality',
'abemma',
'Z_ABS',
'KBESTR',
'R_CRCOR',
'Z_EMI',
'NMBEST',
'SNR',
'ETA_TYPE']
(Note: numpy.floating and numpy.number give the same result in the line above.)
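As an aside, a subclass-aware check via numpy.issubdtype (a sketch of an alternative, not what I originally ran, assuming the df built above) also picks up both the int and float columns, since issubdtype walks the numpy type hierarchy instead of comparing against a single concrete dtype:
>>> import numpy as np
>>> # keep columns whose dtype is a subtype of np.number (covers int64 and float64)
>>> [c for c in df.columns if np.issubdtype(df[c].dtype, np.number)]
This should return the same 21 columns as the explicit float-or-int check above.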
The question here is: isn't numpy.number expected to "represent" any numerical type in numpy (int, float, complex, etc.)? After reading the class hierarchy in the numpy.core.numerictypes help pages, the behavior presented above is unexpected to me...
Does anybody have a comment on that? Am I missing something?
Cheers.
Answer 0 (score: 0):
Use select_dtypes and pass np.number in a list:
In [160]:
df.select_dtypes([np.number]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 21 columns):
serial 3 non-null int64
spectra 3 non-null int64
UKST 3 non-null int64
BJG 3 non-null float64
BJSEL 3 non-null float64
BJG_OLD 3 non-null float64
BJSELOLD 3 non-null float64
GALEXT 3 non-null float64
SB_BJ 3 non-null float64
SR_R 3 non-null float64
z 3 non-null float64
z_helio 3 non-null float64
quality 3 non-null int64
abemma 3 non-null int64
Z_ABS 3 non-null float64
KBESTR 3 non-null int64
R_CRCOR 3 non-null float64
Z_EMI 3 non-null float64
NMBEST 3 non-null int64
SNR 3 non-null float64
ETA_TYPE 3 non-null float64
dtypes: float64(14), int64(7)
memory usage: 528.0 bytes
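Since the original goal was the list of column names rather than a sub-DataFrame, a small follow-up sketch (same df as above):
>>> # grab only the names of the numeric columns
>>> df.select_dtypes(include=[np.number]).columns.tolist()
This should give the same 21 numeric column names listed earlier.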