Conditions in pandas

Date: 2014-01-31 02:08:31

Tags: pandas

I have a very peculiar problem in pandas: one condition works but the other does not. You can download the linked file to test my code. Thanks!

I have a file (stars.txt) that I read with pandas. I want to create two groups: (1) Log_g < 4.0 and (2) Log_g > 4.0. With my code (see below) I can successfully get the rows for group (1):

    Kepler_ID            RA           Dec   Teff  Log_G       g       H
3     2305372  19 27 57.679  +37 40 21.90   5664  3.974  14.341  12.201
14    2708156  19 21 08.906  +37 56 11.44  11061  3.717  10.672  10.525
19    2997455  19 32 31.296  +38 07 40.04   4795  3.167  14.694  11.500
34    3352751  19 36 17.249  +38 25 36.91   7909  3.791  13.541  12.304
36    3440230  19 21 53.100  +38 31 42.82   7869  3.657  13.706  12.486

But for some reason I cannot get group (2). The code returns the following error:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 90 entries, 0 to 108
Data columns (total 7 columns):
Kepler_ID    90  non-null values
RA           90  non-null values
Dec          90  non-null values
Teff         90  non-null values
Log_G        90  non-null values
g            90  non-null values
H            90  non-null values
dtypes: float64(4), int64(1), object(2)

Here is my code:

#------------------------------------------
# IMPORT STATEMENTS 
#------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#------------------------------------------
# READ FILE AND ASSOCIATE COMPONENTS 
#------------------------------------------
star_file = 'stars.txt'
header_row = ['Kepler_ID', 'RA','Dec','Teff', 'Log_G', 'g', 'H']
df = pd.read_csv(star_file, names=header_row, skiprows=2)
#------------------------------------------
# ASSOCIATE VARIABLES 
#------------------------------------------
Kepler_ID  = df['Kepler_ID']
#RA         = df['RA']         
#Dec        = df['Dec']
Teff       = df['Teff']
Log_G      = df['Log_G']
g          = df['g']
H          = df['H']
#------------------------------------------
# SUBSTITUTE MISSING DATA WITH NAN 
#------------------------------------------ 
df = df.replace('', np.nan)
#------------------------------------------
# CHANGE DATA TYPE OF THE REST OF DATA TO FLOAT 
#------------------------------------------ 
df[['Teff', 'Log_G', 'g', 'H']] = df[['Teff', 'Log_G', 'g', 'H']].astype(float)
#------------------------------------------
# SORTING SPECTRA TYPES FOR GIANTS  
#------------------------------------------
# FIND GIANTS IN THE SAMPLE 
giants = df[(df['Log_G'] < 4.)]
#print(giants)
# FIND DWARFS IN THE SAMPLE 
dwarfs = df[(df['Log_G'] > 4.)]
print(dwarfs)

1 Answer:

Answer 0: (score: 2)

This is not an error. You are seeing the summary view of the DataFrame:

In [11]: df = pd.DataFrame([[2, 1], [3, 4]])

In [12]: df
Out[12]: 
   0  1
0  2  1
1  3  4

In [13]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
0    2  non-null values
1    2  non-null values
dtypes: int64(2)
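
To confirm that the selection itself worked, one could inspect the shape and the first few rows rather than rely on the default repr. This is only a sketch: the small table below is a hypothetical stand-in with invented values, not the contents of stars.txt.

```python
import pandas as pd

# Hypothetical miniature of the asker's table (values invented for illustration).
df = pd.DataFrame({
    "Kepler_ID": [1, 2, 3, 4, 5],
    "Log_G": [3.974, 4.312, 3.717, 4.588, 4.000],
})

# Same boolean-mask selections as in the question.
giants = df[df["Log_G"] < 4.0]
dwarfs = df[df["Log_G"] > 4.0]

print(dwarfs.shape)   # (rows, columns) of the selection
print(dwarfs.head())  # first few rows, short enough to be shown as a table
```

Note that with strict `<` and `>` comparisons, a row whose Log_G is exactly 4.0 lands in neither group.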

What gets displayed is governed by several display package options, for example max_rows:

In [14]: pd.options.display.max_rows
Out[14]: 60

In [15]: pd.options.display.max_rows = 120

In 0.13 this behaviour changed, so you will see the first max_rows followed by ...
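
If you would rather not change the option globally, here is a minimal sketch of two alternatives. The `dwarfs` frame is a made-up stand-in with 90 rows, chosen only because that exceeds a max_rows of 60. `DataFrame.to_string()` renders every row, and `pd.option_context` raises `display.max_rows` only inside the with-block:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the selection: 90 rows of invented values.
dwarfs = pd.DataFrame({
    "Kepler_ID": np.arange(90),
    "Log_G": np.linspace(4.01, 4.9, 90),
})

original_limit = pd.get_option("display.max_rows")

# Option 1: to_string() renders all rows regardless of display.max_rows.
full_text = dwarfs.to_string()

# Option 2: raise the limit temporarily; it is restored when the block exits.
with pd.option_context("display.max_rows", 120):
    inside_limit = pd.get_option("display.max_rows")
    print(dwarfs)

restored_limit = pd.get_option("display.max_rows")
```

The context manager is handy in scripts, since the setting does not leak into later output.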