Filter DataFrame column values greater than zero?

Time: 2017-10-13 11:12:43

Tags: python pandas dataframe

I have a csv file that I read with pd.read_csv(file), and I am trying to get only the rows whose values are greater than zero.

The DataFrame has some empty cells, some negative values, and some numbers in exponential notation such as -1.72E+10.

Time              A      B       C       D       E       F         G
9/8/2017 8:40   1.29    0.27    1.78    0.23    0.33    0.05    -13.72
9/8/2017 9:00   1.28    0.26    1.78    0.22    0.35    0.02    -13.59
9/8/2017 9:20   1.43                         
9/8/2017 9:40   1.44    0.29    1.93    0.25    0.28    0.01    -13.92
9/8/2017 10:00  1.36    0.27    1.84    0.23    0.31    0.02    -13.77
9/8/2017 10:20  1.38    0.27    1.89    0.23    0.31    0.01    -13.83
9/8/2017 10:40      -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10
9/8/2017 11:00  1.4     0.28    1.88    0.24    0.28    0.02    -13.92
9/8/2017 11:20  1.43    0.28    1.92    0.24    0.29    0.02    -13.83

Whenever I run this code, it does not filter out that data:

df = df[df > 0]

The type of the columns is str, not numpy.float64.

Can someone tell me what the problem is?

I want to keep only the whole DataFrame rows whose values are greater than 0.

1 answer:

Answer 0: (score: 2)

I think you need any to check whether at least one value in each row is True:

df = df[(df > 0).any(axis=1)]

or all to check whether all values in each row are True:

df = df[(df > 0).all(axis=1)]

#sample data: the last row and the first numeric column were modified so that they contain no negative values
print (df)
             Time             A             B             C             D  \
0   9/8/2017 8:40  1.290000e+00  2.700000e-01  1.780000e+00  2.300000e-01   
1   9/8/2017 9:00  1.280000e+00  2.600000e-01  1.780000e+00  2.200000e-01   
2   9/8/2017 9:20  1.430000e+00           NaN           NaN           NaN   
3   9/8/2017 9:40  1.440000e+00  2.900000e-01  1.930000e+00  2.500000e-01   
4  9/8/2017 10:00  1.360000e+00  2.700000e-01  1.840000e+00  2.300000e-01   
5  9/8/2017 10:20  1.380000e+00  2.700000e-01  1.890000e+00  2.300000e-01   
6  9/8/2017 10:40  1.720000e+10 -1.720000e+10 -1.720000e+10 -1.720000e+10   
7  9/8/2017 11:00  1.400000e+00  2.800000e-01  1.880000e+00  2.400000e-01   
8  9/8/2017 11:20  1.430000e+00  2.800000e-01  1.920000e+00  2.400000e-01   

              E             F      G  
0  3.300000e-01  5.000000e-02 -13.72  
1  3.500000e-01  2.000000e-02 -13.59  
2           NaN           NaN    NaN  
3  2.800000e-01  1.000000e-02 -13.92  
4  3.100000e-01  2.000000e-02 -13.77  
5  3.100000e-01  1.000000e-02 -13.83  
6 -1.720000e+10 -1.720000e+10    NaN  
7  2.800000e-01  2.000000e-02 -13.92  
8  2.900000e-01  2.000000e-02  13.83  


#select the rows in which all values are greater than 0
df1 = df[(df > 0).all(axis=1)]
print (df1)
             Time     A     B     C     D     E     F      G
8  9/8/2017 11:20  1.43  0.28  1.92  0.24  0.29  0.02  13.83

#select the columns in which all values are greater than 0
df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00
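
A note for newer setups: on Python 3, comparing the string column Time with 0 typically raises a TypeError, so the expressions above may need to be restricted to the numeric columns. The sketch below is one way to do that. It assumes a small made-up frame loosely based on the sample data and uses select_dtypes, which is not part of the original answer. It also shows why df[df > 0] on its own does not drop rows: the result keeps the same shape, and values that fail the test become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data (not the asker's full file).
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40", "9/8/2017 11:20"],
    "A": [1.29, 1.43, -1.72e10, 1.43],
    "B": [0.27, np.nan, -1.72e10, 0.28],
    "G": [-13.72, np.nan, np.nan, 13.83],
})

# Compare only the numeric columns, so the string column is never compared with 0.
num = df.select_dtypes(include="number")

# Masking keeps every row and replaces values that are not > 0 with NaN.
masked = num[num > 0]

# Row selection: at least one positive value vs. all values positive.
# NaN > 0 evaluates to False, so rows with missing values fail the all() test.
any_rows = df[(num > 0).any(axis=1)]
all_rows = df[(num > 0).all(axis=1)]

print(masked.shape)
print(any_rows.index.tolist())
print(all_rows.index.tolist())
```

The boolean Series returned by any(axis=1) or all(axis=1) has one entry per row, which is what makes df[...] select rows instead of masking individual cells.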

EDIT1:

To convert all columns except Time to float:

cols = df.columns.difference(['Time'])
df[cols] = df[cols].astype(float)
print (df.dtypes)
Time     object
A       float64
B       float64
C       float64
D       float64
E       float64
F       float64
G       float64
dtype: object
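
If the columns really are strings and some cells hold text that is not a number, astype(float) raises a ValueError. A possible alternative, which is not from the original answer, is pd.to_numeric with errors='coerce', which turns unparsable entries into NaN. The frame below is invented for illustration, and "n/a" is only an assumed stand-in for whatever non-numeric text the file might contain.

```python
import pandas as pd

# Hypothetical string-typed frame; "n/a" is an assumed placeholder for a bad cell.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": ["1.29", "1.43", "-1.72E+10"],
    "B": ["0.27", "n/a", "-1.72E+10"],
})

# All columns except Time.
cols = df.columns.difference(["Time"])

# Convert column by column; anything that cannot be parsed becomes NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```

Exponential notation such as -1.72E+10 is parsed as an ordinary float, so only the genuinely non-numeric cells end up as NaN.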

df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00
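
On Python 3 the column-wise variant has the same catch with the string column Time. One possible workaround, sketched under the assumption that Time should always be kept, is to build the column mask from the numeric columns only and then add Time back. The data here is invented for the example.

```python
import pandas as pd

# Hypothetical frame: A is positive everywhere, G is not.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:00", "9/8/2017 11:20"],
    "A": [1.29, 1.28, 1.43],
    "G": [-13.72, -13.59, 13.83],
})

num = df.select_dtypes(include="number")

# One boolean per numeric column: True if every value in it is greater than 0.
positive = (num > 0).all()
positive_cols = positive[positive].index

# Keep Time plus those columns, preserving the original column order.
keep = [c for c in df.columns if c == "Time" or c in positive_cols]
df1 = df[keep]

print(df1.columns.tolist())
```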