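pandas: select all rows based on a column value

Time: 2016-05-12 13:19:50

Tags: python-3.x select pandas dataframe

Hello, I need to select all rows based on a column value, store them in a new variable or a new DataFrame, and save that to a CSV file without the header, keeping only the data.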

import pandas as pd
import numpy as np

print(df)
#      0      1  2   3
# 0  Gm#    one  0   0
# 1  922    one  1   2
# 2  933    two  2   4
# 3  952  three  3   6
# 4  Gm#    two  4   8
# 5  960    two  5  10
# 6  963    one  6  12
# 7  999  three  7  14

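So I want a new DataFrame based on a condition on the first column. I only want to grab the rows whose value falls in the range >= 900 and <= 999, and I want to store the result in a CSV file without the index. The expected output: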

  print (df2)
  922    one  1   2
  933    two  2   4
  952  three  3   6
  960    two  5  10
  963    one  6  12
  999  three  7  14

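This is what I tried. The problem is that I can't figure out how to convert a whole column to integers. Or is there a simpler way to do this by just referring to the whole DataFrame? I have gone through various Stack Overflow posts and YouTube videos but just can't get it right. I would appreciate any ideas.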

#df[x]= data[x][(data[x]['0'].astype(np.int64))] need to find a way to convert column [0] to integer so it can be evaluated
#df2 = data[i]([(data['0'] >= 900) & (data['0'] <= 999)])

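1 answer:

Answer 0 (score: 1)

You can select the first column by position with iloc and convert it with to_numeric; with errors='coerce', non-numeric values are converted to NaN. Then add the condition (data['0'].notnull()) to exclude those rows. Finally, call to_csv with index=False to drop the index and header=None to drop the header: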

import pandas as pd

# Keys are listed in column order: current pandas keeps dict insertion order,
# so '0' must come first for iloc[:, 0] to select it
data = pd.DataFrame(
{'0': {0: 'Gm', 1: '922', 2: '933', 3: '952', 4: 'Gm', 5: '960', 6: '963', 7: '999'},
'1': {0: 'one', 1: 'one', 2: 'two', 3: 'three', 4: 'two', 5: 'two', 6: 'one', 7: 'three'},
'2': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7},
'3': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14}})

print(data)

     0      1  2   3
0   Gm    one  0   0
1  922    one  1   2
2  933    two  2   4
3  952  three  3   6
4   Gm    two  4   8
5  960    two  5  10
6  963    one  6  12
7  999  three  7  14
# Replace the whole column so it gets a numeric dtype (float64, because of NaN)
data[data.columns[0]] = pd.to_numeric(data.iloc[:, 0], errors='coerce')
print(data)
       0      1  2   3
0    NaN    one  0   0
1  922.0    one  1   2
2  933.0    two  2   4
3  952.0  three  3   6
4    NaN    two  4   8
5  960.0    two  5  10
6  963.0    one  6  12
7  999.0  three  7  14


df1 = data[(data['0'] >= 900) & (data['0'] <= 999) & (data['0'].notnull())]
print(df1)
       0      1  2   3
1  922.0    one  1   2
2  933.0    two  2   4
3  952.0  three  3   6
5  960.0    two  5  10
6  963.0    one  6  12
7  999.0  three  7  14
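
As a possible alternative to the chained comparisons, the same range filter can be sketched with Series.between, which is inclusive on both ends by default. The sample frame below is a hypothetical two-column stand-in for the data above, and the mask is kept separate from the frame, so the original strings in column '0' stay unchanged:

```python
import pandas as pd

# Hypothetical sample mirroring the first two columns of the data above
data = pd.DataFrame({
    '0': ['Gm', '922', '933', '952', 'Gm', '960', '963', '999'],
    '1': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

# Non-numeric entries such as 'Gm' become NaN
first = pd.to_numeric(data['0'], errors='coerce')

# NaN never satisfies the range test, so no separate notnull() check is needed
mask = first.between(900, 999)
df1 = data[mask]
print(df1)
```

Because comparisons against NaN evaluate to False, the 'Gm' rows drop out without an explicit null check.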


df1.to_csv('file', index=False, header=None)
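
To check what ends up in the file, one option is to write to an in-memory buffer first. This is a minimal sketch under a few assumptions: the small df1 below is a hypothetical excerpt of the filtered result, the first column is cast back to int (possible once no NaN remain) so the CSV shows 922 rather than 922.0, and header=False is used as the boolean form of the option:

```python
import io
import pandas as pd

# Hypothetical excerpt of the filtered frame
df1 = pd.DataFrame({
    '0': [922.0, 933.0, 952.0],
    '1': ['one', 'two', 'three'],
    '2': [1, 2, 3],
    '3': [2, 4, 6],
})

# The NaN-based conversion left floats; cast back now that no NaN remain
df1['0'] = df1['0'].astype(int)

# Write to a buffer instead of a file to inspect the exact text
buffer = io.StringIO()
df1.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Each line of csv_text holds only the row values, with neither the index nor the column names.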

Edit based on the comments:

You can try:

for df in tables:
    # Turn the '½' character into '.5' so the values parse as numbers
    df.replace(regex=True, inplace=True, to_replace='½', value='.5')
    # Replace the first column (selected by position) with its numeric form
    df[df.columns[0]] = pd.to_numeric(df.iloc[:, 0], errors='coerce')
    first = df.iloc[:, 0]
    df1 = df[(first >= 900) & (first <= 999) & (first.notnull())]
    print(df1)
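
The loop above can also be wrapped in a small helper that leaves the input frames untouched. This is only a sketch: rows_in_range is a hypothetical name, and the two frames in tables are made-up stand-ins for whatever list of DataFrames is actually being processed:

```python
import pandas as pd

def rows_in_range(df, low=900, high=999):
    """Return the rows whose first column, read as a number, lies in [low, high]."""
    # Work on a cleaned copy of the first column; the input frame is not modified
    cleaned = df.iloc[:, 0].astype(str).str.replace('½', '.5', regex=False)
    numbers = pd.to_numeric(cleaned, errors='coerce')
    return df[(numbers >= low) & (numbers <= high)]

# Hypothetical stand-in for the `tables` list
tables = [
    pd.DataFrame({0: ['Gm#', '922', '933½', '1001'], 1: ['one', 'one', 'two', 'two']}),
    pd.DataFrame({0: ['Gm#', '899', '960'], 1: ['two', 'one', 'three']}),
]

results = [rows_in_range(df) for df in tables]
for df1 in results:
    print(df1)
```

Selecting by position with iloc keeps the helper independent of whether the first column is labelled '0', 0, or something else.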