Why does boolean indexing return all NaN?

Time: 2016-07-10 11:28:57

Tags: python python-3.x csv pandas

I have this CSV file, emp.csv:

index empno ename job mgr hiredate sal comm deptno
0, 7839, KING, PRESIDENT, 0, 1981-11-17, 5000, 0, 10
1, 7698, BLAKE, MANAGER, 7839, 1981-05-01, 2850, 0, 30
2, 7782, CLARK, MANAGER, 7839, 1981-05-09, 2450, 0, 10
3, 7566, JONES, MANAGER, 7839, 1981-04-01, 2975, 0, 20
4, 7654, MARTIN, SALESMAN, 7698, 1981-09-10, 1250, 1400, 30
5, 7499, ALLEN, SALESMAN, 7698, 1981-02-11, 1600 300, 30
6, 7844, TURNER, SALESMAN, 7698, 1981-08-21, 1500, 0, 30
7, 7900, JAMES, CLERK, 7698, 1981-12-11, 950, 0, 30
8, 7521, WARD, SALESMAN, 7698, 1981-02-23, 1250, 500, 30
9, 7902, FORD, ANALYST, 7566, 1981-12-11, 3000, 0, 20
10, 7369, SMITH, CLERK, 7902, 1980-12-09, 800, 0, 20
11, 7788, SCOTT, ANALYST, 7566 1982-12-22, 3000, 0, 20
12, 7876, ADAMS, CLERK, 7788, 1983-01-15, 1100, 0, 20
13, 7934, MILLER, CLERK, 7782, 1982-01-11, 1300, 0, 10

With the following code, I get all NaNs:

import csv
import sys
import pandas as pd
import dateutil

# Load data from csv file
emp = pd.DataFrame.from_csv("D:\R data\emp.csv")
# Convert date from string to datetime
emp['hiredate'] = emp['hiredate'].apply(dateutil.parser.parse, dayfirst=True)
jonessal = emp[['sal']][emp['ename']=='JONES']
empename = emp[['ename','sal']][emp['sal'] > jonessal] 
print(empename)

This is the output of the code:

index           
0       NaN  NaN
1       NaN  NaN
2       NaN  NaN
3       NaN  NaN
4       NaN  NaN
5       NaN  NaN
6       NaN  NaN
7       NaN  NaN
8       NaN  NaN
9       NaN  NaN
10      NaN  NaN
11      NaN  NaN
12      NaN  NaN
13      NaN  NaN

My desired output is:

index
0       KING   5000
9       FORD   3000
11      SCOTT  3000

I expected the variable jonessal to hold 2975, but the result is NaN.

If I hard-code the salary as 2975, it works fine, but when I use the variable it returns all NaN:

empename = emp[['ename','sal']][emp['sal'] > 2975]

3 Answers:

Answer 0 (score: 3)

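emp[['sal']][emp['ename']=='JONES'] is a DataFrame.

emp['sal'] > jonessal

In this comparison, jonessal is not a scalar, and because of broadcasting the comparison returns a strange DataFrame. Since the indexes and shapes do not match, the final result consists of NaNs.

You are assuming that there is only one employee named JONES. Under the same assumption, you can get a scalar:

jonessal = emp.loc[emp['ename']=='JONES', 'sal'].values[0]

(.values returns an array; [0] comes from the single-employee assumption.)

Now it returns the same result as the hard-coded comparison:

emp[['ename','sal']][emp['sal'] > jonessal]
Out[81]:
    ename   sal
0    KING  5000
9    FORD  3000
11  SCOTT  3000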
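To make the DataFrame-versus-scalar distinction concrete, here is a minimal sketch on a five-row stand-in for emp.csv (the small frame and the names jonessal_df and masked are my own, for illustration only). It also shows that indexing with a boolean DataFrame keeps the original shape and fills non-matching cells with NaN instead of dropping rows, which is why a DataFrame-valued mask can produce a full-length table of NaNs.

```python
import pandas as pd

# Small stand-in for emp.csv (only the columns needed here).
emp = pd.DataFrame({
    "ename": ["KING", "BLAKE", "JONES", "FORD", "SCOTT"],
    "sal": [5000, 2850, 2975, 3000, 3000],
})

# The question's expression: double brackets keep a DataFrame (1 row, 1 column).
jonessal_df = emp[["sal"]][emp["ename"] == "JONES"]

# Selecting the column by label and taking the first element gives a scalar.
jonessal = emp.loc[emp["ename"] == "JONES", "sal"].values[0]

# With a scalar, the comparison is a plain boolean Series, so rows are filtered.
higher = emp[["ename", "sal"]][emp["sal"] > jonessal]

# With a boolean DataFrame as the mask, pandas keeps every row and puts NaN
# wherever the mask is False.
masked = emp[["sal"]][emp[["sal"]] > jonessal]

print(type(jonessal_df).__name__, jonessal_df.shape)
print(jonessal)
print(higher)
print(masked)
```

The row-filtering form needs a one-dimensional boolean mask, so the value on the right-hand side of the comparison has to be a scalar (or a Series aligned with emp['sal']).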

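Answer 1 (score: 2)

I think you need read_csv together with boolean indexing through ix (loc in current pandas, where ix has been removed), which selects only the ename and sal columns: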

import pandas as pd
import io

temp=u"""index   empno   ename   job mgr hiredate    sal comm    deptno
0,  7839,   KING,   PRESIDENT,  0,  1981-11-17,     5000,   0,  10
1,  7698,   BLAKE,  MANAGER,    7839,   1981-05-01, 2850,   0,  30
2,  7782,   CLARK,  MANAGER,    7839,   1981-05-09, 2450,   0,  10
3,  7566,   JONES,  MANAGER,    7839,   1981-04-01, 2975,   0,  20
4,  7654,   MARTIN, SALESMAN,   7698,   1981-09-10, 1250,   1400,   30
5,  7499,   ALLEN,  SALESMAN,   7698,   1981-02-11, 1600,    300,    30
6,  7844,   TURNER, SALESMAN,   7698,   1981-08-21, 1500,   0,  30
7,  7900,   JAMES,  CLERK,      7698,   1981-12-11, 950,    0,  30
8,  7521,   WARD,   SALESMAN,   7698,   1981-02-23, 1250,   500,    30
9,  7902,   FORD,   ANALYST,    7566,   1981-12-11, 3000,   0,  20
10, 7369,   SMITH,  CLERK,      7902,   1980-12-09, 800,    0,  20
11, 7788,   SCOTT,  ANALYST,    7566,    1982-12-22, 3000,   0,  20
12, 7876,   ADAMS,  CLERK,      7788,   1983-01-15, 1100,   0,  20
13, 7934,   MILLER, CLERK,      7782,   1982-01-11, 1300,   0,  10"""
# after testing, replace io.StringIO(temp) with the filename
emp = pd.read_csv(io.StringIO(temp), 
                 skipinitialspace=True,
                 skiprows=1, 
                 parse_dates=[5], 
                 names=['index','empno','ename', 'job','mgr','hiredate','sal','comm','deptno'])
print (emp)
    index  empno   ename        job   mgr   hiredate   sal  comm  deptno
0       0   7839    KING  PRESIDENT     0 1981-11-17  5000     0      10
1       1   7698   BLAKE    MANAGER  7839 1981-05-01  2850     0      30
2       2   7782   CLARK    MANAGER  7839 1981-05-09  2450     0      10
3       3   7566   JONES    MANAGER  7839 1981-04-01  2975     0      20
4       4   7654  MARTIN   SALESMAN  7698 1981-09-10  1250  1400      30
5       5   7499   ALLEN   SALESMAN  7698 1981-02-11  1600   300      30
6       6   7844  TURNER   SALESMAN  7698 1981-08-21  1500     0      30
7       7   7900   JAMES      CLERK  7698 1981-12-11   950     0      30
8       8   7521    WARD   SALESMAN  7698 1981-02-23  1250   500      30
9       9   7902    FORD    ANALYST  7566 1981-12-11  3000     0      20
10     10   7369   SMITH      CLERK  7902 1980-12-09   800     0      20
11     11   7788   SCOTT    ANALYST  7566 1982-12-22  3000     0      20
12     12   7876   ADAMS      CLERK  7788 1983-01-15  1100     0      20
13     13   7934  MILLER      CLERK  7782 1982-01-11  1300     0      10

# .ix was removed in pandas 1.0; .loc accepts the same boolean mask and column labels
jonessal = emp.loc[emp['ename'] == 'JONES', 'sal'].iat[0]
print (jonessal)
2975
empename = emp.loc[emp['sal'] > jonessal, ['ename','sal']]
print (empename)
    ename   sal
0    KING  5000
9    FORD  3000
11  SCOTT  3000
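Taking the first element with .iat[0] silently relies on exactly one row matching the name. As a rough sketch of making that assumption explicit, the helper below (salary_of is a hypothetical name, not part of pandas or of this answer) raises an error when the name matches zero or several rows. The small frame is again a stand-in for the real data.

```python
import pandas as pd


def salary_of(emp: pd.DataFrame, name: str) -> int:
    """Return the salary of the single employee called `name` (hypothetical helper)."""
    matches = emp.loc[emp["ename"] == name, "sal"]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one employee named {name!r}, found {len(matches)}"
        )
    return int(matches.iat[0])


# Small stand-in for the parsed emp table.
emp = pd.DataFrame({
    "ename": ["KING", "BLAKE", "JONES", "FORD", "SCOTT"],
    "sal": [5000, 2850, 2975, 3000, 3000],
})

jonessal = salary_of(emp, "JONES")
# Boolean mask on the rows, list of labels on the columns.
empename = emp.loc[emp["sal"] > jonessal, ["ename", "sal"]]
print(empename)
```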

Answer 2 (score: 1)

Your call to DataFrame.from_csv is incorrect. By default it uses a comma "," as the field delimiter:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html

Parameters:
  sep : string, default ‘,’
    Field delimiter

But your CSV is not comma-separated; it is tab-separated.

Try adding the sep='\t' argument to the from_csv call:

pd.DataFrame.from_csv("D:\R data\emp.csv", sep='\t')
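DataFrame.from_csv is no longer available in recent pandas releases, so the same idea is sketched here with pd.read_csv. The sample string is a made-up tab-separated excerpt, not the asker's actual file, and index_col=0 is my choice to mimic how from_csv treated the first column as the index.

```python
import io

import pandas as pd

# Hypothetical tab-separated excerpt standing in for emp.csv.
sample = (
    "index\tempno\tename\tsal\n"
    "0\t7839\tKING\t5000\n"
    "3\t7566\tJONES\t2975\n"
)

# With the default sep=',' every line is read as one wide column.
wrong = pd.read_csv(io.StringIO(sample))

# With sep='\t' the fields are split into separate columns.
right = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)

print(wrong.shape)
print(right)
```

If the file on disk really is tab-separated, passing the file path instead of io.StringIO(sample) should behave the same way.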