Dropping rows in pandas does not work

Date: 2016-12-18 15:08:38

Tags: python pandas

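I am trying to drop the rows that contain '?' in one of their cells, but the data I get back is the same as if I had done nothing. Here is the link to the data set. Here is the code: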

import pandas as pd
from IPython.display import display

adult = pd.read_csv('adult.data.csv')
adult = adult[adult.Workclass != '?']
display(adult)

1 answer:

Answer 0 (score: 2)

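I think you need str.strip to remove the whitespace. The values in the file appear to carry a leading space (' ?' rather than '?'), so the exact comparison with '?' matches no rows: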

adult = adult[adult.Workclass.str.strip() != '?']
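
To see why the original comparison matches nothing, here is a minimal sketch on a made-up three-row frame. The values are illustrative and not taken from the file; the assumption is that each cell keeps a leading space, as happens when a file separates its fields with ', ':

```python
import pandas as pd

# Hypothetical sample: every value has a leading space, as left behind
# by read_csv when the file separates fields with ", ".
adult = pd.DataFrame({'Workclass': [' Private', ' ?', ' State-gov']})

# The exact comparison never matches, because the cell is ' ?' and not '?'.
unstripped_hits = (adult.Workclass == '?').sum()

# After stripping the surrounding whitespace the row is found.
stripped_hits = (adult.Workclass.str.strip() == '?').sum()

# Keep only the rows whose stripped value is not '?'.
filtered = adult[adult.Workclass.str.strip() != '?']
print(unstripped_hits, stripped_hits, len(filtered))
```

Printing `adult.Workclass.unique()` on the real data is a quick way to check whether the same leading space is present there.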

Tested with your data (the column names are not set, so column 6 is tested):

import pandas as pd
from IPython.display import display

adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None)
adult = adult[adult[6].str.strip() != '?']
display(adult.head(30))

    0                  1       2              3   4                       5   \
0   39          State-gov   77516      Bachelors  13           Never-married   
1   50   Self-emp-not-inc   83311      Bachelors  13      Married-civ-spouse   
2   38            Private  215646        HS-grad   9                Divorced   
3   53            Private  234721           11th   7      Married-civ-spouse   
4   28            Private  338409      Bachelors  13      Married-civ-spouse   
5   37            Private  284582        Masters  14      Married-civ-spouse   
6   49            Private  160187            9th   5   Married-spouse-absent   
7   52   Self-emp-not-inc  209642        HS-grad   9      Married-civ-spouse   
8   31            Private   45781        Masters  14           Never-married   
9   42            Private  159449      Bachelors  13      Married-civ-spouse   
10  37            Private  280464   Some-college  10      Married-civ-spouse   
11  30          State-gov  141297      Bachelors  13      Married-civ-spouse   
12  23            Private  122272      Bachelors  13           Never-married   
13  32            Private  205019     Assoc-acdm  12           Never-married   
14  40            Private  121772      Assoc-voc  11      Married-civ-spouse   
15  34            Private  245487        7th-8th   4      Married-civ-spouse   
16  25   Self-emp-not-inc  176756        HS-grad   9           Never-married   
17  32            Private  186824        HS-grad   9           Never-married   
18  38            Private   28887           11th   7      Married-civ-spouse   
19  43   Self-emp-not-inc  292175        Masters  14                Divorced   
20  40            Private  193524      Doctorate  16      Married-civ-spouse   
21  54            Private  302146        HS-grad   9               Separated   
22  35        Federal-gov   76845            9th   5      Married-civ-spouse   
23  43            Private  117037           11th   7      Married-civ-spouse   
24  59            Private  109015        HS-grad   9                Divorced   
25  56          Local-gov  216851      Bachelors  13      Married-civ-spouse   
26  19            Private  168294        HS-grad   9           Never-married   
28  39            Private  367260        HS-grad   9                Divorced   
29  49            Private  193366        HS-grad   9      Married-civ-spouse   
30  23          Local-gov  190709     Assoc-acdm  12           Never-married   

                    6               7                    8        9      10  \
0         Adm-clerical   Not-in-family                White     Male   2174   
1      Exec-managerial         Husband                White     Male      0   
2    Handlers-cleaners   Not-in-family                White     Male      0   
3    Handlers-cleaners         Husband                Black     Male      0   
4       Prof-specialty            Wife                Black   Female      0   
5      Exec-managerial            Wife                White   Female      0   
6        Other-service   Not-in-family                Black   Female      0   
7      Exec-managerial         Husband                White     Male      0   
8       Prof-specialty   Not-in-family                White   Female  14084   
9      Exec-managerial         Husband                White     Male   5178   
10     Exec-managerial         Husband                Black     Male      0   
11      Prof-specialty         Husband   Asian-Pac-Islander     Male      0   
12        Adm-clerical       Own-child                White   Female      0   
13               Sales   Not-in-family                Black     Male      0   
14        Craft-repair         Husband   Asian-Pac-Islander     Male      0   
15    Transport-moving         Husband   Amer-Indian-Eskimo     Male      0   
16     Farming-fishing       Own-child                White     Male      0   
17   Machine-op-inspct       Unmarried                White     Male      0   
18               Sales         Husband                White     Male      0   
19     Exec-managerial       Unmarried                White   Female      0   
20      Prof-specialty         Husband                White     Male      0   
21       Other-service       Unmarried                Black   Female      0   
22     Farming-fishing         Husband                Black     Male      0   
23    Transport-moving         Husband                White     Male      0   
24        Tech-support       Unmarried                White   Female      0   
25        Tech-support         Husband                White     Male      0   
26        Craft-repair       Own-child                White     Male      0   
28     Exec-managerial   Not-in-family                White     Male      0   
29        Craft-repair         Husband                White     Male      0   
30     Protective-serv   Not-in-family                White     Male      0   

      11  12              13      14  
0      0  40   United-States   <=50K  
1      0  13   United-States   <=50K  
2      0  40   United-States   <=50K  
3      0  40   United-States   <=50K  
4      0  40            Cuba   <=50K  
5      0  40   United-States   <=50K  
6      0  16         Jamaica   <=50K  
7      0  45   United-States    >50K  
8      0  50   United-States    >50K  
9      0  40   United-States    >50K  
10     0  80   United-States    >50K  
11     0  40           India    >50K  
12     0  30   United-States   <=50K  
13     0  50   United-States   <=50K  
14     0  40               ?    >50K  
15     0  45          Mexico   <=50K  
16     0  35   United-States   <=50K  
17     0  40   United-States   <=50K  
18     0  50   United-States   <=50K  
19     0  45   United-States    >50K  
20     0  60   United-States    >50K  
21     0  20   United-States   <=50K  
22     0  40   United-States   <=50K  
23  2042  40   United-States   <=50K  
24     0  40   United-States   <=50K  
25     0  40   United-States    >50K  
26     0  40   United-States   <=50K  
28     0  80   United-States   <=50K  
29     0  40   United-States   <=50K  
30     0  52   United-States   <=50K  
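
An alternative to stripping after loading, not part of the test above, is to remove the space while parsing. The sketch below assumes a ', '-separated file and uses a small inline sample (hypothetical values) in place of the real download; `skipinitialspace` is a `read_csv` parameter that skips spaces after the delimiter:

```python
import io
import pandas as pd

# Hypothetical two-column sample in the same ", "-separated style.
raw = "39, State-gov\n54, ?\n38, Private\n"

# skipinitialspace=True drops the blank that follows each delimiter,
# so the cells arrive as '?' rather than ' ?'.
sample = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

# The plain comparison now works without str.strip.
cleaned = sample[sample[1] != '?']
print(cleaned)
```

This only handles spaces directly after the delimiter; trailing whitespace would still need str.strip.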

Edit by comment:

If you need to remove every row that has the value ? in at least one column:

# select the object columns (here, the string columns)
df = adult.select_dtypes(['object'])
# strip whitespace, compare with '?', and flag rows with at least one match
mask = (df.apply(lambda x: x.str.strip()) == '?').any(axis=1)
# print(mask)
# boolean indexing with the mask inverted by ~
print(adult[~mask])
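
A different route to the same result, offered only as a sketch, is to have `read_csv` treat '?' as missing and then call `dropna`. The inline sample is hypothetical, and the approach assumes that rows with genuinely missing values should be dropped as well, since `dropna` cannot tell the two cases apart:

```python
import io
import pandas as pd

# Hypothetical sample: row 1 has two '?' cells, row 2 has one.
raw = (
    "39, State-gov, Adm-clerical\n"
    "54, ?, ?\n"
    "38, Private, ?\n"
    "50, Private, Sales\n"
)

# Skip the space after each delimiter and read '?' as NaN.
sample = pd.read_csv(io.StringIO(raw), header=None,
                     skipinitialspace=True, na_values='?')

# Drop every row that has at least one missing value.
cleaned = sample.dropna()
print(cleaned)
```

Compared with the mask above, this moves the cleanup into the loading step, at the cost of losing the distinction between '?' and cells that were empty in the file.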