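I am trying to delete the rows that contain '?' in a cell, but the data I get back is the same as if I had done nothing. Here is the link to the data set. Here is the code: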
import pandas as pd
from IPython.display import display
adult = pd.read_csv('adult.data.csv')
adult = adult[adult.Workclass != '?']
display(adult)
Answer 0 (score: 2)
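I think you need str.strip to remove the whitespace. Each value in the file is preceded by a space, so the cell holds ' ?' rather than '?' and the plain comparison never matches: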
adult = adult[adult.Workclass.str.strip() != '?']
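A minimal sketch of why the original comparison matches nothing, assuming the fields are separated by a comma followed by a space. The three sample rows and the `Age` column name are made up for illustration; `Workclass` is the column name from the question.

```python
import io
import pandas as pd

# Hypothetical sample rows that imitate the ", " separator of the file
raw = "39, State-gov\n54, ?\n50, Private\n"
sample = pd.read_csv(io.StringIO(raw), header=None, names=['Age', 'Workclass'])

# The leading space is kept, so the cell is ' ?' and not '?'
print(repr(sample.Workclass.iloc[1]))

# Plain comparison: no cell equals '?', so no row is dropped
unchanged = sample[sample.Workclass != '?']

# Strip the whitespace first, then compare: the ' ?' row is dropped
cleaned = sample[sample.Workclass.str.strip() != '?']

print(len(unchanged), len(cleaned))
```

Printing `repr` of a suspicious cell is a quick way to spot hidden whitespace before filtering.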
Tested with your data (no column names are set, so column 6 is tested):
import pandas as pd
from IPython.display import display
adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None)
adult = adult[adult[6].str.strip() != '?']
display(adult.head(30))
0 1 2 3 4 5 \
0 39 State-gov 77516 Bachelors 13 Never-married
1 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse
2 38 Private 215646 HS-grad 9 Divorced
3 53 Private 234721 11th 7 Married-civ-spouse
4 28 Private 338409 Bachelors 13 Married-civ-spouse
5 37 Private 284582 Masters 14 Married-civ-spouse
6 49 Private 160187 9th 5 Married-spouse-absent
7 52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse
8 31 Private 45781 Masters 14 Never-married
9 42 Private 159449 Bachelors 13 Married-civ-spouse
10 37 Private 280464 Some-college 10 Married-civ-spouse
11 30 State-gov 141297 Bachelors 13 Married-civ-spouse
12 23 Private 122272 Bachelors 13 Never-married
13 32 Private 205019 Assoc-acdm 12 Never-married
14 40 Private 121772 Assoc-voc 11 Married-civ-spouse
15 34 Private 245487 7th-8th 4 Married-civ-spouse
16 25 Self-emp-not-inc 176756 HS-grad 9 Never-married
17 32 Private 186824 HS-grad 9 Never-married
18 38 Private 28887 11th 7 Married-civ-spouse
19 43 Self-emp-not-inc 292175 Masters 14 Divorced
20 40 Private 193524 Doctorate 16 Married-civ-spouse
21 54 Private 302146 HS-grad 9 Separated
22 35 Federal-gov 76845 9th 5 Married-civ-spouse
23 43 Private 117037 11th 7 Married-civ-spouse
24 59 Private 109015 HS-grad 9 Divorced
25 56 Local-gov 216851 Bachelors 13 Married-civ-spouse
26 19 Private 168294 HS-grad 9 Never-married
28 39 Private 367260 HS-grad 9 Divorced
29 49 Private 193366 HS-grad 9 Married-civ-spouse
30 23 Local-gov 190709 Assoc-acdm 12 Never-married
6 7 8 9 10 \
0 Adm-clerical Not-in-family White Male 2174
1 Exec-managerial Husband White Male 0
2 Handlers-cleaners Not-in-family White Male 0
3 Handlers-cleaners Husband Black Male 0
4 Prof-specialty Wife Black Female 0
5 Exec-managerial Wife White Female 0
6 Other-service Not-in-family Black Female 0
7 Exec-managerial Husband White Male 0
8 Prof-specialty Not-in-family White Female 14084
9 Exec-managerial Husband White Male 5178
10 Exec-managerial Husband Black Male 0
11 Prof-specialty Husband Asian-Pac-Islander Male 0
12 Adm-clerical Own-child White Female 0
13 Sales Not-in-family Black Male 0
14 Craft-repair Husband Asian-Pac-Islander Male 0
15 Transport-moving Husband Amer-Indian-Eskimo Male 0
16 Farming-fishing Own-child White Male 0
17 Machine-op-inspct Unmarried White Male 0
18 Sales Husband White Male 0
19 Exec-managerial Unmarried White Female 0
20 Prof-specialty Husband White Male 0
21 Other-service Unmarried Black Female 0
22 Farming-fishing Husband Black Male 0
23 Transport-moving Husband White Male 0
24 Tech-support Unmarried White Female 0
25 Tech-support Husband White Male 0
26 Craft-repair Own-child White Male 0
28 Exec-managerial Not-in-family White Male 0
29 Craft-repair Husband White Male 0
30 Protective-serv Not-in-family White Male 0
11 12 13 14
0 0 40 United-States <=50K
1 0 13 United-States <=50K
2 0 40 United-States <=50K
3 0 40 United-States <=50K
4 0 40 Cuba <=50K
5 0 40 United-States <=50K
6 0 16 Jamaica <=50K
7 0 45 United-States >50K
8 0 50 United-States >50K
9 0 40 United-States >50K
10 0 80 United-States >50K
11 0 40 India >50K
12 0 30 United-States <=50K
13 0 50 United-States <=50K
14 0 40 ? >50K
15 0 45 Mexico <=50K
16 0 35 United-States <=50K
17 0 40 United-States <=50K
18 0 50 United-States <=50K
19 0 45 United-States >50K
20 0 60 United-States >50K
21 0 20 United-States <=50K
22 0 40 United-States <=50K
23 2042 40 United-States <=50K
24 0 40 United-States <=50K
25 0 40 United-States >50K
26 0 40 United-States <=50K
28 0 80 United-States <=50K
29 0 40 United-States <=50K
30 0 52 United-States <=50K
Edit by comment:
If you need to remove all rows that contain the value ? in at least one column:
#select object columns (obviously string columns)
df = adult.select_dtypes(['object'])
#remove whitespaces and compare, check at least one True
mask = (df.apply(lambda x: x.str.strip()) == '?').any(axis=1)
#print(mask)
#boolean indexing with inverting mask by ~
print(adult[~mask])
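As a possible alternative to building the mask by hand, the whitespace and the '?' markers can be handled while reading the file. This is only a sketch, not part of the answer above: it uses the `skipinitialspace` and `na_values` parameters of `pd.read_csv` and then `dropna`, and the sample rows are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample rows with '?' in one or more columns
raw = (
    "39, State-gov, Adm-clerical\n"
    "54, ?, ?\n"
    "40, Private, ?\n"
    "50, Private, Sales\n"
)

# skipinitialspace drops the space after each comma,
# na_values turns every remaining '?' into NaN
sample = pd.read_csv(
    io.StringIO(raw),
    header=None,
    skipinitialspace=True,
    na_values='?',
)

# Drop every row that has NaN in at least one column
cleaned = sample.dropna()
print(cleaned)
```

Note that `dropna` also removes rows whose values are missing for any other reason, so this only matches the mask approach when '?' is the sole missing-value marker in the data.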