I am trying to use a for loop to fill in age group values, as shown below:
for dataset in train:
    dataset.loc[(dataset['age'] > 15) & (dataset['age'] <= 25), 'age'] = 1
    dataset.loc[(dataset['age'] > 25) & (dataset['age'] <= 35), 'age'] = 2
    dataset.loc[(dataset['age'] > 35) & (dataset['age'] <= 45), 'age'] = 3
    dataset.loc[(dataset['age'] > 45) & (dataset['age'] <= 55), 'age'] = 4
    dataset.loc[dataset['age'] > 55, 'age'] = 5
I get this error:
AttributeError: 'str' object has no attribute 'loc'
I want my dataset to look like this:
age (in existing dataset)    age (desired)
25                           1
35                           2
45                           3
73                           4
Answer 0 (score: 2)
I think you need to drop the loop: if train is a DataFrame, iterating over it yields its column names, so dataset is a string each time:
import numpy as np
import pandas as pd

np.random.seed(100)
train = pd.DataFrame(np.random.randint(10, size=(3,3)), columns=['age','col1','col2'])
print (train)
   age  col1  col2
0    8     8     3
1    7     7     0
2    4     2     5
for dataset in train:
    print (dataset)
age
col1
col2
train.loc[(train['age'] > 15) & (train['age'] <= 25), 'new'] = 1
train.loc[(train['age'] > 25) & (train['age'] <= 35), 'new'] = 2
train.loc[(train['age'] > 35) & (train['age'] <= 45), 'new'] = 3
train.loc[(train['age'] > 45) & (train['age'] <= 55), 'new'] = 4
train.loc[ train['age'] > 55, 'new'] = 5
Better still, use pd.cut:
r = [0, 25, 35, 45, 55, 120]
g = [1,2,3,4,5]
train['new'] = pd.cut(train['age'], bins=r, labels=g)
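As a quick check, here is a minimal sketch of pd.cut applied to the ages from the question. The `ages` frame and the `new_int` column are made up for illustration. One thing to keep in mind: pd.cut bins are right-closed by default, so with a first edge of 0 the ages 1 through 15 also receive label 1, whereas the .loc version leaves them unassigned.

```python
import pandas as pd

# Hypothetical sample built from the ages listed in the question.
ages = pd.DataFrame({'age': [25, 35, 45, 73]})

bins = [0, 25, 35, 45, 55, 120]   # right-closed: (0, 25], (25, 35], ...
labels = [1, 2, 3, 4, 5]

ages['new'] = pd.cut(ages['age'], bins=bins, labels=labels)

# pd.cut returns a categorical column; cast it when plain integers are needed.
ages['new_int'] = ages['new'].astype(int)
print(ages)
```

With these bins, 73 falls into the last interval and gets label 5, which matches the `> 55` rule in the .loc version.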
Answer 1 (score: 1)
Your dataset appears to be a string, and a string has no attribute or method named loc. Check the type of dataset with type() or isinstance() and make sure it is the data type you expect.
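A small sketch of that check, assuming a toy `train` frame (the data is hypothetical). It shows that the loop variable is a str while `train` itself is the DataFrame:

```python
import pandas as pd

train = pd.DataFrame({'age': [22, 38, 61]})  # hypothetical data

# Iterating over a DataFrame yields its column labels, not rows or frames.
column_types = [type(dataset) for dataset in train]
print(column_types)

# The object that actually has .loc is the DataFrame itself.
print(isinstance(train, pd.DataFrame))
```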
Answer 2 (score: 1)
r = [0,10,17,65, 110]
g = ['Child','Teen','Adult','Elderly']
train['AgeCtg'] = pd.cut(train['Age'], bins = r, labels = g)
train.head(50)
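One detail worth checking with these bins is how values on the edges are handled. The sketch below uses a hypothetical `ages` series placed on the bin edges. By default the lowest edge is excluded, so an age of exactly 0 becomes NaN unless include_lowest=True is passed.

```python
import pandas as pd

ages = pd.Series([0, 10, 17, 18, 70])  # hypothetical values on or near the bin edges
r = [0, 10, 17, 65, 110]
g = ['Child', 'Teen', 'Adult', 'Elderly']

# Default: intervals are (0, 10], (10, 17], ... so 0 is outside every bin.
default = pd.cut(ages, bins=r, labels=g)

# include_lowest=True closes the first interval on the left as well.
with_lowest = pd.cut(ages, bins=r, labels=g, include_lowest=True)

print(default.tolist())
print(with_lowest.tolist())
```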
Answer 3 (score: 0)
Just do this:
Train = [train]  # wrap the train DataFrame in a list
for dataset in Train:
    dataset.loc[dataset['Fare'] <= 17, 'Fare'] = 0
    dataset.loc[(dataset['Fare'] > 17) & (dataset['Fare'] <= 30), 'Fare'] = 1
    dataset.loc[(dataset['Fare'] > 30) & (dataset['Fare'] <= 100), 'Fare'] = 2
    dataset.loc[dataset['Fare'] > 100, 'Fare'] = 3
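Looping over a list mostly pays off when the same transformation has to be applied to several frames. A minimal sketch, assuming hypothetical `train` and `test` frames and a made-up `FareBand` column, and swapping the .loc chain for pd.cut with the same thresholds so that the original Fare values are kept:

```python
import pandas as pd

# Hypothetical train/test frames; only the 'Fare' column matters here.
train = pd.DataFrame({'Fare': [7.25, 26.0, 71.3, 512.3]})
test = pd.DataFrame({'Fare': [8.05, 151.55]})

# Same thresholds as above: <= 17, (17, 30], (30, 100], > 100.
bins = [-float('inf'), 17, 30, 100, float('inf')]

for dataset in [train, test]:
    # Each element is a DataFrame, so column assignment works as expected.
    dataset['FareBand'] = pd.cut(dataset['Fare'], bins=bins, labels=[0, 1, 2, 3]).astype(int)

print(train)
print(test)
```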