I wrote the following code to normalize several columns of a data frame:
import pandas as pd
train = pd.read_csv('test1.csv')
header = train.columns.values
print(train)
print(header)
inputs = header[0:3]
trainArr = train.as_matrix(inputs)
print(inputs)
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
Some of the input, as printed by the code:
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
['v1' 'v2' 'v3' 'result']
['v1' 'v2' 'v3']
However, I get the following error:
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
IndexError: arrays used as indices must be of integer (or boolean) type
Does anyone know what I am missing? Thanks!
Answer 0 (score: 1)
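I think you can first select the first three column names with [:3], then use train[header] to take that subset of the DataFrame. Finally, you can apply the function to those 3 columns: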
print (train)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
header = train.columns[:3]
print(header)
Index([u'v1', u'v2', u'v3'], dtype='object')
print (train[header])
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2
train[header] = train[header].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
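For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch of the same approach. The inline DataFrame is an assumption standing in for test1.csv (the values are copied from the printed output in the question), and `normalized` is just an illustrative variable name.

```python
import pandas as pd

# Assumption: this inline frame stands in for pd.read_csv('test1.csv');
# the values are copied from the output printed in the question.
train = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Labels of the first three columns (an Index, not a NumPy array of strings).
header = train.columns[:3]

# Mean normalization, column by column: (x - mean) / (max - min).
normalized = train[header].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))

# Write the normalized columns back; 'result' is left untouched.
train[header] = normalized

print(train)
```

Each normalized column ends up with mean 0 and a range (max minus min) of exactly 1.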
However, I think it is better to use iloc to select the first 3 columns by position:
print (train.iloc[:,:3])
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2
train.iloc[:,:3] = train.iloc[:,:3].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
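As for why the original code failed: as_matrix (removed in pandas 1.0) returned a plain NumPy array, not a DataFrame, and a NumPy array cannot be indexed with an array of column-name strings, which is what raises the IndexError. The sketch below reproduces that with a hand-built array. The names `values`, `labels`, `raised` and `normalized` are illustrative, and the array is assumed to hold the same numbers as the first three columns in the question.

```python
import numpy as np

# Assumption: same numbers as columns v1, v2, v3 in the question.
values = np.array([[12, 31, 31],
                   [34, 52, 4],
                   [32, 4, 5],
                   [7, 89, 2]])

# What train.columns.values[0:3] looks like: an object array of strings.
labels = np.array(['v1', 'v2', 'v3'], dtype=object)

# NumPy only accepts integer or boolean index arrays, so this fails.
try:
    values[labels]
    raised = False
except IndexError:
    raised = True

# If a plain array is really what you want, normalize along axis 0
# (down each column) instead of indexing by column name.
normalized = (values - values.mean(axis=0)) / (values.max(axis=0) - values.min(axis=0))

print(raised)
print(normalized)
```

Staying with the DataFrame, as in the answer above, keeps the column labels and avoids the problem altogether.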