使用以下代码段
import pandas as pd
train = pd.read_csv('train.csv',parse_dates=['dates'])
print(data['dates'])
我加载并控制数据。
我的问题是,如何标准化/标准化数据['日期']使所有元素介于-1和1之间(线性或高斯)?
答案 0 :(得分:3)
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import time
def convert_to_timestamp(x):
"""Convert date objects to integers"""
return time.mktime(x.to_datetime().timetuple())
def normalize(df):
"""Normalize the DF using min/max"""
scaler = MinMaxScaler(feature_range=(-1, 1))
dates_scaled = scaler.fit_transform(df['dates'])
return dates_scaled
if __name__ == '__main__':
# Create a random series of dates
df = pd.DataFrame({
'dates':
['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
'1981-01-21', '1991-02-21', '1991-03-23']
})
# Convert to date objects
df['dates'] = pd.to_datetime(df['dates'])
# Now df has date objects like you would, we convert to UNIX timestamps
df['dates'] = df['dates'].apply(convert_to_timestamp)
# Call normalization function
df = normalize(df)
convert_to_timestamp
dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23
MinMaxScaler
sklearn
进行规范化的UNIX时间戳
dates
0 315507600
1 318272400
2 320778000
3 317235600
4 348858000
5 667069200
6 669661200
[-1. -0.98438644 -0.97023664 -0.99024152 -0.81166138 0.98536228
1. ]
答案 1 :(得分:1)
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
df = pd.DataFrame(np.random.randint(1, 100, (1000, 2)).astype(float64), columns=['A', 'B'])
A B
0 87 95
1 15 12
2 85 88
3 33 61
4 33 29
5 33 91
6 67 19
7 68 20
8 79 18
9 29 93
.. .. ..
990 70 84
991 37 24
992 91 12
993 92 13
994 4 64
995 32 98
996 97 62
997 38 40
998 12 56
999 48 8
[1000 rows x 2 columns]
# specify your desired range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df.values)
print(scaled)
[[ 0.7551 0.9184]
[-0.7143 -0.7755]
[ 0.7143 0.7755]
...,
[-0.2449 -0.2041]
[-0.7755 0.1224]
[-0.0408 -0.8571]]
df[['A', 'B']] = scaled
Out[30]:
A B
0 0.7551 0.9184
1 -0.7143 -0.7755
2 0.7143 0.7755
3 -0.3469 0.2245
4 -0.3469 -0.4286
5 -0.3469 0.8367
6 0.3469 -0.6327
7 0.3673 -0.6122
8 0.5918 -0.6531
9 -0.4286 0.8776
.. ... ...
990 0.4082 0.6939
991 -0.2653 -0.5306
992 0.8367 -0.7755
993 0.8571 -0.7551
994 -0.9388 0.2857
995 -0.3673 0.9796
996 0.9592 0.2449
997 -0.2449 -0.2041
998 -0.7755 0.1224
999 -0.0408 -0.8571
[1000 rows x 2 columns]
答案 2 :(得分:1)
Pandas的解决方案
TableA