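I have a dataset that I first normalized and then dropped the NaN values from. Now, when I call df[col] = preprocessing.scale(df[col].values), I get this error: ValueError: Input contains infinity or a value too large for dtype('float64').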
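Here are the steps I followed:
1- Made sure the DataFrame (pandas) has no NaN by dropping them
2- Normalized the values using pct_change
3- Dropped NaN immediately after calling pct_change
Then I call the scale function and get the error.
Here is the code snippet.
From the main call: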
dataset = f"./Data/Original/{RATIO_TO_PREDICT}.csv"
df = pd.read_csv(dataset)
df.set_index("Timestamp", inplace = True)
#calculate volume candle type 1
#calculate volume candle type 2
#df['VC1_Future'] = df["VC1"].shift(-FUTURE_PERIOD_PREDICT)
#df['VC1_Target'] = list(map(classify,df["VC1"], df["VC1_Future"]))
#df['VC2_Future'] = df["VC2"].shift(-FUTURE_PERIOD_PREDICT)
#df['VC2_Target'] = list(map(classify,df["VC2"], df["VC2_Future"]))
df.fillna(method="ffill", inplace = True)
df.dropna(inplace=True)
df['Price_Future'] = df["Close"].shift(-FUTURE_PERIOD_PREDICT) # We go N number of time to the future, get that value and put it in this row's FUTURE PRICE value
df['Price_Target'] = list(map(classify,df["Close"], df["Price_Future"]))
# Now we compare the current price with that future price to see if we went up, down or stayed flat; here we use the 0.015 (1.5%) spread to make sure we cover the commission
# Now we want to separate part of the data for training and another part for testing
times = sorted(df.index.values)
last_5pct = times[-int(0.1 * len(times))]
# We get the final columns we want, making sure we are not including any of the High, Low, and Open values. Remember that Price Target is last. That is OUR GOAL !!!
#dfs = df[["Close", "Volume", "Price_Future", "Price_Target"]]#, "VC1", "VC2", "VC1_Future", "VC2_Future", "VC1_Target", "VC2_Target", "Price_Future", "Price_Target"]]
# We finally separate the data into two different lists
validation_df = df[(df.index >= last_5pct)]
training_df = df[(df.index < last_5pct)]
# We save each list into a file so that we don't need to run this process again unless A) we get new data or B) we lose the previous data on the hard drive
Message(name)
print(len(df), len(training_df), len(validation_df))
Message(len(df))
#training_df.dropna(inplace=True)
print(np.isfinite(training_df).all())
print('')
#validation_df.dropna(inplace=True)
print(np.isfinite(validation_df).all())
Train_X, Train_Y = preprocess(training_df)
Now, as for the function, here is how it starts:
def preprocess(df):
    df.drop('Price_Future', 1)
    #df.drop('VC1_Future', 1)
    #df.drop('VC2_Future', 1)
    for col in df.columns:
        if col != "Price_Target" and col != "VC1_Target" and col != "VC2_Target":
            df[col] = df[col].pct_change() # gets the percent change, other than the volume, the data now should sit between -1 and 1, the formula : (value[i] / value[i-1]) - 1
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)
As you may notice, in the main call I check for NaN (via np.isfinite), and this is the result:
Open True
High True
Low True
Close True
Volume True
Price_Future False
Price_Target True
dtype: bool
At the beginning of the function I drop the Price_Future column, so why do I get this error on the scale line?
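Also, the code above triggers a lot of warnings:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
But I am new to Python and all of this, so I don't know how to fix the code in the function.
Could someone please help?
Thanks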
Answer 0 (score: 0)
OUCH, found the main problem:
df[col] = preprocessing.scale(df[col].values)
is wrong; it should be
df[col] = preprocessing.scale(df[col])
Note that .values is left out of the scale call!!!
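For anyone hitting the same ValueError: it is raised when the array passed to the scaler contains inf or NaN. One way inf can appear after pct_change is a zero in the previous row, because the formula divides by the previous value. Whether that applies to the dataset in the question is an assumption; the sketch below uses a made-up Volume column to show how such values could be detected and removed before scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the zero in "Volume" makes the next pct_change infinite.
df = pd.DataFrame({"Volume": [10.0, 0.0, 5.0, 6.0]})

changes = df["Volume"].pct_change()  # (value[i] / value[i-1]) - 1

# Count non-finite entries: NaN from the first row, inf from dividing by zero.
n_bad = int((~np.isfinite(changes)).sum())

# Convert inf to NaN, then drop NaN so only finite values reach the scaler.
clean = changes.replace([np.inf, -np.inf], np.nan).dropna()

print(n_bad)
print(clean)
```

Running the same np.isfinite check on each column right before the scale call, rather than only in the main code, would show which column introduces the problem.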
But could someone please help me deal with those warning messages?
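Regarding the warnings: training_df in the question is a slice of df, and assigning to columns of a slice is what pandas warns about. A common remedy is to work on an explicit copy. The sketch below is one possible rewrite of preprocess under stated assumptions, not a drop-in replacement: the sample data is invented, rows are dropped once after all columns are converted (the original dropped inside the loop), and the standardization is done by hand as a stand-in for preprocessing.scale. It also assigns the result of drop(), which returns a new frame instead of modifying the original.

```python
import numpy as np
import pandas as pd


def preprocess(df):
    # Explicit copy: later assignments modify this frame, not a slice of the caller's data.
    df = df.copy()
    # drop() returns a new frame, so the result has to be assigned back.
    df = df.drop(columns=["Price_Future"])
    feature_cols = [c for c in df.columns if c != "Price_Target"]
    for col in feature_cols:
        df[col] = df[col].pct_change()
    # Remove the NaN first row and any inf produced by dividing by zero.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    for col in feature_cols:
        # Stand-in for preprocessing.scale: zero mean, unit population variance.
        df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    return df


# Hypothetical data shaped like the frame in the question.
raw = pd.DataFrame({
    "Close": [10.0, 11.0, 12.0, 11.5, 13.0],
    "Volume": [100.0, 120.0, 90.0, 95.0, 130.0],
    "Price_Future": [11.0, 12.0, 11.5, 13.0, np.nan],
    "Price_Target": [1, 1, 0, 1, 0],
})
training_df = raw[raw.index < 4]  # a slice, as in the question
result = preprocess(training_df)
print(result)
```

Because the function copies its input, training_df itself is left unchanged, and the caller uses the returned frame.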