Tensorflow - Preparing a CSV file to feed a recurrent neural network

Time: 2018-11-18 22:08:48

Tags: python tensorflow neural-network

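I have a question. At school we started a new project on neural networks, and we had to choose which type of AI to program. I chose a recurrent neural network that predicts whether a price will go up or down after a certain period. I got it programmed and it trains well. Now I want to do a test run, but I don't know how to prepare a CSV file to feed into the RNN. Here is my training code: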

import io

import pandas as pd
import requests

main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "ETH-USD"]
for ratio in ratios:


    url="https://www.test.nl/get_csv_content.php?method=train&ratio=" + str(ratio)
    dataset = requests.get(url, verify=False).content
    df = pd.read_csv(io.StringIO(dataset.decode('utf-8')), names=["time", "low", "high", "open", "close", "volume", "rsi14", "ma5", "ema5", "ema12", "ema20", "macd", "signal"])

    df.rename(columns={"close": str(ratio)+"_close", "volume": str(ratio) + "_volume", "rsi14": str(ratio) + "_rsi14", "ma5": str(ratio) + "_ma5", "ema5": str(ratio) + "_ema5", "ema12": str(ratio) + "_ema12", "ema20": str(ratio) + "_ema20", "macd": str(ratio) + "_macd", "signal": str(ratio) + "_signal"}, inplace=True)

    df.set_index("time", inplace=True)
    df = df[[str(ratio) + "_close", str(ratio) + "_volume", str(ratio) + "_rsi14", str(ratio) + "_ma5", str(ratio) + "_ema5", str(ratio) + "_ema12", str(ratio) + "_ema20", str(ratio) + "_macd", str(ratio) + "_signal"]]

    if len(main_df) == 0:
        main_df = df
    else:
        main_df = main_df.join(df)


main_df['future'] = main_df[str(RATIO_TO_PREDICT) + "_close"].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[str(RATIO_TO_PREDICT) + "_close"], main_df["future"]))
#print(main_df[[str(RATIO_TO_PREDICT) + "_close", "future", "target"]].head(10))


times = sorted(main_df.index.values)
last_5pct = times[-int(0.05*len(times))]

validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]

train_x, train_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df)

Here are the functions:

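import random
import time
from collections import deque

import numpy as np
from sklearn import preprocessing

# Constant variables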
SEQ_LEN = 30
FUTURE_PERIOD_PREDICT = 3
RATIO_TO_PREDICT = "LTC-USD"
EPOCHS = 10
BATCH_SIZE = 64
NAME = str(RATIO_TO_PREDICT) + "-" + str(SEQ_LEN) + "-SEQ-" + str(FUTURE_PERIOD_PREDICT) + "-PRED-" + str(int(time.time()))

def classify(current, future):
    if float(future) > float(current):
        return 1
    else:
        return 0

def preprocess_df(df):
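    # 'future' was only needed to build 'target'; the positional axis argument fails on pandas >= 2.0
    df = df.drop(columns='future')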

    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)

    df.dropna(inplace=True)

    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)



    for i in df.values:
        prev_days.append([n for n in i[:-1]])
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[-1]])

    random.shuffle(sequential_data)

    buys = []
    sells = []

    for seq, target in sequential_data:
        if target == 0:
            sells.append([seq, target])
        elif target == 1:
            buys.append([seq, target])


    random.shuffle(buys)
    random.shuffle(sells)

    lower = min(len(buys), len(sells))


    buys = buys[:lower]
    sells = sells[:lower]


    sequential_data = buys+sells

    random.shuffle(sequential_data)

    x = []
    y = []

    for seq, target in sequential_data:
        x.append(seq)
        y.append(target)

    return np.array(x), np.array(y)

So the question is: after training the model, how do I prepare a new CSV file to feed into it?

1 Answer:

Answer 0: (score: 0)

Typically, to test a model, you set aside a subset of your original dataset and use it only for testing. That is, you don't use this data for training at all.
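As a rough illustration of holding data out, here is a minimal sketch that cuts the most recent rows off a time-indexed DataFrame before any training happens. The helper name `chronological_split`, the `test_fraction` parameter, and the toy `demo` frame are assumptions made for this example and are not part of the question's code. The split is by time rather than random because the data is a price series.

```python
import numpy as np
import pandas as pd


def chronological_split(df, test_fraction=0.2):
    """Split a time-indexed frame into (train, test) without shuffling.

    Hypothetical helper: the most recent `test_fraction` of rows becomes
    the test set, everything before it stays available for training.
    """
    df = df.sort_index()
    n_test = int(len(df) * test_fraction)
    if n_test == 0:
        # Too few rows to hold anything out; return an empty test frame.
        return df, df.iloc[0:0]
    return df.iloc[:-n_test], df.iloc[-n_test:]


# Toy frame standing in for main_df from the question.
demo = pd.DataFrame(
    {"LTC-USD_close": np.linspace(50.0, 60.0, 100)},
    index=pd.RangeIndex(100, name="time"),
)
train_df, test_df = chronological_split(demo, test_fraction=0.2)
print(len(train_df), len(test_df))
```

Applied to the question's code, the same idea would mean splitting `main_df` once into train and test, and then carving the 5% validation slice out of the training part only.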

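The link in your code that fetches the CSV file from the remote server doesn't work for me, but it does have the query parameter ?method=train, which you could change to something like ?method=test to fetch a test dataset and use that for your test run. Failing that, you could simply set aside 20% of your dataset for testing and train on the rest.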
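Whichever way the test data is obtained, it needs to end up in the same shape as the training input: windows of SEQ_LEN rows of scaled percentage changes. The sketch below is one possible way to do that, assuming the test CSV has been loaded and joined into a DataFrame with the same feature columns plus `target`. The helper name `prepare_test_sequences` and the toy `demo` frame are hypothetical. It is a simplified variant of `preprocess_df`, not a copy: it skips the shuffling and class balancing, which only matter for training, and it scales with plain NumPy (population standard deviation) instead of `preprocessing.scale`.

```python
from collections import deque

import numpy as np
import pandas as pd

SEQ_LEN = 30  # same window length as in the question


def prepare_test_sequences(df, seq_len=SEQ_LEN):
    """Turn a feature frame with a 'target' column into (x, y) arrays.

    Hypothetical helper: percentage change, scaling and windowing as in
    preprocess_df, but the rows stay in chronological order.
    """
    df = df.drop(columns="future", errors="ignore").copy()
    feature_cols = [c for c in df.columns if c != "target"]

    # Percentage change per feature; the first row becomes NaN and is dropped.
    for col in feature_cols:
        df[col] = df[col].pct_change()
    df = df.replace([np.inf, -np.inf], np.nan).dropna()

    # Zero mean, unit variance per column (assumption: scaled on the test data itself).
    for col in feature_cols:
        values = df[col].to_numpy()
        std = values.std()
        df[col] = (values - values.mean()) / (std if std > 0 else 1.0)

    # Sliding window: each sample is the last seq_len rows of features.
    window = deque(maxlen=seq_len)
    x, y = [], []
    for row in df[feature_cols + ["target"]].to_numpy():
        window.append(row[:-1])
        if len(window) == seq_len:
            x.append(np.array(window))
            y.append(row[-1])
    return np.array(x), np.array(y)


# Toy data standing in for a downloaded test CSV.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "LTC-USD_close": 50 + rng.random(40),
    "LTC-USD_volume": 1000 + rng.random(40),
})
demo["target"] = (demo["LTC-USD_close"].shift(-3) > demo["LTC-USD_close"]).astype(int)

test_x, test_y = prepare_test_sequences(demo)
print(test_x.shape, test_y.shape)
```

If the trained model is a Keras model (the question does not show it), arrays like these could then be passed to something like `model.evaluate(test_x, test_y)` or `model.predict(test_x)`.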