Question

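I need help determining the best way to create a target variable for a machine learning model that trades financial instruments (stocks, foreign currencies, crypto, etc.).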

Below is some sample data to help answer the question.

time: When the data was recorded  
price: The current price of the instrument, which is the price I would be buying at  
price_good: Price level that I need to sell at to make a profit  
price_bad: Price level that I need to sell at to minimize losses  
good_id: The ID of the first row where price >= price_good  
bad_id: The ID of the first row where price <= price_bad  
target: If good_id < bad_id, target = 1. Else target = 0
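To make the definitions above concrete, here is a minimal sketch of how the three derived columns could be computed. It rests on a few assumptions that are not spelled out above: only rows after the current one are searched, `None` stands in for nan, a missing bad_id counts as "price_bad was never reached", and the function name `label_rows` is made up for illustration.

```python
from typing import List, Optional, Tuple

# (good_id, bad_id, target); None plays the role of nan
Label = Tuple[Optional[int], Optional[int], Optional[int]]


def label_rows(prices: List[float], goods: List[float], bads: List[float]) -> List[Label]:
    """Compute good_id, bad_id and target for every row (ids are 1-based)."""
    labels = []
    n = len(prices)
    for i in range(n):
        good_id = None
        bad_id = None
        # Assumption: only look at rows strictly after the entry row.
        for j in range(i + 1, n):
            if good_id is None and prices[j] >= goods[i]:
                good_id = j + 1
            if bad_id is None and prices[j] <= bads[i]:
                bad_id = j + 1
        if good_id is None and bad_id is None:
            target = None  # neither level reached yet
        elif bad_id is None or (good_id is not None and good_id < bad_id):
            target = 1  # price_good reached first
        else:
            target = 0  # price_bad reached first
        labels.append((good_id, bad_id, target))
    return labels


# Sample data from this question
prices = [100, 105, 109, 121, 110]
goods = [110, 115, 120, 131, 120]
bads = [90, 95, 99, 111, 97]

for row_id, label in enumerate(label_rows(prices, goods, bads), start=1):
    print(row_id, label)
```

Every row is labeled independently of the others here, so no row is skipped just because an earlier trade is still open.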

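My ideal scenario is for price_good to be reached before price_bad. The problem I am having is how to set up the target field correctly. I see two ways to approach this: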

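Option #1: This option uses all of the data and places no restrictions on creating the target field, but it is slightly mismatched with how things would work in production.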

id         time       price    price_good   price_bad    good_id   bad_id   target
1          01-01-19   100      110          90           4         nan      1
2          01-02-19   105      115          95           4         nan      1
3          01-03-19   109      120          99           4         nan      1 
4          01-04-19   121      131          111          nan       5        0
5          01-05-19   110      120          97           nan       nan      nan

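Option #2: This option does not use all of the data. It does not allow a new target to be set until the initial row has reached its target, which is how it would work in production.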

id         time       price    price_good   price_bad    good_id   bad_id   target
1          01-01-19   100      110          90           4         nan      1
2          01-02-19   105      115          95           nan       nan      nan
3          01-03-19   109      120          99           nan       nan      nan 
4          01-04-19   121      131          111          nan       5        0
5          01-05-19   110      120          97           nan       nan      nan
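As a rough sketch of how the Option #2 target column could be produced: label one row, jump to the row where that trade exits, and only then label again. The function name `option2_targets` and the `start` parameter are hypothetical, `None` stands in for nan, and it is assumed that a new trade may be opened on the same row where the previous one exits (as in the table, where id 4 is labeled right after id 1 exits at id 4).

```python
from typing import List, Optional


def option2_targets(prices: List[float], goods: List[float], bads: List[float],
                    start: int = 0) -> List[Optional[int]]:
    """Label rows one trade at a time; rows skipped while a trade is open stay None."""
    targets: List[Optional[int]] = [None] * len(prices)
    i = start  # 0-based index of the row where the next trade is entered
    while i < len(prices):
        exit_idx = None
        for j in range(i + 1, len(prices)):
            if prices[j] >= goods[i]:  # price_good reached first
                targets[i] = 1
                exit_idx = j
                break
            if prices[j] <= bads[i]:  # price_bad reached first
                targets[i] = 0
                exit_idx = j
                break
        if exit_idx is None:
            break  # trade never closes within the data, so nothing more can be labeled
        i = exit_idx  # assumption: the next trade may start on the exit row
    return targets


# Sample data from this question
prices = [100, 105, 109, 121, 110]
goods = [110, 115, 120, 131, 120]
bads = [90, 95, 99, 111, 97]

print(option2_targets(prices, goods, bads))
```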

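By "in production" I mean how it would work after the model has been built. For example, if my model predicts that id: 1 will reach price_good, then I would not be able to make any additional trades until that trade reaches either price_good or price_bad, which is consistent with Option #2. However, if I had more capital to invest, I would be losing the information from all of the other trades I could have made under Option #1.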

If I go with Option #1, the model sometimes "overfits" because its data points are closer together in time and use mostly the same information.
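One way to see the "mostly the same information" problem is to count how many other labeled rows have a holding window that overlaps with each row. This is only an illustrative sketch: the `(entry_id, exit_id)` windows are read off the Option #1 table, and the function name `overlap_counts` is made up.

```python
from typing import List, Tuple


def overlap_counts(windows: List[Tuple[int, int]]) -> List[int]:
    """For each (entry_id, exit_id) window, count the other windows it overlaps."""
    counts = []
    for a, (entry_a, exit_a) in enumerate(windows):
        count = 0
        for b, (entry_b, exit_b) in enumerate(windows):
            # Two windows overlap if each one starts before the other ends.
            if a != b and entry_a < exit_b and entry_b < exit_a:
                count += 1
        counts.append(count)
    return counts


# Option #1 windows for ids 1 to 4: each row is held from its own id until good_id or bad_id
windows = [(1, 4), (2, 4), (3, 4), (4, 5)]
print(overlap_counts(windows))
```

For the sample data, ids 1, 2 and 3 each overlap with the two others because all three exit on the same move to 121 at id 4, while id 4 overlaps with none.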

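If I go with Option #2, it more accurately represents how it would work in production, but I lose a lot of data points. For example, what if I started at id: 2 instead of id: 1? I would end up with different data points to build the model on. I could test multiple positions to find the best starting point, but that would take a lot of iterations and resources.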
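The dependence on the starting row can be sketched by running the one-trade-at-a-time labeling from every possible start id and comparing which rows end up labeled. The function name `labeled_ids` is hypothetical, and it is assumed that the next trade may begin on the row where the previous one exits.

```python
from typing import List


def labeled_ids(prices: List[float], goods: List[float], bads: List[float],
                start_id: int) -> List[int]:
    """Return the 1-based ids that get a target when trading sequentially from start_id."""
    ids = []
    i = start_id - 1  # convert to 0-based index
    while i < len(prices):
        exit_idx = None
        for j in range(i + 1, len(prices)):
            # The trade closes at the first row that hits either level.
            if prices[j] >= goods[i] or prices[j] <= bads[i]:
                exit_idx = j
                break
        if exit_idx is None:
            break  # open trade never closes within the data
        ids.append(i + 1)
        i = exit_idx  # assumption: the next trade starts on the exit row
    return ids


# Sample data from this question
prices = [100, 105, 109, 121, 110]
goods = [110, 115, 120, 131, 120]
bads = [90, 95, 99, 111, 97]

for start_id in range(1, len(prices) + 1):
    print(start_id, labeled_ids(prices, goods, bads, start_id))
```

On the sample data, starting at id 1 labels ids 1 and 4, while starting at id 2 labels ids 2 and 4, so the training set changes with the starting row even on five rows.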

Any tips on how to model this correctly? Thanks!

How to properly set up the target for a time-series-based model?

0 answers: