
Time: 2017-05-01 04:41:44

Tags: machine-learning neural-network classification keras keras-layer

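I have a set of tweets with several columns, such as Date and the Tweet itself, plus others, but I want to build my model from two columns (Sentiment and Stock Price). Each tweet has been run through sentiment analysis, and the stock price is stored next to it in my database, like this: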

| sentiment          | stock_price |
| 0.0454545454545455 |      299.82 |
| 0.0588235294117647 |      299.83 |
| 0.0434782608695652 |      299.83 |
|            -0.0625 |      299.69 |
| 0.0454545454545455 |       299.7 |


But when I train the model, I get the following output.

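What is it about my input data that makes the accuracy and validation accuracy never change? This looks like a sign of overfitting. I tried adding Dropout layers, but that did not help. How can I fix this? Where am I going wrong?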

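I have encoded the data with my own version of one-hot encoding, using 1/0/-1 to indicate whether the stock went up, stayed flat, or went down.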

Name: pct_chg, dtype: float64
0       0.0
1       1.0
2      -1.0
3      -1.0
4      -1.0


0       0.0
1       1.0
2       0.0
3      -1.0
4       1.0
5       0.0
6      -1.0
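The 1/0/-1 labelling described above can also be computed without a row-by-row loop. The following is a minimal sketch, assuming prices are positive (so the sign of the price difference equals the sign of the percentage change). The function name `direction_labels` is illustrative and not part of the original code.

```python
import numpy as np

def direction_labels(prices):
    """Return 1.0 / 0.0 / -1.0 per row: price rose, unchanged, or fell vs. the previous row."""
    prices = np.asarray(prices, dtype=float)
    # Prepending the first price makes the first difference 0, mirroring
    # the original code, which sets the first pct_chg entry to 0.
    diffs = np.diff(prices, prepend=prices[0])
    return np.sign(diffs)

# Sample prices copied from the table in the question.
labels = direction_labels([299.82, 299.83, 299.83, 299.69, 299.70])
print(labels)
```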




ValueError: Error when checking model target: expected dense_3 to have shape (None, 1) but got array with shape (10000, 2)

Even if I set input_dim = 2 and pass a two-dimensional array, I get the same error. If I set input_dim = 3 instead, it skips 2 entirely, goes to 3, and I get this error:

ValueError: Error when checking model target: expected dense_3 to have shape (None, 3) but got array with shape (10000, 2)
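Both errors above concern the target array rather than the input. As far as I understand Keras, `sparse_categorical_crossentropy` expects one integer class index per sample, in the range 0 to K-1 for an output layer with K units, so labels of -1/0/1 would need remapping first. A minimal NumPy sketch of that remapping follows. `to_class_indices` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def to_class_indices(labels):
    """Map arbitrary label values (e.g. -1, 0, 1) to integer indices 0..K-1."""
    labels = np.asarray(labels)
    # np.unique returns the sorted distinct values and, with return_inverse,
    # the position of each label within that sorted list.
    classes, indices = np.unique(labels, return_inverse=True)
    return classes, indices.reshape(-1)

classes, y = to_class_indices([0.0, 1.0, -1.0, -1.0, -1.0])
print(classes)   # distinct label values, sorted
print(y.shape)   # one index per sample
```

With three distinct labels, the final layer would presumably need three units rather than two.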


Train on 4000 samples, validate on 6000 samples
Epoch 1/5
  32/4000 [..............................] - ETA: 0s - loss: 0.6930 - acc: 0.3125
 384/4000 [=>............................] - ETA: 0s - loss: 0.6570 - acc: 0.2370
 736/4000 [====>.........................] - ETA: 0s - loss: 0.6362 - acc: 0.2337
1120/4000 [=======>......................] - ETA: 0s - loss: 0.6151 - acc: 0.2321
1472/4000 [==========>...................] - ETA: 0s - loss: 0.5992 - acc: 0.2371
1824/4000 [============>.................] - ETA: 0s - loss: 0.5874 - acc: 0.2401
2176/4000 [===============>..............] - ETA: 0s - loss: 0.5765 - acc: 0.2459
2560/4000 [==================>...........] - ETA: 0s - loss: 0.5652 - acc: 0.2457
2912/4000 [====================>.........] - ETA: 0s - loss: 0.5568 - acc: 0.2448
3232/4000 [=======================>......] - ETA: 0s - loss: 0.5519 - acc: 0.2475
3584/4000 [=========================>....] - ETA: 0s - loss: 0.5440 - acc: 0.2517
3936/4000 [============================>.] - ETA: 0s - loss: 0.5391 - acc: 0.2492
4000/4000 [==============================] - 1s - loss: 0.5379 - acc: 0.2487 - val_loss: 0.5083 - val_acc: 0.2032
Epoch 2/5
  32/4000 [..............................] - ETA: 0s - loss: 0.4986 - acc: 0.3438
 384/4000 [=>............................] - ETA: 0s - loss: 0.4640 - acc: 0.2370
 736/4000 [====>.........................] - ETA: 0s - loss: 0.4619 - acc: 0.2473
1088/4000 [=======>......................] - ETA: 0s - loss: 0.4637 - acc: 0.2537
1472/4000 [==========>...................] - ETA: 0s - loss: 0.4666 - acc: 0.2575
1824/4000 [============>.................] - ETA: 0s - loss: 0.4657 - acc: 0.2467
2208/4000 [===============>..............] - ETA: 0s - loss: 0.4600 - acc: 0.2509
2560/4000 [==================>...........] - ETA: 0s - loss: 0.4585 - acc: 0.2523
2912/4000 [====================>.........] - ETA: 0s - loss: 0.4558 - acc: 0.2514
3264/4000 [=======================>......] - ETA: 0s - loss: 0.4548 - acc: 0.2509
3584/4000 [=========================>....] - ETA: 0s - loss: 0.4547 - acc: 0.2492
3936/4000 [============================>.] - ETA: 0s - loss: 0.4552 - acc: 0.2490
4000/4000 [==============================] - 1s - loss: 0.4558 - acc: 0.2480 - val_loss: 0.4797 - val_acc: 0.2032
Epoch 3/5
  32/4000 [..............................] - ETA: 0s - loss: 0.3874 - acc: 0.2812
 352/4000 [=>............................] - ETA: 0s - loss: 0.4465 - acc: 0.2585
 704/4000 [====>.........................] - ETA: 0s - loss: 0.4394 - acc: 0.2372
1056/4000 [======>.......................] - ETA: 0s - loss: 0.4375 - acc: 0.2557
1408/4000 [=========>....................] - ETA: 0s - loss: 0.4384 - acc: 0.2507
1728/4000 [===========>..................] - ETA: 0s - loss: 0.4373 - acc: 0.2546
2048/4000 [==============>...............] - ETA: 0s - loss: 0.4363 - acc: 0.2549
2400/4000 [=================>............] - ETA: 0s - loss: 0.4334 - acc: 0.2525
2752/4000 [===================>..........] - ETA: 0s - loss: 0.4326 - acc: 0.2529
3104/4000 [======================>.......] - ETA: 0s - loss: 0.4324 - acc: 0.2519
3424/4000 [========================>.....] - ETA: 0s - loss: 0.4304 - acc: 0.2480
3776/4000 [===========================>..] - ETA: 0s - loss: 0.4311 - acc: 0.2489
4000/4000 [==============================] - 1s - loss: 0.4300 - acc: 0.2480 - val_loss: 0.4663 - val_acc: 0.2032
Epoch 4/5
  32/4000 [..............................] - ETA: 0s - loss: 0.3656 - acc: 0.3438
 384/4000 [=>............................] - ETA: 0s - loss: 0.4214 - acc: 0.2474
 736/4000 [====>.........................] - ETA: 0s - loss: 0.4133 - acc: 0.2514
1088/4000 [=======>......................] - ETA: 0s - loss: 0.4154 - acc: 0.2417
1440/4000 [=========>....................] - ETA: 0s - loss: 0.4140 - acc: 0.2431
1792/4000 [============>.................] - ETA: 0s - loss: 0.4183 - acc: 0.2461
2144/4000 [===============>..............] - ETA: 0s - loss: 0.4162 - acc: 0.2481
2496/4000 [=================>............] - ETA: 0s - loss: 0.4149 - acc: 0.2468
2848/4000 [====================>.........] - ETA: 0s - loss: 0.4138 - acc: 0.2521
3168/4000 [======================>.......] - ETA: 0s - loss: 0.4171 - acc: 0.2487
3488/4000 [=========================>....] - ETA: 0s - loss: 0.4172 - acc: 0.2480
3840/4000 [===========================>..] - ETA: 0s - loss: 0.4131 - acc: 0.2479
4000/4000 [==============================] - 1s - loss: 0.4158 - acc: 0.2480 - val_loss: 0.4580 - val_acc: 0.2032
Epoch 5/5
  32/4000 [..............................] - ETA: 0s - loss: 0.3798 - acc: 0.3438
 384/4000 [=>............................] - ETA: 0s - loss: 0.3999 - acc: 0.2682
 736/4000 [====>.........................] - ETA: 0s - loss: 0.4005 - acc: 0.2663
1088/4000 [=======>......................] - ETA: 0s - loss: 0.3960 - acc: 0.2610
1440/4000 [=========>....................] - ETA: 0s - loss: 0.3988 - acc: 0.2465
1760/4000 [============>.................] - ETA: 0s - loss: 0.3962 - acc: 0.2500
2080/4000 [==============>...............] - ETA: 0s - loss: 0.3997 - acc: 0.2428
2400/4000 [=================>............] - ETA: 0s - loss: 0.4018 - acc: 0.2492
2752/4000 [===================>..........] - ETA: 0s - loss: 0.4062 - acc: 0.2522
3104/4000 [======================>.......] - ETA: 0s - loss: 0.4054 - acc: 0.2494
3424/4000 [========================>.....] - ETA: 0s - loss: 0.4059 - acc: 0.2468
3744/4000 [===========================>..] - ETA: 0s - loss: 0.4051 - acc: 0.2479
4000/4000 [==============================] - 1s - loss: 0.4060 - acc: 0.2480 - val_loss: 0.4523 - val_acc: 0.2032
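An accuracy that stays at exactly the same value across epochs is often what you see when a model predicts one class for every sample. One way to check this, sketched here with made-up labels, is to compare the reported accuracy against the share of each class in the targets. `class_shares` is an illustrative helper name.

```python
import numpy as np

def class_shares(labels):
    """Return {label: fraction of samples carrying that label}."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {float(v): c / counts.sum() for v, c in zip(values, counts)}

# Made-up targets for illustration only (not the data from the question).
shares = class_shares([-1, -1, 0, 1, 1, 1, 1, 0])
print(shares)
```

If the plateau in the log above (0.2480 for training, 0.2032 for validation) matches the share of one class in the corresponding split, the network is probably emitting a constant prediction.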


from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
import pymysql as mysql
import numpy as np
from keras.utils import np_utils
import pandas as pd
import matplotlib.pyplot as plt
import config

## Compute the % change between consecutive stock prices: a negative number means the price has dropped, a positive number means it has risen.
def stockToVec(y_vali):
    x = y_vali.copy()
    x['pct_chg'] = x['stock_price'].pct_change()
    x['pct_chg'][0] = 0
    ##I then make my own One Hot Encoding in the loop below.
    for index, row in x.iterrows():
        if row['pct_chg'] > 0:
            row['pct_chg'] = 1
        if row['pct_chg'] < 0:
            row['pct_chg'] = -1
        if row['pct_chg'] == 0:
            row['pct_chg'] = 0
    del (x['stock_price'])
    return x

def sentToVec(y_vali):
    y = y_vali.copy()
    y['sen_chg'] = y['sentiment'].pct_change()
    y['sen_chg'][0] = 0
    ##I then make my own One Hot Encoding in the loop below.
    for index, row in y.iterrows():
        if row['sen_chg'] > 0:
            row['sen_chg'] = 1
        if row['sen_chg'] < 0:
            row['sen_chg'] = -1
        if row['sen_chg'] == 0:
            row['sen_chg'] = 0
    return y

sql = "SELECT stock_price, sentiment from tweets WHERE stock_price != 301.44 AND sentiment != 0 LIMIT 0, 10000"
con = mysql.connect(config.dbhost, config.dbuser, config.dbpassword, config.dbname, charset='utf8mb4', autocommit=True)
results = pd.read_sql(sql=sql, con=con)

sent = sentToVec(results)
stock = stockToVec(results)

#This is the ANN Model
model = Sequential()
model.add(Dense(40, input_dim=1, activation='softmax'))
model.add(Dense(2000, activation='relu'))
##2 Layers to predict if the stock is going up or down
model.add(Dense(2, activation='softmax'))

sgd = SGD(lr=0.01, momentum=0.3, decay=0.05, nesterov=True)

model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

history = model.fit(stock['pct_chg'].as_matrix(), sent['sen_chg'].as_matrix(), shuffle=True, validation_split=0.6, epochs=5)

plt.plot(history.history['loss'], color='b', label="Loss")
plt.plot(history.history['acc'], color='g', label="Accuracy")
plt.plot(history.history['val_loss'], color='k', label="Validation Loss")
plt.plot(history.history['val_acc'], color='m', label="Validation Accuracy")


1 Answer:

Answer 0 (score: 0)


model.add(Dense(40, input_dim=40, activation='softmax'))


sent_vectors = np.vstack([sent[i:i+40].values 
                          for i in range(len(sent)-40)])
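The snippet above builds inputs from windows of 40 consecutive rows. Here is a sketch of one way to get one row per window, assuming a single numeric column. It uses `np.stack` so that the window axis is explicit. The name `sliding_windows` and the toy series are illustrative.

```python
import numpy as np

def sliding_windows(series, width):
    """Stack overlapping windows of `width` consecutive values, one window per row."""
    series = np.asarray(series, dtype=float)
    # Same range as the snippet above: len(series) - width windows.
    return np.stack([series[i:i + width] for i in range(len(series) - width)])

toy = np.arange(10)               # stand-in for a column of sentiment values
windows = sliding_windows(toy, 4)
print(windows.shape)
```

The number of target rows passed to `model.fit` would need to match the number of windows.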


history = model.fit(sent_vectors, 
