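I am trying to predict the number of tweets a user will post, based on the historical tweet counts and their moving averages. I am a Python developer but a total noob in ML. Here is the dataset I collected for the user @POTUS: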
Date | Number of tweets
01-03-2017 : 3
02-03-2017 : 2
03-03-2017 : 7
06-03-2017 : 2
07-03-2017 : 6
08-03-2017 : 6
09-03-2017 : 5
10-03-2017 : 5
11-03-2017 : 6
13-03-2017 : 11
14-03-2017 : 5
15-03-2017 : 10
16-03-2017 : 6
17-03-2017 : 7
18-03-2017 : 3
19-03-2017 : 2
20-03-2017 : 6
21-03-2017 : 9
22-03-2017 : 1
23-03-2017 : 3
24-03-2017 : 4
I also computed the 7-day and 3-day moving averages using https://github.com/linsomniac/python-movingaverage/blob/master/movingaverage.py
Moving Average, 3 days :
[4.0, 3.67, 5.0, 4.67, 5.67, 5.33, 5.33, 7.33, 7.33, 8.67, 7.0, 7.67, 5.33, 4.0, 3.67, 5.67, 5.33, 4.33, 2.67, 4.0, 3.67, 4.33, 4.33, 6.0, 6.67, 5.67, 3.67, 2.33]
Moving Average, 7 days :
[4.43, 4.71, 5.29, 5.86, 6.29, 6.86, 6.86, 7.14, 6.86, 6.29, 5.57, 6.14, 4.86, 4.43, 4.0, 4.29, 4.29, 4.29, 3.71, 4.57, 5.29, 5.0, 4.43, 4.71]
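I know this is a regression problem, but I am not sure how to proceed from here. What method should I use to predict how many tweets the user will post over the next few days?
Answer 0 (score: 0)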
The data you posted is formatted as:
[date format] [number format]
To perform a numerical regression, the dates must first be converted to numbers. This is commonly done by counting the days elapsed since a particular reference date. If you format the data as:
[days since 01-03-2017] [number of tweets]
it would be more amenable to numerical analysis. I suggest that after reformatting the data, you make a scatter plot and see if there is a visible trend of some kind that might be mathematically modeled. If you do not see some sort of trend in data this simple, then machine learning will probably not find one either.
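As a rough illustration of the reformatting step, here is a minimal sketch that converts the dates from the question into days since 01-03-2017 and fits a straight line by ordinary least squares as a first check for a trend. The names `RAW`, `parse`, and `fit_line` are made up for this example, and a straight line is only one assumed model, not necessarily the right one for this data.

```python
from datetime import datetime

# Data copied from the question (dates are DD-MM-YYYY).
RAW = """01-03-2017 : 3
02-03-2017 : 2
03-03-2017 : 7
06-03-2017 : 2
07-03-2017 : 6
08-03-2017 : 6
09-03-2017 : 5
10-03-2017 : 5
11-03-2017 : 6
13-03-2017 : 11
14-03-2017 : 5
15-03-2017 : 10
16-03-2017 : 6
17-03-2017 : 7
18-03-2017 : 3
19-03-2017 : 2
20-03-2017 : 6
21-03-2017 : 9
22-03-2017 : 1
23-03-2017 : 3
24-03-2017 : 4"""


def parse(raw, origin="01-03-2017", fmt="%d-%m-%Y"):
    """Return a list of (days_since_origin, tweet_count) pairs."""
    start = datetime.strptime(origin, fmt)
    rows = []
    for line in raw.splitlines():
        date_text, count_text = line.split(":")
        day = (datetime.strptime(date_text.strip(), fmt) - start).days
        rows.append((day, int(count_text)))
    return rows


def fit_line(rows):
    """Ordinary least-squares fit of count = slope * day + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = parse(RAW)
slope, intercept = fit_line(rows)
print(rows[:4])
print(f"slope={slope:.3f} tweets/day, intercept={intercept:.3f}")
```

The `rows` pairs can be passed straight to any plotting library for the scatter plot. Note that some dates are missing from the data (for example 04-03 and 05-03), so the day numbers are not consecutive; whether those days mean zero tweets or simply no data is something only the person who collected it can say.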