Question

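I am currently working on an online advertising optimization project. Assume the only thing I can change is the CPC (cost per click). I do not have much data, because it is updated only once a day. I want to predict net_income from CPC, and I would like the program to suggest the best CPC value based on the daily updated data, so that tomorrow's net_income is maximized.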

    cpc   margin
0   440 -95224.0
1   840 -81620.0
2   530 -57496.0
3   590 -47287.0
4   560 -45681.0
5   590 -52766.0
6   500 -60852.0
7   650 -59653.0
8   480 -48905.0
9   620 -56496.0
10  680 -53614.0
11  590 -44440.0
12  460 -34066.0
13  720 -31086.0
14  590 -23177.0
15  680 -12803.0
16  760 -10625.0
17  590 -20548.0
18  800 -15136.0
19  650 -12804.0
20  420 -63435.0
21  400  -7566.0
22  400  21136.0
23  400 -58585.0
24  400 -14166.0
25  420 -23065.0
26  400 -28533.0
27  380 -14454.0
28  400 -50819.0
29  380 -26356.0
30  400 -26322.0
31  380 -19107.0
32  400 -28270.0
33  380 -88439.0
34  360 -32207.0
35  340 -27632.0
36  340 -18050.0
37  340 -71574.0
38  340 -18050.0
39  320 -20735.0
40  300 -17984.0
41  290  -9426.0
42  280 -16555.0
43  290   2961.0

For example, assume the data above is df.

I tried to use sklearn and LogisticRegression to get a prediction:

import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(df['cpc'], df['margin'])
prediction = model.predict([[300]])
print(prediction[0])

By the way, margin is the net income.

So I thought I could get a prediction for a CPC of 300 from the data, but it returned an error message:

ValueError: Expected 2D array, got 1D array instead:
array=[440 840 530 590 560 590 500 650 480 620 680 590 460 720 590 680 760 590
 800 650 420 400 400 400 400 420 400 380 400 380 400 380 400 380 360 340
 340 340 340 320 300 290 280 290].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

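I have been looking for examples that use a linear regression or logistic regression model, but they all use a 2-dimensional array as input, which does not fit my needs. I have only one factor that I can change, and the result is net_income (or margin).

How can I use sklearn in my project? Or is there another, better way to solve the problem?

I am new to programming and know nothing about math or statistics, which makes it hard for me to understand things or to find the right keywords to learn. Please guide me.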

--------------------------------- Update ---------------------------------

OK, here is another df:

    cpc    margin
0   440  -35224.0
1   340  -11574.0
2   380  -68439.0
3   420  -23435.0
4   840  -81620.0
5   400  -38585.0
6   530  -37496.0
7   590   -7287.0
8   560   -5681.0
9   590  -32766.0
10  500  -60852.0
11  400  -30819.0
12  650  -59653.0
13  480  -28905.0
14  620  -56496.0
15  680  -53614.0
16  590  -44440.0
17  460  -14066.0
18  420   16935.0
19  360  -12207.0
20  400   -8533.0
21  400   -6322.0
22  400   25834.0
23  720  -31086.0
24  400  121136.0
25  400  -28270.0
26  340    1950.0
27  340    1950.0
28  300    2016.0
29  340  -27632.0
30  400   32434.0
31  380  -26356.0
32  590  -23177.0
33  680    7197.0
34  320  -20735.0
35  760    9375.0
36  590  -20548.0
37  290   10574.0
38  380  -19107.0
39  290   42961.0
40  280  -16555.0
41  800  -15136.0
42  380  -14454.0
43  650  -12804.0

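Thanks for the answer, it let me get further, as shown below. Now that I can run the code without errors, I thought I could find the best cpc value by looping over candidate inputs.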

import pandas as pd
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
df = pd.DataFrame(final_db)
model = LogisticRegression()
x = df[['cpc']]
model.fit(x, df['margin'])
previous_prediction = -99999999999999
df_prediction = []
for i in list(range(10, 1000, 10)):
    prediction = model.predict([[i]])
    df_prediction.append({'cpc':i, 'margin' : prediction})
    if prediction > previous_prediction:
        previous_prediction = prediction
        previous_i = i

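The result of this loop is not very satisfying. Is there a better model I could use with my data? Are there any other suggestions for achieving my goal?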

Answer 1

I think it is complaining about this line:
model.fit(df['cpc'], df['margin'])

where the first argument is expected to be a 2D array. You can fix this by indexing the DataFrame with a list of columns:
model.fit(df[['cpc']], df['margin'])
which gives you a DataFrame instead of a Series.
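
To make the difference between the two indexing forms concrete, here is a minimal sketch. It assumes a small df built from the first five rows of the question's first table, and it uses LinearRegression because margin is a continuous quantity; both choices are for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows of the question's first table (illustration only).
df = pd.DataFrame({
    "cpc": [440, 840, 530, 590, 560],
    "margin": [-95224.0, -81620.0, -57496.0, -47287.0, -45681.0],
})

print(df["cpc"].shape)    # (5,)   a Series, 1D: this is what triggers the error
print(df[["cpc"]].shape)  # (5, 1) a DataFrame, 2D: this is what fit() expects

model = LinearRegression()
model.fit(df[["cpc"]], df["margin"])

# Predict with a one-row DataFrame that has the same column name as the
# training data, so the input is 2D here as well.
new_cpc = pd.DataFrame({"cpc": [300]})
prediction = model.predict(new_cpc)
print(prediction[0])
```

The same 2D shape can also be obtained from a NumPy array with array.reshape(-1, 1), as the error message suggests.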

Update: predicting with the updated data
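
One possible way to search for a CPC suggestion is sketched below. Several things here are my assumptions rather than part of the original thread: the df is only a subset of rows from the updated table, the model is a quadratic fit (PolynomialFeatures plus LinearRegression) chosen so the curve can have an interior maximum, and the candidate range is limited to the CPC values actually observed, since predictions outside that range are extrapolation. Note that LogisticRegression is a classifier, so it treats each margin value as a class label rather than as a quantity, which is likely part of why the loop in the question gives unsatisfying results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A subset of rows from the updated table in the question (illustration only).
df = pd.DataFrame({
    "cpc": [440, 340, 380, 420, 840, 400, 530, 590, 560, 500, 290, 280, 800, 650],
    "margin": [-35224.0, -11574.0, -68439.0, -23435.0, -81620.0, -38585.0,
               -37496.0, -7287.0, -5681.0, -60852.0, 10574.0, -16555.0,
               -15136.0, -12804.0],
})

# Assumption: a degree-2 polynomial regression, not the asker's model.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(df[["cpc"]], df["margin"])

# Candidate CPC values in steps of 10, restricted to the observed range.
candidates = pd.DataFrame(
    {"cpc": np.arange(df["cpc"].min(), df["cpc"].max() + 1, 10)}
)
predictions = model.predict(candidates)

# Pick the candidate with the highest predicted margin.
best_idx = int(np.argmax(predictions))
best_cpc = int(candidates["cpc"].iloc[best_idx])
print(best_cpc, predictions[best_idx])
```

With this little data and this much noise, the suggested value should be read as a rough hint at best. If the maximum lands on the edge of the candidate range, the fit has not found an optimum, only the direction in which predicted margin improves.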
