计算不同类型的支出 - 熊猫/ Numpy - Python

时间:2016-11-10 07:58:25

标签: python pandas numpy analysis

我有2个数据帧:

df1
+------------+-------------+------+
| Product ID | Cost Method | Rate |
+------------+-------------+------+
|         10 | CPM         | 10   |
|         20 | CPC         | 0.3  |
|         30 | CPCV        | 0.4  |
|         40 | FLF         | 100  |
|         50 | VAD         | 0    |
|         60 | CPM         | 0.1  |
+------------+-------------+------+

df2
+--------+------------+-------------+--------+-----------------+
|  Date  | Product ID | Impressions | Clicks | Completed Views |
+--------+------------+-------------+--------+-----------------+
| 01-Jan |         10 |         300 |      4 |               0 |
| 02-Jan |         20 |          30 |      3 |               0 |
| 03-Jan |         30 |         200 |      4 |              20 |
| 02-Jan |         40 |         300 |      4 |               0 |
| 02-Jan |         40 |         500 |      4 |               0 |
| 03-Jan |         40 |         200 |      3 |               0 |
| 04-Jan |         90 |        3000 |      3 |               0 |
| 05-Jan |         50 |        3000 |      5 |               0 |
+--------+------------+-------------+--------+-----------------+

理想的输出是:

 +--------+------------+-------------+--------+-----------------+--------+
|  Date  | Product ID | Impressions | Clicks | Completed Views | Spend  |
+--------+------------+-------------+--------+-----------------+--------+
| 01-Jan |         10 |         300 |      4 |               0 |  $3    |
| 02-Jan |         20 |          30 |      3 |               0 |  $1    |
| 03-Jan |         30 |         200 |      4 |              20 |  $8    |
| 02-Jan |         40 |         300 |      4 |               0 |  $50   |
| 02-Jan |         40 |         500 |      4 |               0 |  $50   |
| 03-Jan |         40 |         200 |      3 |               0 |  $-    |
| 04-Jan |         90 |        3000 |      3 |               0 |  $-    |
| 05-Jan |         50 |        3000 |      5 |               0 |  $-    |
+--------+------------+-------------+--------+-----------------+--------+

其中:

  1. 产品与其ID匹配如果ID无法匹配,则 产品支出计算为0
  2. 其中FLF计算为 每天该产品的总展示次数总和,以及该总和的总和 超过某个最低限度,例如600次展示,然后是费率 被申请;被应用。如果同一天有两个或更多条目,那么 费率按其出现的次数平均分配 当天
  3. 如果产品是VAD,则花费为0
  4. 其中每次点击费用的计算方式是费率乘以点击次数
  5. 其中CPM计算为费率*(展示次数/ 1000)

1 个答案:

答案 0 :(得分:2)

即使我不是真的,我也会回答你。你是Stack Overflow(SO)的新手,所以让这个教育性的帖子。请放心,这篇文章的基调并不是试图屈服或苛刻。

首先,要问一个正确的问题(请this请),你需要做两件事:

  • 解释您尝试过的内容(提供代码示例!)并解释您的问题所在。您当前格式的问题肯定不符合要求。这里有5或6个完全不同的东西,感觉就像你只是要求别人做你的作业。
  • 提供一个可行的例子。

对于可行的示例,您有点这样做,但您选择的格式真的很烦人,因为无法直接使用pd.read_clipboard()来加载数据。这里的人们志愿服务他们的时间,如果他们不得不花5或10分钟重新创建数据,他们可能就是不会这样做。

以下我将如何做到这一点:

这是第一个数据框,使用df1 = pd.read_clipboard(index_col=0)加载它:

ProductID      CostMethod   Rate

10               CPM   10.0
20               CPC    0.3
30              CPCV    0.4
40               FLF  100.0
50               VAD    0.0
60               CPM    0.1

以下是第二个数据框,使用df2 = pd.read_clipboard(index_col=0)加载它:

ProductID  Date  Impressions  Clicks  CompletedViews
10         01-Jan          300       4               0
20         02-Jan           30       3               0
30         03-Jan          200       4              20
40         02-Jan          300       4               0
40         02-Jan          500       4               0
40         03-Jan          200       3               0
90         04-Jan         3000       3               0
50         05-Jan         3000       5               0

现在,就做功课而言,这是一个建议的解决方案。我相信你会尝试理解这段代码的作用,而不仅仅是重用它。

第1步:合并两个数据框

我在df2上合并,这非常重要。阅读Merging

上的pandas文档中的更多内容
df3 = df2.merge(df1, left_index=True, right_index=True, how='left')
df3

Merged df3

第2步:计算您的支出

我们要编写自定义函数,然后执行dataframe.apply

def calc_spend(row):
    """
    Accepts a row of the dataframe (df3.apply(calc_spend, axis=1)),
    and computes the spend according to these rules:
    * If costMethod is NaN, then zero
    * Where FLF is calculated as the sum of total impressions for that product per day, 
        and if that sums is over a certain minimum limit, 
        e.g. 600 impressions, then the rate is applied. 
        If there are two or more entries for the same day, 
        then the rate is divided equally by the count of times it appears in the same day
    * Where, if a product is VAD, then the spend is 0
    * Where CPC is calculated as the rate times the number of clicks
    * Where CPM is calculated as rate*(impression / 1000)
    """

    if row.CostMethod == 'FLF':
        # Calc the sum of total impressions for that product
        # I'm using boolean indexing to select the rows where both productID and Date
        # are the same as the current row
        filterdateproductid = (df3.Date == row.Date) & (df3.index == row.name)
        total_impressions = df3.ix[filterdateproductid, 'Impressions'].sum()
        if total_impressions < 600:
            spend = total_impressions
        else:
            count = df3.ix[filterdateproductid].shape[0]
            rate = row.Rate / count # If you use python 2.7 make sure you do "from future import division"
            spend = rate * total_impressions / 1000.0

    elif row.CostMethod == 'VAD':
        spend = 0

    elif row.CostMethod == 'CPC':
        spend = row.Rate * row.Clicks

    elif row.CostMethod == 'CPM':
        spend = row.Rate * row.Impressions / 1000.0

    else: # Includes the case where the costMethod is Na
        spend = 0

    return spend

现在我们可以自己应用函数:

df3['Spend'] = df3.apply(calc_spend, axis=1)
df3

Final result

你或许会注意到&#34;花费&#34;我的计算结果与你的完全相同,但这是因为你对如何计算它的初始规格并不是那么好。您可以轻松更改calc_spend功能以符合您的要求。