Row count based on another column in R

Time: 2018-05-13 10:59:42

Tags: r dataframe dplyr

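OK, I have the following data frame with thousands of rows; its output is shown below. The data frame records orders on an e-commerce site and lists the products purchased in each order ID.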

     | order_id| product_id|product_name                     |
     |--------:|----------:|:--------------------------------|
     |  1187899|        196|Soda                             |
     |  1187899|      25133|Organic String Cheese            |
     |  1187899|      38928|0% Greek Strained Yogurt         |
     |  1187899|      26405|XL Pick-A-Size Paper Towel Rolls |
     |  1187899|      39657|Milk Chocolate Almonds           |
     |  1187899|      10258|Pistachios                       |
     |  1187899|      13032|Cinnamon Toast Crunch            |
     |  1187899|      26088|Aged White Cheddar Popcorn       |
     |  1187899|      27845|Organic Whole Milk               |
     |  1187899|      49235|Organic Half & Half              |
     |  1187899|      46149|Zero Calorie Cola                |
     |  1492625|      22963|Organic Roasted Turkey Breast    |
     |  1492625|       7963|Gluten Free Whole Grain Bread    |
     |  1492625|      16589|Plantain Chips                   |
     |  1492625|      32792|Chipotle Beef & Pork Realstick   |

The code used to produce the data frame above is:

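library(dplyr)  # %>%, inner_join(), select()
library(knitr)  # kable()

# Join orders to order-product pairs (opt), then to product names
temp <- orders %>%
  inner_join(opt, by = "order_id") %>%
  inner_join(products, by = "product_id") %>%
  select(order_id, product_id, product_name)
kable(head(temp, 15))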

I want to find the most frequently ordered products. Basically, my output should look like this:

     product_id | Order_Count
        196         10025
        7963        9025
        25133       8903

I can't figure out how to solve this. I have tried:

      mutate(prods = count(product_id))

but it didn't work; I got an error saying: Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "factor".
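The error is consistent with how count() is designed: dplyr's count() takes a data frame as its first argument, so calling it on a single column inside mutate() hands it a factor, which it cannot group. If a per-row count column is what the mutate() call was meant to produce, a minimal sketch is to group first and use n(). The toy tibble below is hypothetical, not the question's real data:

```r
library(dplyr)

# Hypothetical toy data: product_id stored as a factor, as in the question
toy <- tibble(
  order_id   = c(1, 1, 2, 2, 3),
  product_id = factor(c("196", "25133", "196", "7963", "196"))
)

# n() counts the rows in the current group, so after group_by() each row
# receives the number of rows that share its product_id
toy_counted <- toy %>%
  group_by(product_id) %>%
  mutate(prods = n()) %>%
  ungroup()

toy_counted
```

dplyr also provides add_count() as a shorthand for this group_by() + mutate(n()) pattern.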

Any help would be greatly appreciated!

1 Answer:

Answer 0 (score: 0)

You can use table() to print a simple table (as Rui Barradas mentioned), or, if you want a data frame with the counts, use dplyr::count():
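For intuition, count() behaves roughly like group_by() followed by summarise(n = n()). A minimal sketch comparing the two spellings on a hypothetical three-row tibble (made-up data, not the orders table):

```r
library(dplyr)

# Hypothetical toy data, not the question's real orders table
toy <- tibble(product_id = c("196", "7963", "196"))

# count() groups by product_id and tallies rows into a column named n
via_count <- toy %>% count(product_id)

# The same tally spelled out with group_by() + summarise()
via_summarise <- toy %>%
  group_by(product_id) %>%
  summarise(n = n())

via_count
via_summarise
```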

library(tidyverse)

orders <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1187899", "196", "Soda",
  "1187899", "25133", "Organic String Cheese",
  "1187899", "38928", "0% Greek Strained Yogurt",
  "1187899", "26405", "XL Pick-A-Size Paper Towel Rolls",
  "1187899", "39657", "Milk Chocolate Almonds",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "26088", "Aged White Cheddar Popcorn",
  "1187899", "27845", "Organic Whole Milk",
  "1187899", "49235", "Organic Half & Half",
  "1187899", "46149", "Zero Calorie Cola",
  "1492625", "22963", "Organic Roasted Turkey Breast",
  "1492625", "7963", "Gluten Free Whole Grain Bread",
  "1492625", "16589", "Plantain Chips",
  "1492625", "32792", "Chipotle Beef & Pork Realstick"
)

A simple printed table with (for example) the count for each product_id:

table(orders$product_id)
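Since the goal is the most-ordered products, the base R table can also be sorted in decreasing order and, if needed, turned into a data frame with as.data.frame(). A minimal sketch, using a small hypothetical vector of product IDs in place of orders$product_id:

```r
# Hypothetical product IDs standing in for orders$product_id
product_id <- c("10258", "196", "10258", "13032", "10258", "13032")

# table() counts occurrences; sort(decreasing = TRUE) puts the most frequent first
counts <- sort(table(product_id), decreasing = TRUE)
counts

# Convert to a two-column data frame; responseName names the count column
counts_df <- as.data.frame(counts, responseName = "Order_Count",
                           stringsAsFactors = FALSE)
counts_df
```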

But if you want a data frame with the counts, to plot or use for anything else, then:

orders %>%
  count(product_id, product_name)

> # A tibble: 15 x 3
>    product_id product_name                         n
>    <chr>      <chr>                            <int>
>  1 10258      Pistachios                           3
>  2 13032      Cinnamon Toast Crunch                2
>  3 16589      Plantain Chips                       1
>  4 196        Soda                                 1
>  5 22963      Organic Roasted Turkey Breast        1
>  6 25133      Organic String Cheese                1
>  7 26088      Aged White Cheddar Popcorn           1
>  8 26405      XL Pick-A-Size Paper Towel Rolls     1
>  9 27845      Organic Whole Milk                   1
> 10 32792      Chipotle Beef & Pork Realstick       1
> 11 38928      0% Greek Strained Yogurt             1
> 12 39657      Milk Chocolate Almonds               1
> 13 46149      Zero Calorie Cola                    1
> 14 49235      Organic Half & Half                  1
> 15 7963       Gluten Free Whole Grain Bread        1
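Two details separate this output from the one requested in the question: the rows are ordered by product_id rather than by count, and because product_id is stored as text, "7963" sorts after "49235". A minimal sketch that converts the IDs to integers, names the count column Order_Count and sorts by it, assuming a reasonably recent dplyr (very old versions of count() lack the name argument) and using hypothetical toy data:

```r
library(dplyr)

# Hypothetical orders: product_id stored as character, as in the answer's tribble
orders_toy <- tibble(
  order_id   = c("1187899", "1187899", "1187899", "1492625", "1492625", "1492625"),
  product_id = c("10258", "7963", "10258", "7963", "10258", "196")
)

top_products <- orders_toy %>%
  # Convert IDs to integers so they sort numerically rather than as text
  mutate(product_id = as.integer(product_id)) %>%
  # name = sets the count column's name; sort = TRUE puts the largest counts first
  count(product_id, name = "Order_Count", sort = TRUE)

top_products
```

Using as.integer() assumes every product ID is a whole number; if that is not guaranteed, keeping the IDs as text and relying on sort = TRUE still orders the rows by count.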