Question

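I have a dataset containing the bought and sold prices of different products. However, instead of being stored in the same row, the bought and sold prices are stored in two separate rows, which are identified by the Bought and Sold variables, as shown below.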

Product|Product Type|Price|Bought|Sold
---------------------------------------
Apples |   Green    |  2  |   0  |  1
---------------------------------------
Apples |   Green    |  1  |   1  |  0
---------------------------------------
Apples |   Red      |  3  |   0  |  1
---------------------------------------
Apples |   Red      |  4  |   1  |  0
---------------------------------------

I would like to combine the bought and sold prices into a single row, so that it looks like this:

Product|Product Type|Bought Price|Sold Price
---------------------------------------------
Apples |   Green    |      1     |    2
---------------------------------------------
Apples |   Red      |      4     |    3

Below is the code to create my example dataset. Thanks in advance for your help.

Product <- c("Apples", "Apples", "Apples", "Apples", "Apples", "Apples",
             "Oranges", "Oranges", "Oranges", "Oranges", "Oranges", "Oranges",
             "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits")
ProductType <- c("Green", "Green", "Red", "Red", "Pink", "Pink",
                 "Big", "Big", "Medium", "Medium", "Small", "Small",
                 "Chocolate", "Chocolate", "Oat", "Oat", "Digestive", "Digestive")


Price <- c(2, 1, 3, 4, 1, 2,
           5, 3, 2, 1, 2, 3,
           6, 4, 1, 8, 6, 2)

Bought <- c(0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1)

Sold <- c(1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0)

sales <- data.frame(Product, ProductType, Price, Bought, Sold)

Answer 1

Using dplyr:

library(dplyr)

sales %>% 
  group_by(Product, ProductType) %>% 
  summarise(BoughtPrice = Price[ Bought == 1 ],
            SoldPrice = Price[ Sold == 1 ]) %>% 
  ungroup()

Answer 2

library(dplyr)
df <- data.frame(Product, ProductType, Price, Bought, Sold)
df %>% group_by(Product, ProductType) %>% 
  summarise(Bought_Price = sum(Price * Bought), 
            Sold_Price = sum(Sold * Price))

# A tibble: 9 x 4
# Groups:   Product [?]
#    Product ProductType Bought_Price Sold_Price
#     <fctr>      <fctr>        <dbl>      <dbl>
# 1   Apples       Green            1          2
# 2   Apples        Pink            2          1
# 3   Apples         Red            4          3
# 4 Buscuits   Chocolate            4          6
# 5 Buscuits   Digestive            2          6
# 6 Buscuits         Oat            8          1
# 7  Oranges         Big            3          5
# 8  Oranges      Medium            1          2
# 9  Oranges       Small            3          2
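
The multiplication works because Bought and Sold are 0/1 indicators, so each sum picks up exactly one price per group. That only holds if every Product/ProductType pair has exactly one bought row and one sold row. A minimal sketch of how that assumption could be checked in base R, using a small slice of the question's data (the name `counts` is just illustrative):

```r
# Rebuild a small slice of the question's data (Apples only)
sales <- data.frame(
  Product     = rep("Apples", 4),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# Count the bought and sold rows for each Product/ProductType pair
counts <- aggregate(cbind(Bought, Sold) ~ Product + ProductType,
                    data = sales, FUN = sum)

# TRUE only if every pair has exactly one bought and one sold row
all(counts$Bought == 1 & counts$Sold == 1)
```

If this returns FALSE, the sums would add several prices together and the result would no longer be a single bought or sold price.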

Answer 3

Using dplyr, we group by 'Product' and 'ProductType' and use summarise to create 'BoughtPrice' and 'SoldPrice' by subsetting 'Price' where 'Bought' or 'Sold' is 1.

library(dplyr)
sales %>% 
     group_by(Product, ProductType) %>% 
     summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])

A similar approach with data.table is:

library(data.table)
setDT(sales)[, lapply(.SD, function(x) Price[x==1]),
                   .(Product, ProductType), .SDcols = Bought:Sold]
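
For comparison, the same reshaping can probably be done in base R without any packages. This is only a sketch, assuming each Product/ProductType pair has exactly one bought row and one sold row. It rebuilds a small slice of the question's data as a plain data.frame (so it does not depend on the setDT() call above), and the names `bought`, `sold` and `result` are just illustrative:

```r
# Small slice of the question's data, as a plain data.frame
sales <- data.frame(
  Product     = rep("Apples", 4),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# Split into bought rows and sold rows, keeping the identifiers and the price
bought <- sales[sales$Bought == 1, c("Product", "ProductType", "Price")]
sold   <- sales[sales$Sold == 1, c("Product", "ProductType", "Price")]

# Rename the price column so the two sides stay distinct after merging
names(bought)[3] <- "BoughtPrice"
names(sold)[3]   <- "SoldPrice"

# Join the two halves on the two identifier columns
result <- merge(bought, sold, by = c("Product", "ProductType"))
result
```

If a pair had more than one bought or sold row, merge() would return every combination of them, so it is worth checking the data first.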

Merging two rows into one based on two identifiers
