Merge two rows into one based on two identifiers

Time: 2017-10-25 08:46:36

Tags: r join merge row

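I have a dataset containing the bought and sold prices of different products. However, instead of being stored in the same row, the bought and sold prices are stored in two separate rows, which are identified by the Bought and Sold variables, as shown below.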

Product|Product Type|Price|Bought|Sold
---------------------------------------
Apples |   Green    |  2  |   0  |  1
---------------------------------------
Apples |   Green    |  1  |   1  |  0
---------------------------------------
Apples |   Red      |  3  |   0  |  1
---------------------------------------
Apples |   Red      |  4  |   1  |  0
---------------------------------------

I would like to combine the bought and sold prices into a single row, so that it looks like this:

Product|Product Type|Bought Price|Sold Price
---------------------------------------------
Apples |   Green    |      1     |    2
---------------------------------------------
Apples |   Red      |      4     |    3

Below is the code that creates my sample dataset. Thanks in advance for your help.

Product <- c("Apples", "Apples", "Apples", "Apples", "Apples", "Apples",
             "Oranges", "Oranges", "Oranges", "Oranges", "Oranges", "Oranges",
             "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits")
ProductType <- c("Green", "Green", "Red", "Red", "Pink", "Pink",
                 "Big", "Big", "Medium", "Medium", "Small", "Small",
                 "Chocolate", "Chocolate", "Oat", "Oat", "Digestive", "Digestive")


Price <- c(2, 1, 3, 4, 1, 2,
           5, 3, 2, 1, 2, 3,
           6, 4, 1, 8, 6, 2)

Bought <- c(0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1)

Sold <- c(1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0)

sales <- data.frame(Product, ProductType, Price, Bought, Sold)

3 Answers:

Answer 0 (score: 4)

Using dplyr:

library(dplyr)

sales %>% 
  group_by(Product, ProductType) %>% 
  summarise(BoughtPrice = Price[ Bought == 1 ],
            SoldPrice = Price[ Sold == 1 ]) %>% 
  ungroup()
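
The subsetting `Price[ Bought == 1 ]` relies on every Product/ProductType group having exactly one bought row and one sold row. A minimal base R sketch for checking that assumption before summarising is given here; the names `sample_sales` and `counts` are illustrative and do not come from the original post, and the data is a small slice of the question's dataset.

```r
# Illustrative slice of the question's data (Apples only).
sample_sales <- data.frame(
  Product     = c("Apples", "Apples", "Apples", "Apples"),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# Count the bought rows and sold rows in each Product/ProductType group.
counts <- aggregate(cbind(Bought, Sold) ~ Product + ProductType,
                    data = sample_sales, FUN = sum)

# TRUE when every group has exactly one bought row and one sold row.
one_of_each <- all(counts$Bought == 1 & counts$Sold == 1)
one_of_each
```

If this check is FALSE for real data, the subsetting would return zero or several prices for some group, so the summarise step would need a different rule (for example a mean or the first value).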

Answer 1 (score: 3)

library(dplyr)
df <- data.frame(Product, ProductType, Price, Bought, Sold)
df %>% group_by(Product, ProductType) %>% 
  summarise(Bought_Price = sum(Price * Bought), 
            Sold_Price = sum(Sold * Price))

# A tibble: 9 x 4
# Groups:   Product [?]
#    Product ProductType Bought_Price Sold_Price
#     <fctr>      <fctr>        <dbl>      <dbl>
# 1   Apples       Green            1          2
# 2   Apples        Pink            2          1
# 3   Apples         Red            4          3
# 4 Buscuits   Chocolate            4          6
# 5 Buscuits   Digestive            2          6
# 6 Buscuits         Oat            8          1
# 7  Oranges         Big            3          5
# 8  Oranges      Medium            1          2
# 9  Oranges       Small            3          2
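
The `sum(Price * Bought)` expression works because Bought and Sold are 0/1 indicators: multiplying by the indicator zeroes out the rows that do not apply, so the sum keeps only the matching price. A small sketch of the arithmetic for one group (Apples / Green from the question) follows; the variable names are illustrative, and the last line uses a hypothetical group with two bought rows to show that the sum would then add both prices rather than pick one.

```r
# One group from the question's data: Apples / Green.
price  <- c(2, 1)
bought <- c(0, 1)
sold   <- c(1, 0)

# Multiplying by the 0/1 indicator keeps only the matching row's price.
bought_price <- sum(price * bought)  # 2*0 + 1*1
sold_price   <- sum(price * sold)    # 2*1 + 1*0

# Hypothetical group with two bought rows: the prices 1 and 5 are added.
two_bought_total <- sum(c(2, 1, 5) * c(0, 1, 1))
```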

Answer 2 (score: 2)

Using dplyr, we group by 'Product' and 'ProductType', then use summarise to create 'BoughtPrice' and 'SoldPrice' by subsetting 'Price' where 'Bought' or 'Sold' is 1.

library(dplyr)
sales %>% 
     group_by(Product, ProductType) %>% 
     summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])

A similar approach with data.table is

library(data.table)
setDT(sales)[, lapply(.SD, function(x) Price[x==1]),
                   .(Product, ProductType), .SDcols = Bought:Sold]
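
Since the question is tagged join and merge, the same reshaping can probably also be done without extra packages. The following is a rough base R sketch and not part of the original answers: it splits the rows into bought and sold subsets and joins them on the two identifiers with `merge`. The names `sales_small`, `keys`, `bought`, `sold` and `wide` are illustrative, and the data is a small slice of the question's dataset.

```r
# Illustrative slice of the question's data (Apples only).
sales_small <- data.frame(
  Product     = c("Apples", "Apples", "Apples", "Apples"),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# The two identifier columns used as the join key.
keys <- c("Product", "ProductType")

# Split into bought rows and sold rows, keeping only keys and Price.
bought <- sales_small[sales_small$Bought == 1, c(keys, "Price")]
sold   <- sales_small[sales_small$Sold == 1, c(keys, "Price")]

# Rename Price so the two price columns stay distinct after the join.
names(bought)[names(bought) == "Price"] <- "BoughtPrice"
names(sold)[names(sold) == "Price"]     <- "SoldPrice"

# Inner join on both identifiers: one row per Product/ProductType.
wide <- merge(bought, sold, by = keys)
wide
```

Because `merge` performs an inner join by default, a group that lacks either a bought or a sold row would be dropped; passing `all = TRUE` would keep such groups with NA in the missing price.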