R data transformation

Asked: 2016-10-29 21:28:36

Tags: r transformation transactional

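I have a data frame with three columns that captures transaction data: CustomerName, OrderDate, and the name of the product purchased. I need to transform it into another data frame in which all the items a customer purchased on a single date appear in one row.

I am working with a large data set, so is there an efficient way to do this transformation, preferably without a for loop?

In addition, the number of product columns in the resulting data frame must equal the maximum number of products purchased by any customer on any single day. Sample data frames before and after the transformation are shown below.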

Original data:

data <- data.frame(Customer  = c("John", "John", "John", "Tom", "Tom", "Tom", "Sally", "Sally", "Sally", "Sally"),
                   OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct","2-Oct", "2-Oct", "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
                   Product   = c("Milk", "Eggs", "Bread", "Chicken", "Pizza", "Beer", "Salad", "Apples", "Eggs", "Wine"),
                   stringsAsFactors = FALSE)

#    Customer OrderDate Product
# 1      John     1-Oct    Milk
# 2      John     2-Oct    Eggs
# 3      John     2-Oct   Bread
# 4       Tom     2-Oct Chicken
# 5       Tom     2-Oct   Pizza
# 6       Tom     2-Oct    Beer
# 7     Sally     3-Oct   Salad
# 8     Sally     3-Oct  Apples
# 9     Sally     3-Oct    Eggs
# 10    Sally     3-Oct    Wine

After transformation:

datatransform <- as.data.frame(matrix(NA, nrow = 4, ncol = 6))
colnames(datatransform) <- c("Customer", "OrderDate", "Product1", "Product2", "Product3", "Product4")
datatransform$Customer <- c("John", "John", "Tom", "Sally")
datatransform$OrderDate <- c("1-Oct", "2-Oct", "2-Oct", "3-Oct")
datatransform[1, 3:6] <- c("Milk", "", "", "") 
datatransform[2, 3:6 ] <- c("Eggs", "Bread", "", "")
datatransform[3, 3:6 ] <- c("Chicken", "Pizza", "Beer", "")
datatransform[4, 3:6 ] <- c("Salad", "Apples", "Eggs", "Wine")

#   Customer OrderDate Product1 Product2 Product3 Product4
# 1     John     1-Oct     Milk                           
# 2     John     2-Oct     Eggs    Bread                  
# 3      Tom     2-Oct  Chicken    Pizza     Beer         
# 4    Sally     3-Oct    Salad   Apples     Eggs     Wine
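
The column-count requirement can be checked directly against the sample data. The following is a minimal base R sketch, not part of the original question; the names `counts` and `max_products` are introduced here purely for illustration.

```r
data <- data.frame(Customer  = c("John", "John", "John", "Tom", "Tom", "Tom", "Sally", "Sally", "Sally", "Sally"),
                   OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
                   Product   = c("Milk", "Eggs", "Bread", "Chicken", "Pizza", "Beer", "Salad", "Apples", "Eggs", "Wine"),
                   stringsAsFactors = FALSE)

# Cross-tabulate purchases: one cell per Customer/OrderDate pair
# (illustrative name, not from the original post)
counts <- table(data$Customer, data$OrderDate)

# The largest cell is the number of Product columns the wide table needs
max_products <- max(counts)
```

For the sample data the largest basket is Sally's on 3-Oct, which is why the expected output has columns Product1 through Product4.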

1 answer:

Answer 0: (score: 0)

Since you mention large data sets (so efficiency is an important consideration), here is a dplyr and reshape2 solution:

library(reshape2)
library(dplyr)

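data %>%
  group_by(Customer, OrderDate) %>%
  # Number each product within its customer/date group: Product1, Product2, ...
  mutate(ProductValue = paste0("Product", seq_len(n()))) %>%
  ungroup() %>%
  # Spread to wide format: one row per customer/date, one column per product slot
  dcast(Customer + OrderDate ~ ProductValue, value.var = "Product") %>%
  arrange(OrderDate)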

#   Customer OrderDate Product1 Product2 Product3 Product4
# 1     John     1-Oct     Milk     <NA>     <NA>     <NA>
# 2     John     2-Oct     Eggs    Bread     <NA>     <NA>
# 3      Tom     2-Oct  Chicken    Pizza     Beer     <NA>
# 4    Sally     3-Oct    Salad   Apples     Eggs     Wine
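
The result above has NA in the unused product slots, whereas the expected output in the question shows empty strings. Below is a sketch of the same reshaping idea using only base R (`ave` and `reshape`) instead of dplyr and reshape2, with the NAs replaced afterwards. The helper column `Slot` and the name `wide` are introduced here for illustration, and this variant has not been benchmarked on large data, so treat it as an alternative to try rather than a faster method.

```r
data <- data.frame(Customer  = c("John", "John", "John", "Tom", "Tom", "Tom", "Sally", "Sally", "Sally", "Sally"),
                   OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
                   Product   = c("Milk", "Eggs", "Bread", "Chicken", "Pizza", "Beer", "Salad", "Apples", "Eggs", "Wine"),
                   stringsAsFactors = FALSE)

# Position of each product within its customer/date group: 1, 2, 3, ...
# (Slot is a hypothetical helper column, not part of the original data)
data$Slot <- ave(seq_len(nrow(data)), data$Customer, data$OrderDate, FUN = seq_along)

# Long to wide: one row per customer/date, one ProductN column per slot
wide <- reshape(data,
                idvar     = c("Customer", "OrderDate"),
                timevar   = "Slot",
                direction = "wide",
                sep       = "")

# Replace the NAs in unused slots with empty strings, as in the question
wide[is.na(wide)] <- ""

# Reset row names left over from the original long data
rownames(wide) <- NULL
```

Because the slot is kept as an integer rather than pasted into a string before reshaping, the product columns follow the order in which the slot numbers first appear in the data, so for data grouped like the sample Product10 would come after Product9 rather than being sorted alphabetically next to Product1.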