I have data in CSV format.
The data looks like this, with the receipt numbers in one column and the products in the corresponding column:
Receipt_no Product
A1 Apple
A1 Banana
A1 Orange
A2 Pineapple
A2 Jackfruit
A3 Cola
A3 Tea
I would like to rearrange it as:
A1 , Apple, Banana, Orange
A2 , Pineapple, Jackfruit
A3 , Cola, Tea
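That is, one row per receipt, with the receipt number and its product names separated by commas. Since the data is large, I would like to do this rearrangement in R.
Please help.
Thanks.
Regards, Nithish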
Answer 0 (score: 0)
In base R:
aggregate(Product ~ Receipt_no, df, paste, collapse = ',')
The same grouping can also be done with dplyr.
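The dplyr code itself is not shown in this answer. The following is only a minimal sketch of one common way to do it, assuming the dplyr package is installed and the columns are named Receipt_no and Product as in the question. The sample data frame is rebuilt here so the block runs on its own, and the name `result` is illustrative.

```r
library(dplyr)

# Sample data mirroring the question (assumption: plain character columns)
df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)

# One row per receipt; the products of each receipt are pasted into one string
result <- df %>%
  group_by(Receipt_no) %>%
  summarise(Product = paste(Product, collapse = ", "))

print(result)
```

This keeps the receipt number and the combined products in two separate columns, like the aggregate() call does.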
Answer 1 (score: 0)
Using base R:
u <- as.vector(unique(df$Receipt_no))
as.list(sapply(u, function(x) paste0(x, ", ", paste0(subset(df$Product, df$Receipt_no==x), collapse = ", "))))
# $A1
# [1] "A1, Apple, Banana, Orange"
# $A2
# [1] "A2, Pineapple, Jackfruit"
# $A3
# [1] "A3, Cola, Tea"
Data
df <- structure(list(Receipt_no = structure(c(1L, 1L, 1L, 2L, 2L, 3L,
3L), .Label = c("A1", "A2", "A3"), class = "factor"), Product = structure(c(1L,
2L, 5L, 6L, 4L, 3L, 7L), .Label = c("Apple", "Banana", "Cola",
"Jackfruit", "Orange", "Pineapple", "Tea"), class = "factor")), .Names = c("Receipt_no",
"Product"), class = "data.frame", row.names = c(NA, -7L))
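Since the question asks for comma-separated lines, the combined strings can also be written straight to a file. This is a minimal base R sketch, not part of either answer. It assumes the same two columns as df above; the sample data frame is rebuilt so the block runs on its own, and `out_file` is a hypothetical temporary path.

```r
# Sample data mirroring the question (assumption: plain character columns)
df <- data.frame(
  Receipt_no = c("A1", "A1", "A1", "A2", "A2", "A3", "A3"),
  Product = c("Apple", "Banana", "Orange", "Pineapple", "Jackfruit", "Cola", "Tea"),
  stringsAsFactors = FALSE
)

# split() groups the products by receipt number; groups come out in sorted
# order of the receipt numbers, and row order is kept within each group
groups <- split(as.character(df$Product), df$Receipt_no)

# Collapse each group into one string, then prepend the receipt number
products <- vapply(groups, paste, character(1), collapse = ", ")
lines <- paste(names(groups), products, sep = ", ")

# Write one line per receipt (hypothetical output location)
out_file <- tempfile(fileext = ".csv")
writeLines(lines, out_file)
```

Because the rows have different numbers of fields, writeLines() is used here rather than write.csv(), which expects a rectangular table.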