Question

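I want to convert data from long format to wide format. Overall, I only need one row per ColA. ColB through ColE can contain duplicates, which I am trying to aggregate by count. ColF is aggregated with sum().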

library(readr)  # provides read_csv()

s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)
head(grp_by)

I am not sure how to convert the remaining columns.

Update: following the suggestion to use the reshape2 package

library(readr)     # provides read_csv()
library(reshape2)  # provides dcast()

s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)

s2 <- dcast(s, ColA ~ ColB)
s3 <- dcast(s, ColA ~ ColC)
s4 <- dcast(s, ColA ~ ColD)
s5 <- dcast(s, ColA ~ ColE)

print(s2)
print(s3)
print(s4)
print(s5)
print(grp_by)

Here is the output of those print statements.

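How can I combine all of these into a single data frame? My actual dataset has 1 million records. Is this code good enough to run on it, or is there a better way to write it? Thanks for your help.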

Answer 1

Here is the sample code I used to transform and merge the data. There may be better approaches, but this is the best I could come up with.

# Include needed libraries
library(readr)     # provides read_csv()
library(reshape2)  # provides dcast()

# Load the sample data
s <- read_csv("sample.csv")

# Aggregate ColF by SUM for each ColA
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)

# Long to Wide format
s2 <- dcast(s, ColA ~ ColB)
s3 <- dcast(s, ColA ~ ColC)
s4 <- dcast(s, ColA ~ ColD)
s5 <- dcast(s, ColA ~ ColE)

# Crude way of removing the NA columns: the positions used below (7, 5, 5, 4)
# only fit this sample data.
# Rename each NA column to a placeholder so it can be dropped by assigning NULL.
colnames(s2)[7] <- "RemoveMe"
colnames(s3)[5] <- "RemoveMe"
colnames(s4)[5] <- "RemoveMe"
colnames(s5)[4] <- "RemoveMe"

s2$RemoveMe <- NULL
s3$RemoveMe <- NULL
s4$RemoveMe <- NULL
s5$RemoveMe <- NULL

# Merge all pieces to form the final transformed data
s2 <- merge(x = s2, y = s3, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = s4, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = s5, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = grp_by, by="ColA", all = TRUE)

# Remove the row whose key (ColA, the user id) is NA; it is row 4 for this sample data
s2 <- s2[-c(4), ]

# Final transformed data
print(s2)
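
The hard-coded column positions and row index above only fit the sample file. Below is a minimal sketch of the same idea that skips missing values by name instead of by position. It assumes a small made-up data frame in place of sample.csv; the values, the helper `count_wide`, and the `ColB_x`-style column prefixes are illustrative and not part of the original post.

```r
library(reshape2)

# Made-up stand-in for sample.csv (the real file is not shown in the post).
s <- data.frame(
  ColA = c("a", "a", "b", "b", NA),
  ColB = c("x", "x", "y", NA, "x"),
  ColC = c("p", "q", "p", "p", NA),
  ColF = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Drop rows with a missing key first, so no NA row reaches the merge.
s <- s[!is.na(s$ColA), ]

# Hypothetical helper: count each value of `col` per ColA.
# Rows where `col` is missing are skipped, so no "NA" column is created.
count_wide <- function(df, col) {
  df <- df[!is.na(df[[col]]), ]
  out <- dcast(df, as.formula(paste("ColA ~", col)),
               value.var = "ColF", fun.aggregate = length)
  # Prefix the new columns so values shared between columns cannot collide.
  names(out)[-1] <- paste(col, names(out)[-1], sep = "_")
  out
}

pieces <- lapply(c("ColB", "ColC"), function(col) count_wide(s, col))
sums <- aggregate(ColF ~ ColA, data = s, FUN = sum)

# Outer-join every piece on ColA in one pass.
wide <- Reduce(function(x, y) merge(x, y, by = "ColA", all = TRUE),
               c(pieces, list(sums)))

# A key with no non-missing values in some column is absent from that piece,
# so the outer join leaves NA there; treat those as a count of zero.
wide[is.na(wide)] <- 0
print(wide)
```

For the real data, the column vector passed to `lapply` would presumably be `c("ColB", "ColC", "ColD", "ColE")`. Whether this scales comfortably to 1 million rows has not been measured here.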

I used these as references:

dcast - How to reshape data from long to wide format?
merge - How to join (merge) data frames (inner, outer, left, right)?
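
As a quick illustration of why the merges above pass all = TRUE, here is a small base R sketch comparing an inner join with a full outer join. The two data frames and their column names (`n_x`, `total`) are made up for the example.

```r
# Two made-up pieces that share only the key "b".
left  <- data.frame(ColA = c("a", "b"), n_x = c(2, 0))
right <- data.frame(ColA = c("b", "c"), total = c(7, 9))

# Default merge is an inner join: only keys present in both pieces survive.
inner <- merge(left, right, by = "ColA")

# all = TRUE is a full outer join: every key is kept, with NA where a piece
# has no matching row.
outer <- merge(left, right, by = "ColA", all = TRUE)

print(inner)
print(outer)
```

With the default inner join, any ColA value missing from one of the wide pieces would silently drop out of the final result.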

Converting from long format to wide format
