I have three tables read into R as data frames.
Table 1:
Student ID   School_Name   Campus Area
4356791      BCCS          Northwest Springdale
03127        BZS           South Vernon
12437        BCCS          South Vernon
Table 2:
ProctorID   Date       Score   Student ID   Form#
0211        10/05/16   75.57   55612        25432178
0211        10/17/16   83.04   55612        47135671
5134        10/17/16   63.28   02613        2371245
Table 3:
ProctorID   First    Last        Campus Area
0211        Simone   Lewis       Northwest Springdale
5134        Mona     Yashamito   Northwest Springdale
0712        Steven   Lewis       South Vernon
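I want to combine the data frames and create a table in which the scores for each area sit side by side, grouped by school name. I would like output similar to the following: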
School_Name   Form#     Northwest Springdale   South Vernon
BCCS          2543127   83.04                  63.25
BCCS          35674     75.14                  *
BZS           5321567   65.2                   62.3
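A particular form for a particular school may not have a score for a particular area. Any ideas? I have been experimenting with the sqldf package. Is it also possible to do this in R without using any SQL?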
Answer 0 (score: 0)
To cast, something like this:
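library(reshape2)
# Assumes the column is named Campus.Area, which is what read.csv() produces by default for a "Campus Area" header
casted_df <- dcast(df, ... ~ Campus.Area, value.var = "Score")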
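To make the shape of the result concrete, here is a minimal sketch of the cast on its own. The data frame `long`, the form labels `F1`/`F2`, and the pairing of scores with schools are invented for illustration, and it assumes the three tables have already been joined into one row per score. Naming the row identifiers explicitly (`School_Name + Form`) instead of using `...` is a choice made here, not something the question requires.

```r
library(reshape2)

# Hypothetical long-format data, one row per score (values invented).
long <- data.frame(
  School_Name = c("BCCS", "BCCS", "BZS"),
  Form        = c("F1", "F1", "F2"),
  Campus.Area = c("Northwest Springdale", "South Vernon", "South Vernon"),
  Score       = c(83.04, 63.25, 62.3),
  stringsAsFactors = FALSE
)

# Rows are identified by School_Name + Form;
# each distinct Campus.Area value becomes its own column.
wide <- dcast(long, School_Name + Form ~ Campus.Area, value.var = "Score")
print(wide)
```

The BZS/F2 row has no Northwest Springdale score, so that cell comes out as `NA`, which covers the "no score for this area" case from the question.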
An example that seems to work for me:
# Toy versions of the three tables
df1 <- data.frame(StudentID = 1:3, SchoolName = c("School1", "School2", "School3"), Area = c("Area1", "Area2", "Area3"))
df2 <- data.frame(StudentID = 1:3, Score = 100:102, Proctor = 4:6)
df3 <- data.frame(Proctor = 4:6, Area = c("Area1", "Area2", "Area3"), Name = c("John", "Jane", "Jim"))
# Join scores to students, then to proctors; both inputs have an Area column, so merge() renames them Area.x and Area.y
combined <- merge(df1, df2, by = "StudentID")
combined2 <- merge(combined, df3, by = "Proctor")
library(reshape2)
# One column per student-side area (Area.x); "..." keeps every other column as a row identifier
final <- dcast(combined2, ... ~ Area.x, value.var = "Score")
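For the layout in the question (one row per school and form, one column per area), a rough end-to-end sketch might look like the following. Everything here is an assumption made for illustration: the data values, the student IDs `s1` to `s3`, the form labels, the dotted column names (`Student.ID`, `Campus.Area`), and the choice to take the area from the student table rather than the proctor table. IDs are stored as character so that leading zeros such as `0211` are not lost.

```r
library(reshape2)

# Invented stand-ins for the three tables.
students <- data.frame(
  Student.ID  = c("s1", "s2", "s3"),
  School_Name = c("BCCS", "BCCS", "BZS"),
  Campus.Area = c("Northwest Springdale", "South Vernon", "South Vernon"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  ProctorID  = c("0211", "0211", "5134", "5134"),
  Student.ID = c("s1", "s1", "s2", "s3"),
  Form       = c("F1", "F1", "F1", "F2"),
  Score      = c(80, 90, 63.25, 62.3),
  stringsAsFactors = FALSE
)
proctors <- data.frame(
  ProctorID = c("0211", "5134"),
  Last      = c("Lewis", "Yashamito"),
  stringsAsFactors = FALSE
)

# Attach school and area to each score, then the proctor details.
joined <- merge(scores, students, by = "Student.ID")
joined <- merge(joined, proctors, by = "ProctorID")

# Student s1 has two F1 scores, so several values can land in one cell;
# fun.aggregate = mean averages them instead of counting them.
wide <- dcast(joined, School_Name + Form ~ Campus.Area,
              value.var = "Score", fun.aggregate = mean)
print(wide)
```

Without `fun.aggregate`, `dcast` falls back to counting the rows in each cell when a school/form/area combination has more than one score, which is rarely what you want here. With `mean`, a combination that has no scores at all shows up as a missing value (`is.na()` is `TRUE` for it).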