Question

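I don't think this particular problem has come up on the forum before, but if it is a duplicate, please point me in the right direction!

I have the following dataset and want to reshape it from long to wide.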

ID   variable                      value
1    number of students            1000
1    percentage on financial aid   28
1    acceptance rate               12
1    percentage on financial aid   35
2    number of students            2000
2    percentage on financial aid   1
2    percentage on financial aid   70

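Note that the variable percentage on financial aid appears twice for each ID. When reshaping from long to wide I want to keep only the second occurrence, because the first occurrence is the school's rank on the "financial aid" measure, while the second is the actual value.

Both rows have exactly the same variable name, percentage on financial aid, so I am wondering whether there is a way to tell R to overwrite the first match with the second. Right now R seems to take the first occurrence.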
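A minimal sketch that reproduces the behaviour described above, assuming the data sit in a data frame called `df` and the variable names use underscores (both are illustrative choices, not part of the original data):

```r
# Illustrative data frame; `df` and the underscore names are assumptions.
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value = c(1000, 28, 12, 35, 2000, 1, 70),
  stringsAsFactors = FALSE
)

# reshape() warns that multiple rows match and keeps the first one per ID
wide <- suppressWarnings(
  reshape(df, idvar = "ID", timevar = "variable", direction = "wide")
)

# The first occurrences (the ranks) end up in the wide table
wide$value.percentage_on_financial_aid
```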

Answer 1

zz = '
ID   variable                   value
1   number_of_students          1000
1   percentage_on_financial_aid  28
1   acceptance_rate              12
1   percentage_on_financial_aid  35
2   number_of_students          2000
2   percentage_on_financial_aid  1
2   percentage_on_financial_aid  70
'

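df <- read.table(text = zz, header = TRUE)

# Reverse the row order so the last occurrence of each variable comes first.
# Note: apply() goes through a character matrix, so every column becomes character.
ndf = apply(df, 2, rev)
ndf = as.data.frame(ndf)
# reshape() keeps the first matching row per ID/variable (and warns about the duplicates)
nd = reshape(ndf, idvar = "ID", timevar = "variable", direction = "wide")
# Strip the "value." prefix that reshape() adds to the new column names
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd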

  ID percentage_on_financial_aid number_of_students acceptance_rate
1  2                          70               2000            <NA>
4  1                          35               1000              12
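The `apply()` step above turns every column into character, which is why the missing value prints as `<NA>`. A possible variant, offered as a sketch rather than the answerer's method, reverses the rows by index so the columns keep their types. The names `rdf` and `wide` are illustrative.

```r
# Same data as above, built directly so the example is self-contained.
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value = c(1000, 28, 12, 35, 2000, 1, 70),
  stringsAsFactors = FALSE
)

# Reverse the rows by index; unlike apply(), this keeps `value` numeric
rdf <- df[rev(seq_len(nrow(df))), ]

# reshape() still warns about the duplicates, but now the first match
# it keeps is the last occurrence in the original order
wide <- suppressWarnings(
  reshape(rdf, idvar = "ID", timevar = "variable", direction = "wide")
)

# Drop the "value." prefix and restore ascending ID order
names(wide) <- sub("^value\\.", "", names(wide))
wide <- wide[order(wide$ID), ]
wide
```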

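Alternatively, use duplicated() with fromLast = TRUE to drop the earlier occurrences before reshaping:

# Keep only the last row for each ID/variable pair
df = df[!duplicated(df[, 1:2], fromLast = TRUE), ]
nd = reshape(df, idvar = "ID", timevar = "variable", direction = "wide")
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd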

Answer 2

Thanks to the people who commented, I figured it out.

Solution:

df <- subset(df,duplicated(df[,1:2])|!duplicated(df[,1:2],fromLast=TRUE))
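With exactly two occurrences per ID, the filter above keeps only the second one. A small sketch, using a made-up data frame `df3` with three occurrences, suggests how it differs from filtering on `!duplicated(..., fromLast = TRUE)` alone. All names here (`df3`, `key`, `keep_answer`, `keep_last`) are hypothetical.

```r
# Hypothetical data: the "aid" variable appears three times for ID 1
df3 <- data.frame(
  ID = c(1, 1, 1, 1),
  variable = c("n", "aid", "aid", "aid"),
  value = c(1000, 5, 28, 35),
  stringsAsFactors = FALSE
)
key <- df3[, c("ID", "variable")]

# Condition from the solution above: drops only the first of several occurrences
keep_answer <- duplicated(key) | !duplicated(key, fromLast = TRUE)

# Simpler condition: keeps only the last occurrence of each ID/variable pair
keep_last <- !duplicated(key, fromLast = TRUE)

df3$value[keep_answer]  # 1000 28 35
df3$value[keep_last]    # 1000 35

# After the last-only filter, reshape() has no duplicates left to warn about
wide <- reshape(df3[keep_last, ], idvar = "ID", timevar = "variable",
                direction = "wide")
wide
```

If a variable can appear more than twice, the simpler `keep_last` condition is probably the safer choice, since it leaves one row per ID/variable pair.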

Long to wide with duplicate row names
