Long to wide with duplicate row names

Time: 2017-06-23 10:40:18

Tags: r reshape reshape2

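I don't think this particular problem has come up on the forum before, but if it is a duplicate, please point me in the right direction!

I have the following dataset and would like to reshape it from long to wide.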

ID   variable                   value
1   number of students          1000
1   percentage on financial aid  28
1   acceptance rate              12
1   percentage on financial aid  35
2   number of students          2000
2   percentage on financial aid  1
2   percentage on financial aid  70

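Note that the variable percentage on financial aid appears twice for each ID. When reshaping from long to wide I want to keep only the second occurrence, because the first occurrence is the school's rank on the financial aid measure, while the second is the actual value.

Both rows have exactly the same variable name, percentage on financial aid, so I'm wondering whether there is a way to tell R to overwrite the first match with the second. Right now R seems to take the first occurrence.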

2 answers:

Answer 0: (score: 1)

zz = '
ID   variable                   value
1   number_of_students          1000
1   percentage_on_financial_aid  28
1   acceptance_rate              12
1   percentage_on_financial_aid  35
2   number_of_students          2000
2   percentage_on_financial_aid  1
2   percentage_on_financial_aid  70
'

df <- read.table(text = zz, header = TRUE)


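# Reverse the row order so the last occurrence of each variable comes first.
# Note: apply() coerces df to a character matrix, so the numeric columns lose their type.
ndf = apply(df, 2, rev)
ndf = as.data.frame(ndf)
# reshape() takes the first matching row per ID/variable, which is now the original last one.
nd = reshape(ndf, idvar = "ID", timevar = "variable", direction = "wide")
# Strip the "value." prefix from the column names.
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd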

  ID percentage_on_financial_aid number_of_students acceptance_rate
1  2                          70               2000            <NA>
4  1                          35               1000              12
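As a possible variation on the reversal step, the rows can be reversed by plain indexing instead of `apply()`, which keeps `value` numeric. This is only a sketch: the data frame is re-created by hand from the question's example, and `rev_df` and `wide` are names made up for illustration.

```r
# Example data re-created from the question (underscored names as in this answer)
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value = c(1000, 28, 12, 35, 2000, 1, 70),
  stringsAsFactors = FALSE
)

# Reverse the rows by index; column types are preserved
rev_df <- df[nrow(df):1, ]

# reshape() takes the first matching row per ID/variable and may warn about
# the duplicates; that is the intended behaviour here, so the warning is muted
wide <- suppressWarnings(
  reshape(rev_df, idvar = "ID", timevar = "variable", direction = "wide")
)

# Drop the "value." prefix that reshape() adds to the new columns
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

With this approach the resulting columns stay numeric, so they can be used in arithmetic without converting back from character or factor.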

If you want the first occurrence instead (that is, not the fromLast = T behaviour), skip the reversal:

nd = reshape(df, idvar = "ID", timevar = "variable", direction = "wide")
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd

Answer 1: (score: 0)

Thanks to the people who commented, I figured this out.

Solution:

df <- subset(df,duplicated(df[,1:2])|!duplicated(df[,1:2],fromLast=TRUE))
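For completeness, here is a rough sketch of how this kind of filter might be combined with the reshape step. It assumes the data from the question (re-created by hand, with underscored variable names) and uses only the `!duplicated(..., fromLast = TRUE)` part of the condition, which by itself keeps the last row of every ID/variable pair. The names `last_only` and `wide` are made up for illustration.

```r
# Example data re-created from the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value = c(1000, 28, 12, 35, 2000, 1, 70),
  stringsAsFactors = FALSE
)

# Keep only the last row of each ID/variable pair
last_only <- df[!duplicated(df[, c("ID", "variable")], fromLast = TRUE), ]

# With the duplicates gone, the long-to-wide reshape is unambiguous
wide <- reshape(last_only, idvar = "ID", timevar = "variable", direction = "wide")

# Drop the "value." prefix that reshape() adds to the new columns
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

Filtering before reshaping means `reshape()` never has to choose between competing rows, so the result does not depend on row order.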