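I don't think this particular question has come up on the forum before, but if it is a duplicate, please point me in the right direction!
I have the following dataset and want to reshape it from long to wide.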
ID variable value
1 number of students 1000
1 percentage on financial aid 28
1 acceptance rate 12
1 percentage on financial aid 35
2 number of students 2000
2 percentage on financial aid 1
2 percentage on financial aid 70
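Note that the variable percentage on financial aid appears twice for each ID. When reshaping from long to wide I want to keep only the second occurrence, because the first occurrence is the school's rank on the "financial aid" measure, while the second is the actual value.
The variable name (percentage on financial aid) is identical for both values, so I wonder whether there is a way to tell R to overwrite the first match with the second. Right now R seems to take the first occurrence.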
Answer 0 (score: 1)
zz = '
ID variable value
1 number_of_students 1000
1 percentage_on_financial_aid 28
1 acceptance_rate 12
1 percentage_on_financial_aid 35
2 number_of_students 2000
2 percentage_on_financial_aid 1
2 percentage_on_financial_aid 70
'
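df <- read.table(text = zz, header = TRUE)
# reshape() keeps the first row it finds for each ID/variable pair,
# so reverse the row order to make the last occurrence win.
# Index the rows directly so the column types are preserved.
ndf = df[nrow(df):1, ]
nd = reshape(ndf, idvar = "ID", timevar = "variable", direction = "wide")
# Strip the "value." prefix that reshape() adds to the new columns.
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd
  ID percentage_on_financial_aid number_of_students acceptance_rate
7  2                          70               2000              NA
4  1                          35               1000              12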
If you would rather keep the first occurrence (the opposite of fromLast = T), skip the reversal:
nd = reshape(df, idvar = "ID", timevar = "variable", direction = "wide")
a = colnames(nd)
b = sub('.*\\.', '', a)
colnames(nd) = b
nd
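Both variants rely on reshape() keeping the first row it finds for each ID/variable pair, and it warns that multiple rows match when it does so. A minimal sketch that wraps the two cases in one function is below. The name widen_keep and its keep argument are made up for illustration and are not part of base R, and the sample data is rebuilt inline so the block runs on its own.

```r
# Hypothetical helper (not part of base R): reshape long -> wide, keeping
# either the first or the last value when an ID/variable pair repeats.
widen_keep <- function(long, keep = c("last", "first")) {
  keep <- match.arg(keep)
  # reshape() takes the first matching row, so reverse the rows to keep the last.
  if (keep == "last") long <- long[rev(seq_len(nrow(long))), ]
  # reshape() warns about the duplicated pairs; that is expected here.
  wide <- suppressWarnings(
    reshape(long, idvar = "ID", timevar = "variable", direction = "wide")
  )
  names(wide) <- sub("^value\\.", "", names(wide))  # drop the "value." prefix
  wide[order(wide$ID), ]                            # restore ID order
}

long <- data.frame(
  ID       = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value    = c(1000, 28, 12, 35, 2000, 1, 70)
)

last_wide  <- widen_keep(long, "last")   # financial aid: 35 and 70
first_wide <- widen_keep(long, "first")  # financial aid: 28 and 1
```

Silencing the warning is a convenience for this sketch only; in real use it may be safer to let it through, since it is the only signal that duplicate rows were dropped.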
Answer 1 (score: 0)
Thanks to the people who commented, I figured this out.
Solution:
df <- subset(df, duplicated(df[, 1:2]) | !duplicated(df[, 1:2], fromLast = TRUE))
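The filter works because duplicated(..., fromLast = TRUE) flags every row that has a later row with the same ID and variable, so negating it keeps only the last occurrence of each pair. Below is a sketch of the full filter-then-reshape flow, assuming the sample data is rebuilt inline and using only the negated fromLast test (on this data it selects the same rows as the condition above).

```r
df <- data.frame(
  ID       = c(1, 1, 1, 1, 2, 2, 2),
  variable = c("number_of_students", "percentage_on_financial_aid",
               "acceptance_rate", "percentage_on_financial_aid",
               "number_of_students", "percentage_on_financial_aid",
               "percentage_on_financial_aid"),
  value    = c(1000, 28, 12, 35, 2000, 1, 70)
)

# TRUE for rows that are the last occurrence of their ID/variable pair.
is_last <- !duplicated(df[, c("ID", "variable")], fromLast = TRUE)
dedup <- df[is_last, ]

# No duplicated pairs remain, so reshape() has exactly one value per cell.
wide <- reshape(dedup, idvar = "ID", timevar = "variable", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # drop the "value." prefix
wide  # the financial aid column now holds the second value for each ID
```

Filtering before reshaping means the first values are removed explicitly, so the result does not depend on which duplicate reshape() happens to pick.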