I am struggling with something that should be obvious.
I have a table of codes and their recoded labels.
> head(codesTv)
X5000 TV.Diary.Event
1 5001 Play
2 5002 Drama Series
3 5003 Other Drama
4 5004 Film
5 5005 Pop Music
6 5006 Comedy
Then I have a vector, ttest, that needs to be recoded.
> head(as.data.frame(ttest))
ttest
1 SPITTING IMA
2 5999
3 KRAMERVSKRAM
4 NEWS
5 BROOKSIDE
6 NOTHING
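All I need is to replace the values in ttest that appear as codes in codesTv with their labels, leaving every other value unchanged.
But the only way I have found to do this is the following cumbersome code: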
ttest [ ttest %in% codesTv$X5000 ] = codesTv$TV.Diary.Event [ match(ttest [ttest %in% codesTv$X5000], codesTv$X5000) ]
Does anyone have a simpler idea?
Data
ttest = c("SPITTING IMA", "5999", "KRAMERVSKRAM", "NEWS", "BROOKSIDE",
"NOTHING", "NOTHING", "BROOKSIDE", "5004", "5004", "5999", "YANKS",
"5999", "5999", "5999", "5999", "\"V\"", "GET FRESH", "5999",
"5999", "HEIDI", "FAME", "SAT SHOW", "5021", "BLUE PETER", "V",
"EASTENDERS", "WORLD CUP", "GRANDSTAND", "SPORT", "WORLD CUP",
"BLUE PETER", "WORLD CUP", "HORIZON", "REGGIEPERRIN", "5999",
"BROOKSIDE", "HNKYTNK MAN", "5999", "5999")
codesTv = structure(list(X5000 = c("5001", "5002", "5003", "5004", "5005",
"5006", "5007", "5008", "5009", "5010", "5011", "5012", "5013",
"5014", "5015", "5016", "5017", "5019", "5020", "5021", "5022",
"5023", "5888", "5999"), TV.Diary.Event = c("Play", "Drama Series",
"Other Drama", "Film", "Pop Music", "Comedy", "Chat Show", "Quiz/Panel Game",
"Cartoon", "Special L/E Event", "Classical Music", "Contemporary Music",
"Arts", "News", "Politics", "Consumer Affairs", "Spec Current Affairs",
"Documentary", "Religious Affairs", "Sport", "Childrens TV",
"Party Political", "Continuation Event", "Non-event (Missing)"
)), .Names = c("X5000", "TV.Diary.Event"), row.names = c(NA,
-24L), class = "data.frame")
Answer 0 (score: 2)
The OP's solution should work fine. Here is another way:
library(data.table)
# confirm that there is overlap
intersect(ttest, codesTv$X5000) # "5999" "5004" "5021"
# replace values in ttest
setDT(list(X5000=ttest))[codesTv, X5000 := i.TV.Diary.Event, on="X5000"]
# confirm that the values were overwritten
intersect(ttest, codesTv$X5000) # character(0)
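I stole this idea from @eddi. It should be memory-efficient, because ttest is modified by reference instead of being copied. Note that this relies on setDT() not copying the vector it is given, so the second intersect() call is worth keeping as a check.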
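If you would rather stay in base R, the OP's one-liner can be shortened by calling match() only once and reusing the result. This is a minimal sketch, not a benchmarked solution; the ttest and codesTv objects below are a small hand-picked subset of the question's data, redefined here only so the snippet runs on its own.

```r
# Small subset of the question's data (illustrative only).
ttest <- c("SPITTING IMA", "5999", "5004", "NEWS", "5021")
codesTv <- data.frame(
  X5000          = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)

# Position of each ttest value in the code column; NA if it is not a code.
idx <- match(ttest, codesTv$X5000)
hit <- !is.na(idx)

# Copy the vector and overwrite only the entries that matched a code.
recoded <- ttest
recoded[hit] <- codesTv$TV.Diary.Event[idx[hit]]

print(recoded)
```

Unlike the data.table version, this writes to a new vector (recoded) and leaves ttest untouched, which costs one copy but avoids any by-reference side effects.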