R - match/recode advice

Time: 2015-10-07 16:32:56

Tags: r match recode

I am struggling with something that should be obvious.

I have a list of codes and their recoded labels.

> head(codesTv)

  X5000 TV.Diary.Event
1  5001           Play
2  5002   Drama Series
3  5003    Other Drama
4  5004           Film
5  5005      Pop Music
6  5006         Comedy

Then I have a vector, ttest, that needs to be recoded.

> head(as.data.frame(ttest))
                ttest
1        SPITTING IMA
2                5999
3        KRAMERVSKRAM
4                NEWS
5           BROOKSIDE
6             NOTHING

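All I need is to replace the values in ttest that appear as codes in codesTv$X5000 with the matching label from codesTv$TV.Diary.Event, leaving every other value untouched.

But the only way I have found to do this is this cumbersome code: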

ttest [ ttest %in% codesTv$X5000 ] = codesTv$TV.Diary.Event [ match(ttest [ttest %in% codesTv$X5000], codesTv$X5000) ] 

Does anyone have a simpler idea?

Data

ttest = c("SPITTING IMA", "5999", "KRAMERVSKRAM", "NEWS", "BROOKSIDE", 
"NOTHING", "NOTHING", "BROOKSIDE", "5004", "5004", "5999", "YANKS", 
"5999", "5999", "5999", "5999", "\"V\"", "GET FRESH", "5999", 
"5999", "HEIDI", "FAME", "SAT  SHOW", "5021", "BLUE PETER", "V", 
"EASTENDERS", "WORLD  CUP", "GRANDSTAND", "SPORT", "WORLD CUP", 
"BLUE PETER", "WORLD CUP", "HORIZON", "REGGIEPERRIN", "5999", 
"BROOKSIDE", "HNKYTNK MAN", "5999", "5999")

 codesTv = structure(list(X5000 = c("5001", "5002", "5003", "5004", "5005", 
"5006", "5007", "5008", "5009", "5010", "5011", "5012", "5013", 
"5014", "5015", "5016", "5017", "5019", "5020", "5021", "5022", 
"5023", "5888", "5999"), TV.Diary.Event = c("Play", "Drama Series", 
"Other Drama", "Film", "Pop Music", "Comedy", "Chat Show", "Quiz/Panel Game", 
"Cartoon", "Special L/E Event", "Classical Music", "Contemporary Music", 
"Arts", "News", "Politics", "Consumer Affairs", "Spec Current Affairs", 
"Documentary", "Religious Affairs", "Sport", "Childrens TV", 
"Party Political", "Continuation Event", "Non-event (Missing)"
)), .Names = c("X5000", "TV.Diary.Event"), row.names = c(NA, 
-24L), class = "data.frame")

1 answer:

Answer 0: (score: 2)

The OP's solution should work fine. Here is another way:

library(data.table)

# confirm that there is overlap
intersect(ttest, codesTv$X5000) # "5999" "5004" "5021"  

# replace values in ttest
setDT(list(X5000=ttest))[codesTv, X5000 := i.TV.Diary.Event, on="X5000"]

# confirm that the values were overwritten
intersect(ttest, codesTv$X5000) # character(0)

I stole this idea from @eddi. It should be memory-efficient, because ttest is modified by reference instead of being copied.
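For anyone who would rather stay in base R, here is a minimal sketch of the same recode that calls match() only once and reuses the index. It is not from the original posts: the name recoded is made up for illustration, and the data is a small hand-picked subset of the question's ttest and codesTv, so treat it as one possible approach rather than the definitive one.

```r
# Small stand-ins for the question's data (a subset, for illustration only).
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "SPORT")
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)

# Look up every element once; idx is NA wherever ttest holds no known code.
idx <- match(ttest, codesTv$X5000)
hit <- !is.na(idx)

# Work on a copy (hypothetical name) and overwrite only the matched positions.
recoded <- ttest
recoded[hit] <- codesTv$TV.Diary.Event[idx[hit]]

print(recoded)
```

Unlike the data.table version, this leaves ttest itself unchanged and returns a new vector, which costs one copy but avoids side effects on the original.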