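I am trying to create a lookup table in R so that my data ends up in the same format my company uses.
It covers several education categories that I want to merge into broader groups using dplyr.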
library(dplyr)
# Create data
education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")
data <- data.frame(X1=replicate(1,sample(education,1000,rep=TRUE)))
as_tibble(data)  # tbl_df() is deprecated in current dplyr
# Create lookup table
lut <- c("Mechanichal Engineering" = "Engineering",
"Electric Engineering" = "Engineering",
"Political Science" = "Social Science",
"Economics" = "Social Science")
# Assign lookup table
data$X1 <- lut[data$X1]
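However, in my output the old values are replaced with the wrong values, i.e. not the ones I defined in the lookup table. Instead, the lookup values seem to be assigned at random.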
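A likely cause, assuming X1 is stored as a factor (the data.frame() default before R 4.0.0): indexing a vector with a factor uses the factor's integer codes, not its labels, so the names in lut are ignored. The sketch below reproduces this with one value per category and shows that converting to character restores the lookup by name. The variable names x, wrong and right are illustrative only.

```r
education <- c("Mechanichal Engineering", "Electric Engineering",
               "Political Science", "Economics")
lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering" = "Engineering",
         "Political Science" = "Social Science",
         "Economics" = "Social Science")

# Force a factor to mimic the pre-4.0.0 data.frame() default (assumption).
x <- factor(education)

# Indexing by a factor uses its integer codes (levels are sorted
# alphabetically), so the positions in lut are used, not the names.
wrong <- unname(lut[x])

# Converting to character makes the lookup go by name.
right <- unname(lut[as.character(x)])

print(data.frame(X1 = education, wrong = wrong, right = right))
```

Applied to the question's data, the same idea would be data$X1 <- lut[as.character(data$X1)].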
Answer 0: (score: 2)
education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")
lut <- list("Mechanichal Engineering" = "Engineering",
"Electric Engineering" = "Engineering",
"Political Science" = "Social Science",
"Economics" = "Social Science")
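library(reshape2)  # provides melt()
lut2 <- melt(lut)  # one row per list entry: columns 'value' and 'L1' (the entry name)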
data1 <- data.frame(X1=replicate(1,sample(education,1000,rep=TRUE)))
data1$new <- lut2[match(data1$X1,lut2$L1),'value']
head(data1)
======================= ==============
X1 new
======================= ==============
Political Science Social Science
Political Science Social Science
Mechanichal Engineering Engineering
Mechanichal Engineering Engineering
Political Science Social Science
Political Science Social Science
======================= ==============
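Answer 1: (score: 2)
I have been trying to solve this problem myself. I was not very happy with most of the solutions I found, so this is what I ended up with. I added an "Other" category to show that it still works when a value is not defined in the lookup table.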
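One way to get the behaviour described, given as a minimal base R sketch rather than this answer's own code: look values up by name and map anything missing from the table to "Other". The value "History" and the names x and new are hypothetical, chosen only to show an undefined category.

```r
lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering" = "Engineering",
         "Political Science" = "Social Science",
         "Economics" = "Social Science")

# "History" is a hypothetical value that is not defined in lut.
x <- c("Economics", "Electric Engineering", "History")

# Look up by name; names that are absent from lut come back as NA.
new <- unname(lut[as.character(x)])

# Map every unmatched value to the fallback category.
new[is.na(new)] <- "Other"

print(data.frame(X1 = x, new = new))
```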
Answer 2: (score: 0)
The best way I have found is to use recode() from the car package.
# Observe that dplyr also has a recode function, so require car after dplyr
require(dplyr)
require(car)
The data are sampled from four education categories.
education <- c("Mechanichal Engineering",
"Electric Engineering","Political Science","Economics")
data <- data.frame(ID = c(1:1000), X1 = replicate(1,sample(education,1000,rep=TRUE)))
Use recode() on the data to recode the categories:
lut <- data.frame(ID = c(1:1000), X2 = recode(data$X1, '"Economics" = "Social Science";
"Electric Engineering" = "Engineering";
"Political Science" = "Social Science";
"Mechanichal Engineering" = "Engineering"'))
To check that it worked correctly, join the original data with the recoded data:
data <- full_join(data, lut, by = "ID")
head(data)
ID X1 X2
1 1 Political Science Social Science
2 2 Economics Social Science
3 3 Electric Engineering Engineering
4 4 Political Science Social Science
5 5 Economics Social Science
6 6 Mechanichal Engineering Engineering
With recode(), you do not need to sort the data before recoding.