I'm working with two tables:
t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9))
and
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"))
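I want to replace the 1s in t2 with the lookup values from t1. The column names of t2 appear as records in t1 (in the Name column). All 0 values in t2 should remain unchanged.
In my real data, t2 has several hundred columns and t1 has several hundred rows.
t2 also has a few columns that are not affected by this recoding but should be kept in the final output.
Is there a best practice for coding this?
The desired output for the example is: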
Whatever Waldo Mark Harold Earl Extra
does         0    4      0    9 another
not          1    0      3    9 column
really       1    4      0    9 appearing
matter       0    4      0    9 in
at           0    0      0    0 this
all          1    0      0    0 table
Thanks in advance!
Answer 0 (score: 1)
This should be flexible enough for your real dataset:
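my_function <- function(df, lookup) {
  # Only recode columns whose names appear in lookup$Name, so unrelated
  # columns (Whatever, Extra) are never touched
  for (col in intersect(names(df), lookup$Name)) {
    # Replace every 1 in this column with the matching lookup number
    df[[col]][df[[col]] == 1] <- lookup$Number[lookup$Name == col]
  }
  df
}
my_function(t2, t1)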
#   Whatever Waldo Mark Harold Earl     Extra
# 1     does     0    4      0    9   another
# 2      not     1    0      3    9    column
# 3   really     1    4      0    9 appearing
# 4   matter     0    4      0    9        in
# 5       at     0    0      0    0      this
# 6      all     1    0      0    0     table
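With several hundred columns, the same recoding can also be written without an explicit loop. The following is only a minimal base R sketch of that idea: it assumes that every column to recode is named in t1$Name, and it sets stringsAsFactors = FALSE so that it behaves the same across R versions. The names cols, nums and recoded are illustrative.

```r
t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9),
                 stringsAsFactors = FALSE)
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"),
                 stringsAsFactors = FALSE)

# Columns of t2 that have an entry in the lookup table
cols <- intersect(names(t2), t1$Name)
# Lookup numbers, in the same order as cols
nums <- t1$Number[match(cols, t1$Name)]

recoded <- t2
# Replace each 1 with the column's number; every other value is kept as-is
recoded[cols] <- Map(function(x, n) ifelse(x == 1, n, x), t2[cols], nums)
recoded
```

Because only the columns in cols are assigned, Whatever and Extra pass through untouched and the column order of t2 is preserved.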
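Answer 1 (score: 1)
Here's a tidyverse workflow that is probably overkill for this example but should scale well to larger datasets. I've split it into steps so the trip from wide to long and back to wide is easier to follow:
First, I reshape t2 into long format and filter for observations with a value of 1: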
library(tidyverse)
t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1)
#>    Whatever     Extra   Name value
#>  1      not    column  Waldo     1
#>  2   really appearing  Waldo     1
#>  3      all     table  Waldo     1
#>  4     does   another   Mark     1
#>  5   really appearing   Mark     1
#>  6   matter        in   Mark     1
#>  7      not    column Harold     1
#>  8     does   another   Earl     1
#>  9      not    column   Earl     1
#> 10   really appearing   Earl     1
#> 11   matter        in   Earl     1
Then I use left_join with t1, in case any observations in t2 have no match in t1. This brings in the Number column from t1, so I can now drop the value column left over from the gather:
t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1) %>%
  left_join(t1, by = "Name") %>%
  select(-value)
#>    Whatever     Extra   Name Number
#>  1      not    column  Waldo      1
#>  2   really appearing  Waldo      1
#>  3      all     table  Waldo      1
#>  4     does   another   Mark      4
#>  5   really appearing   Mark      4
#>  6   matter        in   Mark      4
#>  7      not    column Harold      3
#>  8     does   another   Earl      9
#>  9      not    column   Earl      9
#> 10   really appearing   Earl      9
#> 11   matter        in   Earl      9
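The point of left_join here is that a coded column of t2 with no entry in t1 is kept, with NA in Number, rather than dropped. Such mismatches can also be spotted before joining. A minimal base R sketch, assuming the identifier columns are Whatever and Extra; the column Nobody is made up purely for illustration, and id_cols, coded_cols, unmatched and unused are illustrative names:

```r
t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9),
                 stringsAsFactors = FALSE)

# Column names of a t2-like table; "Nobody" is a hypothetical column
# with no entry in t1
t2_names <- c("Whatever", "Waldo", "Mark", "Harold", "Earl", "Nobody", "Extra")

# Assumption: these columns are identifiers, not codes
id_cols <- c("Whatever", "Extra")

coded_cols <- setdiff(t2_names, id_cols)
unmatched  <- setdiff(coded_cols, t1$Name)  # coded columns without a lookup entry
unused     <- setdiff(t1$Name, coded_cols)  # lookup entries without a column

unmatched
unused
```

Any name listed in unmatched would come out of the join with a missing Number, so it is worth deciding beforehand how those cells should be treated.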
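Then I use spread to get it back into wide format. Note that the key values get sorted along the way, so the spread columns end up in alphabetical order. If needed, you can change the column order with select. Note also that the row at contains no 1s, so filter removes it and it does not appear in the final result.
The whole process from start to finish: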
t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1) %>%
  left_join(t1, by = "Name") %>%
  select(-value) %>%
  spread(key = Name, value = Number, fill = 0)
#>   Whatever     Extra Earl Harold Mark Waldo
#> 1      all     table    0      0    0     1
#> 2     does   another    9      0    4     0
#> 3   matter        in    9      0    4     0
#> 4      not    column    9      3    0     1
#> 5   really appearing    9      0    4     1
Created on 2018-08-14 by the reprex package (v0.2.0).
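A variant of this pipeline that keeps every row of t2, including all-zero rows, and restores the original column order might recode in place rather than filter. This is only a sketch under a few assumptions: tidyr 1.0.0 or later (for pivot_longer and pivot_wider, which supersede gather and spread), the coded columns are exactly those named in t1$Name, and each Whatever/Extra pair identifies one row. The name result is illustrative.

```r
library(dplyr)
library(tidyr)

t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9),
                 stringsAsFactors = FALSE)
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"),
                 stringsAsFactors = FALSE)

result <- t2 %>%
  # Long format: one row per (Whatever, Extra, Name) combination
  pivot_longer(cols = all_of(t1$Name), names_to = "Name", values_to = "value") %>%
  left_join(t1, by = "Name") %>%
  # Recode instead of filtering, so rows without any 1 are kept
  mutate(value = if_else(value == 1, Number, value)) %>%
  select(-Number) %>%
  # Back to wide format
  pivot_wider(names_from = Name, values_from = value) %>%
  # Restore the column order of t2
  select(all_of(names(t2)))

result
```

The design choice is to carry the 0s through the join and overwrite only the 1s, which avoids having to refill dropped cells afterwards.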