How do I recode values in a data frame based on two columns in a separate data frame?

Asked: 2019-11-21 14:18:03

Tags: r tidyverse

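My problem is this. I have a data frame called CSES with four columns. I have a separate data frame called meta with two columns, code and name. If a value in any of the CSES columns matches a value in meta$code, I want to replace it with the corresponding value in meta$name.

At the moment my solution uses loops, which works fine; but is there a better way that avoids the loops, preferably using the Tidyverse? This seems like a very standard problem that should already have an answer, but I could not find one. Thanks for your help. Cheers.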

for(i in seq_along(meta$code)) {
  code <- meta$code[[i]]
  for(k in seq_along(CSES)) {
    col <- CSES[[k]]
    col[col == code] <- meta$name[[i]]
    CSES[[k]] <- col
  }
}

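Edit: I have added the input data and the desired output below, and clarified the desired output as requested.

To clarify: if a value in CSES matches a value in the meta$code column, that value in CSES should be changed to the meta$name value from the same row as the matching meta$code.

CSES, the data I want to recode: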

# A tibble: 274,719 x 6
#   IMD5000_A IMD5000_B IMD5000_C IMD5000_D IMD5000_E IMD5000_F
#   <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
# 1 320001    320002    320003    320004    320005    320006   
# 2 320001    320002    320003    320004    320005    320006   
# 3 320001    320002    320003    320004    320005    320006   
# 4 320001    320002    320003    320004    320005    320006   
# 5 320001    320002    320003    320004    320005    320006   
# 6 320001    320002    320003    320004    320005    320006   
# 7 320001    320002    320003    320004    320005    320006   
# 8 320001    320002    320003    320004    320005    320006   
# 9 320001    320002    320003    320004    320005    320006   
# 10 320001    320002    320003    320004    320005    320006   
# … with 274,709 more rows

meta:

# A tibble: 44 x 2
#   code    name 
#   <chr>   <chr>
# 1 7520001 SAP  
# 2 7520002 M    
# 3 7520003 FP   
# 4 7520004 KD   
# 5 7520005 MP   
# 6 7520006 C    
# 7 7520007 V    
# 8 7520008 SD   
# 9 7520009 Fi   
#10 7520010 Jl   
# … with 34 more rows

For example, the value 7520001 in CSES should be changed to "SAP".

2 answers:

Answer 0 (score: 0)

Example data:

set.seed(10)
CSES <- as.data.frame(matrix(sample(letters[1:10], 40, TRUE), 10),
                      stringsAsFactors = FALSE)
meta <- data.frame(code = letters[1:3], name = c('peach', 'pear', 'apple'),
                   stringsAsFactors = FALSE)

CSES
#    V1 V2 V3 V4
# 1   f  g  i  f
# 2   d  f  g  a
# 3   e  b  h  b
# 4   g  f  d  i
# 5   a  d  e  e
# 6   c  e  h  h
# 7   c  a  i  i
# 8   c  c  c  j
# 9   g  d  h  g
# 10  e  i  d  f
meta
#   code  name
# 1    a peach
# 2    b  pear
# 3    c apple

Now update each column of CSES with the meta name that corresponds to each matching code.

Base R

repl <- matrix(meta$name[match(as.matrix(CSES), meta$code)], nrow(CSES))

CSES[!is.na(repl)] <- repl[!is.na(repl)]

CSES
#       V1    V2    V3    V4
# 1      f     g     i     f
# 2      d     f     g peach
# 3      e  pear     h  pear
# 4      g     f     d     i
# 5  peach     d     e     e
# 6  apple     e     h     h
# 7  apple peach     i     i
# 8  apple apple apple     j
# 9      g     d     h     g
# 10     e     i     d     f
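
A variant of the same match-based idea, as a rough sketch: build a named lookup vector once and apply it column by column with lapply(), so CSES stays a data frame throughout. The helper name recode_col and the small stand-in data are made up for illustration and are not part of the question.

```r
# Small stand-in data (hypothetical, for illustration only)
CSES <- data.frame(V1 = c("f", "a", "c"), V2 = c("b", "g", "a"),
                   stringsAsFactors = FALSE)
meta <- data.frame(code = c("a", "b", "c"),
                   name = c("peach", "pear", "apple"),
                   stringsAsFactors = FALSE)

# Named lookup vector: names are the codes, values are the replacement names
lookup <- setNames(meta$name, meta$code)

# recode_col is a hypothetical helper: replace matched codes, keep the rest
recode_col <- function(col) {
  hit <- col %in% names(lookup)
  col[hit] <- unname(lookup[col[hit]])
  col
}

# Assigning into CSES[] replaces every column but keeps the data frame structure
CSES[] <- lapply(CSES, recode_col)
CSES
```

Values with no entry in meta are left as they are, which matches the behaviour of the loop in the question.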

dplyr

library(dplyr)

CSES %>% 
  mutate_all(~ coalesce(meta$name[match(.x, meta$code)], .x))
#       V1    V2    V3    V4
# 1      f     g     i     f
# 2      d     f     g peach
# 3      e  pear     h  pear
# 4      g     f     d     i
# 5  peach     d     e     e
# 6  apple     e     h     h
# 7  apple peach     i     i
# 8  apple apple apple     j
# 9      g     d     h     g
# 10     e     i     d     f
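
In dplyr 1.0.0 and later, mutate_all() is superseded by across(). A sketch of what should be the equivalent call, shown on small stand-in data that is made up for illustration:

```r
library(dplyr)

# Small stand-in data (hypothetical, for illustration only)
CSES <- data.frame(V1 = c("f", "a", "c"), V2 = c("b", "g", "a"),
                   stringsAsFactors = FALSE)
meta <- data.frame(code = c("a", "b", "c"),
                   name = c("peach", "pear", "apple"),
                   stringsAsFactors = FALSE)

# For every column: look up the name for each code, and fall back to the
# original value (.x) wherever match() found nothing and returned NA
recoded <- CSES %>%
  mutate(across(everything(),
                ~ coalesce(meta$name[match(.x, meta$code)], .x)))
recoded
```

To recode only some columns, everything() can be swapped for a narrower selection such as starts_with("IMD5000").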

data.table

library(data.table)
setDT(CSES)
setDT(meta)

for(n in names(CSES))
  CSES[meta, on = setNames('code', n), (n) := i.name]

CSES[]
#       V1    V2    V3    V4
#  1:     f     g     i     f
#  2:     d     f     g peach
#  3:     e  pear     h  pear
#  4:     g     f     d     i
#  5: peach     d     e     e
#  6: apple     e     h     h
#  7: apple peach     i     i
#  8: apple apple apple     j
#  9:     g     d     h     g
# 10:     e     i     d     f

Answer 1 (score: 0)

Or in base R: if you convert the data frame columns to factors, you can reset the levels to do this neatly, and possibly more efficiently.

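meta <- data.frame(code = c('a', 'c'), name = c('A', 'C'), stringsAsFactors = FALSE)
CSES <- data.frame(code = c('a', 'b', 'c', 'd'), c2 = c(1, 2, 3, 4), stringsAsFactors = FALSE)
print("---before----")
print(CSES)
# Rows whose code has an entry in the lookup table
f <- CSES$code %in% meta$code
CSES$code[f] <- sapply(CSES$code[f], function(x) meta$name[which(x == meta$code)])
print("---after using factors and levels----")
print(CSES)

With output:

[1] "---before----"
  code c2
1    a  1
2    b  2
3    c  3
4    d  4
[1] "---after using factors and levels----"
  code c2
1    A  1
2    b  2
3    C  3
4    d  4

Using factors, as follows:

# stringsAsFactors = TRUE is needed in R >= 4.0, where the default is FALSE
meta2 <- data.frame(code = c('a', 'c'), name = c('A', 'C'), stringsAsFactors = TRUE)
CSES2 <- data.frame(code = c('a', 'b', 'c', 'd'), c2 = c(1, 2, 3, 4), stringsAsFactors = TRUE)
# Return the replacement name for a level found in meta2, otherwise the level itself
frepl <- function(s) {
  if (s %in% meta2$code) as.character(meta2$name)[meta2$code == s] else s
}
# Recode by renaming the factor levels instead of touching every row
levels(CSES2$code) <- sapply(levels(CSES2$code), frepl)
print("---after using factors and levels----")
print(CSES2)

This gives the same recoded code column (A, b, C, d) as the first version.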
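
The factor-level idea can also be written without calling a function once per level. This is only a sketch, assuming the column is already a factor and that meta2 holds plain character columns; the variable names lv and idx are made up for illustration.

```r
# Stand-in data mirroring the example above; code is created as a factor explicitly
meta2 <- data.frame(code = c("a", "c"), name = c("A", "C"),
                    stringsAsFactors = FALSE)
CSES2 <- data.frame(code = factor(c("a", "b", "c", "d")), c2 = c(1, 2, 3, 4))

lv  <- levels(CSES2$code)
idx <- match(lv, meta2$code)   # NA where a level has no entry in meta2

# Rename matched levels, keep unmatched levels unchanged
levels(CSES2$code) <- ifelse(is.na(idx), lv, meta2$name[idx])
CSES2
```

Because only the levels are rewritten, the work scales with the number of distinct values rather than the number of rows, which may help on a data frame of roughly 275,000 rows like the one in the question.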