Converting binary strings to text in R (Salesforce ID conversion from 15 to 18 characters)

Time: 2017-08-26 01:15:31

Tags: r dplyr salesforce

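I want to convert 15-character Salesforce IDs to their 18-character form in R. The formula is described here: https://salesforce.stackexchange.com/questions/27686/how-can-i-convert-a-15-char-id-value-into-an-18-char-id-value but that is in C#, and I would like to do it in R.

I have written a clunky function in R that takes a single 15-character input and correctly returns the 18-character ID. I would like to know how to apply it to a column of a data.frame with dplyr.

Reproducible code: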

SFID_Convert <- function(fifteen_digit) {
  if (length(fifteen_digit == 15)) {

    # binary map ----
    binary <-
      c(
        "00000",
        "00001",
        "00010",
        "00011",
        "00100",
        "00101",
        "00110",
        "00111",
        "01000",
        "01001",
        "01010",
        "01011",
        "01100",
        "01101",
        "01110",
        "01111",
        "10000",
        "10001",
        "10010",
        "10011",
        "10100",
        "10101",
        "10110",
        "10111",
        "11000",
        "11001",
        "11010",
        "11011",
        "11100",
        "11101",
        "11110",
        "11111"
      )
    letter <- c(LETTERS, 0:5)
    binarymap <- data_frame(binary, letter)


    # sfid ----
    sfid <- substr(fifteen_digit, 1, 15)
    s1 <- substr(sfid, 1, 5)
    s2 <- substr(sfid, 6, 10)
    s3 <- substr(sfid, 11, 15)

    convertID <- function(str_frag) {
      str_frag <- paste(rev(strsplit(str_frag, NULL)[[1]]), collapse = '')
      str_frag <- strsplit(str_frag, NULL)[[1]]
      str_frag[which(unlist(gregexpr("[0-9]", str_frag)) == 1)] <- 0
      str_frag[which(unlist(gregexpr("[a-z]", str_frag)) == 1)] <- 0
      str_frag[which(unlist(gregexpr("[A-Z]", str_frag)) == 1)] <- 1
      str_frag <<- paste(str_frag, collapse = '')
    }

    convertID(s1)
    n1 <- str_frag
    convertID(s2)
    n2 <- str_frag
    convertID(s3)
    n3 <- str_frag

    binary <- data_frame(c(n1, n2, n3)) %>%
      select(binary = 1) %>%
      left_join(binarymap)

    return(paste(sfid, paste(binary$letter[1:3], collapse = ''), sep = ''))}
}

Example:

sfid <- "001a003920aSDuh"
SFID_Convert(sfid)
[1] "001a003920aSDuhAAG"

That is exactly what I want, but when you apply it to a df ...

col <- c("001a003920aSDuh", "001a08h010JNkJd")
name <- c("compA", "compB")
df <- data_frame(name, col)

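It computes "AAG" correctly for the first row and then applies that same suffix to every row. I could lapply it, but with a df of 100,000 rows I suspect that is the wrong approach.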
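The symptom can be reproduced with a toy function. The name first_only below is hypothetical and not part of the code above; it only illustrates that a helper which indexes with [[1]] ever sees the first element of its input, so a vector of IDs yields a single result that then gets recycled across all rows.

```r
# Hypothetical toy helper: like strsplit(x, NULL)[[1]] in the code above,
# it only looks at the FIRST element of x.
first_only <- function(x) {
  toupper(strsplit(x, NULL)[[1]][1])
}

x <- c("abc", "xyz")

# Called on the whole vector: one value, computed from "abc" only.
first_only(x)

# Called element by element: one value per input, but this is a loop in disguise.
vapply(x, first_only, character(1), USE.NAMES = FALSE)
```

Looping like this gives the right answer, but a function built from vectorized calls avoids the per-element overhead altogether.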

Any help is appreciated! Still learning here. :)

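1 answer:

Answer 0 (score: 2)

Your code has several problems. Below is one possible solution, which should also be more efficient:

1: Define the mapping between binary strings and letters. You can do this outside the function: define it once, with all the necessary conversions, and use it inside the function.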

binary <- c("00000","00001","00010","00011","00100","00101","00110","00111",
            "01000","01001","01010","01011","01100","01101","01110","01111",
            "10000","10001","10010","10011","10100","10101","10110","10111",
            "11000","11001","11010","11011","11100","11101","11110","11111")
binary.reverse <- lapply(binary, function(x){paste0(rev(strsplit(x, split = "")[[1]]), collapse = "")})
binary2letter <- c(LETTERS, 0:5)
names(binary2letter) <- unlist(binary.reverse)
rm(binary, binary.reverse)

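I also reverse the binary strings in this step, so the reversal does not have to be repeated for every ID. The result is stored in a named vector rather than a data frame.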
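As a side note, the same reversed keys could also be generated rather than typed out. The following is only a sketch under that assumption; bit_keys and suffix_chars are hypothetical names, and the result is meant to match binary2letter above.

```r
# For each value v in 0..31, list its five bits least-significant first.
# That ordering is the same as reversing the usual 5-digit binary string.
bit_keys <- vapply(0:31, function(v) {
  paste0((v %/% 2^(0:4)) %% 2, collapse = "")
}, character(1))

# Named character vector: names are the reversed bit strings, values the suffix characters.
suffix_chars <- c(LETTERS, 0:5)
names(suffix_chars) <- bit_keys

# Lookup by name is vectorized, so many keys can be translated in one call.
suffix_chars[c("00000", "01100", "11010")]
```

Indexing a named vector with a character vector returns one value per key, which is what lets the function in the next step avoid a join or a loop.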

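2: Write the function so that it accepts a vector as input. Note that to check whether a string contains X characters you should use nchar() rather than length(). The latter returns the number of strings in the vector, not the number of characters in each string.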

SFID_Convert <- function(sfid) {
  sfid <- as.character(sfid) # in case the input column are factors

  str_num <- gsub("[A-Z]", "1", gsub("[a-z0-9]", "0", sfid))

  s1 <- substring(str_num, 1, 5)
  s2 <- substring(str_num, 6, 10)
  s3 <- substring(str_num, 11, 15)

  sfid.addon <- paste0(sfid,
                       binary2letter[s1], 
                       binary2letter[s2], 
                       binary2letter[s3])

  sfid[nchar(sfid)==15] <- sfid.addon[nchar(sfid)==15]

  return(sfid)
}
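To make the nchar() versus length() point concrete, here is a small illustration. The variable names id and ids are made up for this example.

```r
id <- "001a003920aSDuh"

length(id)        # number of elements in the vector
nchar(id)         # number of characters in the string

# The original check compares first and measures afterwards, so it returns the
# length of a logical vector; for a single ID that is 1, which if() treats as TRUE.
length(id == 15)

# nchar() is vectorized, so it can pick out exactly the 15-character entries.
ids <- c("001a003920aSDuh", "001a003920aSDuhAAG", "abc")
nchar(ids) == 15
```

This is why the function above uses nchar(sfid) == 15 as a logical index: entries that are not 15 characters long are returned unchanged.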

Checking the solution:

sfid <- "001a003920aSDuh"
col <- c("001a003920aSDuh", "001a08h010JNkJd")
name <- c("compA", "compB")
df <- data_frame(name, col)

> SFID_Convert(sfid)
[1] "001a003920aSDuhAAG"

> df %>% mutate(new.col = SFID_Convert(col))
# A tibble: 2 x 3
   name             col            new.col
  <chr>           <chr>              <chr>
1 compA 001a003920aSDuh 001a003920aSDuhAAG
2 compB 001a08h010JNkJd 001a08h010JNkJdAAL
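For cross-checking, the suffix can also be computed arithmetically, without a lookup table of bit strings. This is only an alternative sketch, not the method used above; chunk_char is a hypothetical helper name, and it assumes every input is exactly 15 characters long.

```r
# Hypothetical helper: suffix character for each 5-character chunk.
chunk_char <- function(chunk) {
  alphabet <- c(LETTERS, 0:5)
  vapply(strsplit(chunk, ""), function(ch) {
    # An uppercase letter at position p contributes 2^(p - 1); anything else contributes 0.
    # The sum (0..31) is a zero-based index into the 32-character alphabet.
    alphabet[sum((ch %in% LETTERS) * 2^(seq_along(ch) - 1)) + 1]
  }, character(1))
}

ids <- c("001a003920aSDuh", "001a08h010JNkJd")

# One suffix character per 5-character chunk, concatenated per ID.
suffix <- paste0(chunk_char(substring(ids, 1, 5)),
                 chunk_char(substring(ids, 6, 10)),
                 chunk_char(substring(ids, 11, 15)))

paste0(ids, suffix)
```

For the two example IDs this yields the same suffixes as the table-based function, AAG and AAL.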