Question

I have a set of string variables in a tibble that I want to recode to specific integers based on their string content. My code is as follows:

library(tidyverse)

a<-c("this string says apple", "this string says banana", "this string says carrot", "this string says apple")
b<- c("this string says pumpkin", "this string says radish", "this string says eggplant", "this string says radish")
produce <- tibble(a,b)

a_words <- c("apple", "banana", "carrot")
b_words <- c("pumpkin", "radish", "eggplant")

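# Replace every element of var that matches word_vec[i] with num_vec[i].
# Assigning numbers into a character vector keeps it character ("1", "2", ...).
my_function <- function(var, word_vec, num_vec) {
  for (i in seq_along(word_vec)) {
    var[grepl(word_vec[[i]], var)] <- num_vec[[i]]
  }
  return(var)
}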

When I process each variable separately, I get the desired result:

produce$a <- my_function(produce$a,a_words,1:3)
produce$b <- my_function(produce$b,b_words,1:3)

> produce
# A tibble: 4 x 2
  a     b    
  <chr> <chr>
1 1     1    
2 2     2    
3 3     3    
4 1     2

But in reality I have several variables to recode (though not all of the variables in the tibble). I tried a loop:

for (i in c("produce$a", "produce$b")){
  i <- my_function(i, paste0(str_replace(i,"produce$", ""),"_words"), 1:3)
}

But this does not change produce.

Any suggestions on how to do this more efficiently would be appreciated.
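A minimal base-R sketch of why a loop like the one above has no visible effect, using a hypothetical toy vector `x`: `for` binds the loop variable to each element's value in turn, so `i <- ...` only rebinds that local name and never writes back into the object being iterated over. (In the question's loop there is a second issue as well: `i` is the string "produce$a", not the column.)

```r
# Hypothetical toy vector standing in for c("produce$a", "produce$b").
x <- c("produce$a", "produce$b")

# Reassigning the loop variable only rebinds `i`; `x` itself is not modified.
for (i in x) {
  i <- toupper(i)
}
print(x)  # unchanged

# To change an object, assign into it by index (or, for a tibble, by column name).
y <- x
for (k in seq_along(y)) {
  y[[k]] <- toupper(y[[k]])
}
print(y)  # now upper-cased
```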

Answer 1

How about this?

words <- list(
    a = c("apple", "banana", "carrot"),
    b = c("pumpkin", "radish", "eggplant"))

produce %>%
    rowid_to_column("row") %>%
    gather(key, val, -row) %>%
    rowwise() %>%
    mutate(val = map_int(words[key], ~which(str_detect(val, .x) == TRUE))) %>%
    spread(key, val) %>%
    select(-row)
## A tibble: 4 x 2
#      a     b
#  <int> <int>
#1     1     1
#2     2     2
#3     3     3
#4     1     2

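The key points here are:

storing words in a list whose names match the column names in produce,
reshaping produce from wide to long, and then
regex-matching each entry of val against the words vector named by that row's key.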
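The core idea (a list of patterns whose names match the column names) also works without reshaping. The following is a base-R sketch of that idea, not the answer's tidyverse pipeline: it uses a plain data.frame in place of the tibble, and `match(TRUE, ...)` returns the index of the first matching word (NA if none matches).

```r
# The question's data, as a data.frame instead of a tibble.
produce <- data.frame(
  a = c("this string says apple", "this string says banana",
        "this string says carrot", "this string says apple"),
  b = c("this string says pumpkin", "this string says radish",
        "this string says eggplant", "this string says radish"),
  stringsAsFactors = FALSE
)

# Pattern vectors, named after the columns they apply to.
words <- list(
  a = c("apple", "banana", "carrot"),
  b = c("pumpkin", "radish", "eggplant")
)

# For each named column, replace every entry with the index of the first word it contains.
for (col in names(words)) {
  produce[[col]] <- vapply(
    produce[[col]],
    function(s) match(TRUE, vapply(words[[col]], grepl, logical(1), x = s)),
    integer(1),
    USE.NAMES = FALSE
  )
}
print(produce)
```

Because the loop indexes `produce[[col]]` and assigns back into it, the columns are actually replaced, and columns not named in `words` are left alone.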

Answer 2

You're doing fine, but there are a few points of confusion.

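First, you are mixing up the string "produce$a" and the column produce$a. The function get() turns a name (a string) into the object it refers to, e.g. get("a_words"). Note that it only looks up object names: get("produce$a") fails, because "produce$a" is an expression, not the name of an object.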
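A short base-R sketch of what get() can and cannot do, using a small hypothetical data.frame in place of the tibble:

```r
a_words <- c("apple", "banana", "carrot")
produce <- data.frame(a = c("this string says apple", "this string says carrot"),
                      stringsAsFactors = FALSE)

# get() looks up an object by name, so a name built with paste0() works.
w <- get(paste0("a", "_words"))

# "produce$a" is not the name of any object, so get() signals an error.
res <- tryCatch(get("produce$a"), error = function(e) "lookup failed")

# A column is selected from a name string with [[ ]] instead.
col <- produce[["a"]]
```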

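Also, the pattern passed to str_replace() is interpreted as a regular expression, in which some symbols have a special meaning. "$" is one of them: it anchors the end of the string, so str_replace(i, "produce$", "") never matches "produce$a". Match it literally with fixed("produce$") or escape it as "produce\\$".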
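The same behavior can be reproduced with base R's sub(), which, like stringr's str_replace(), treats its pattern as a regular expression unless told otherwise. A minimal sketch:

```r
x <- "produce$a"

# As a regex, "produce$" only matches strings that END in "produce": no change.
r1 <- sub("produce$", "", x)

# fixed = TRUE treats the pattern as literal text.
r2 <- sub("produce$", "", x, fixed = TRUE)

# Escaping the metacharacter also works.
r3 <- sub("produce\\$", "", x)
```

In stringr the literal version would be str_replace(x, fixed("produce$"), "").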

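Finally, you should learn to use accessors for data-frame columns other than df$X, such as df[["X"]] or df["X"]. They behave differently: df[["X"]] returns the column itself, df["X"] returns a one-column data frame, and unlike $, both accept a variable that holds the column name.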
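A small base-R sketch of the three accessors, using a hypothetical data.frame `df` and a variable `col` that holds a column name:

```r
df <- data.frame(a = c("x", "y"), b = c("p", "q"), stringsAsFactors = FALSE)
col <- "a"

v      <- df[[col]]  # the column itself: a character vector
sub_df <- df[col]    # a one-column data.frame
miss   <- df$col     # looks for a column literally named "col": NULL here
```

With a tibble, df$col likewise returns NULL (with a warning about an unknown column), which is why `$` is the wrong tool when the column name is stored in a variable.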

Anyway, here is, I think, the code you are looking for:

for (i in c("a", "b")) {
  # [[i]] selects the column named by i; get() fetches a_words / b_words.
  produce[[i]] <- my_function(produce[[i]], get(paste0(i, "_words")), 1:3)
}
produce

NB: I changed the loop vector to c("a", "b") because it is easier to follow, but you could also loop over c("produce$a", "produce$b") with a few changes to the solution.
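For instance, a sketch of the loop over the original strings, assuming the question's my_function and data (reproduced here, with a data.frame standing in for the tibble): strip the literal "produce$" prefix to get the column name, then read and write the column with [[ ]].

```r
# The question's helper, unchanged.
my_function <- function(var, word_vec, num_vec) {
  for (i in seq_along(word_vec)) {
    var[grepl(word_vec[[i]], var)] <- num_vec[[i]]
  }
  var
}

produce <- data.frame(
  a = c("this string says apple", "this string says banana",
        "this string says carrot", "this string says apple"),
  b = c("this string says pumpkin", "this string says radish",
        "this string says eggplant", "this string says radish"),
  stringsAsFactors = FALSE
)
a_words <- c("apple", "banana", "carrot")
b_words <- c("pumpkin", "radish", "eggplant")

for (i in c("produce$a", "produce$b")) {
  col <- sub("produce$", "", i, fixed = TRUE)  # literal prefix removal: "a", then "b"
  produce[[col]] <- my_function(produce[[col]], get(paste0(col, "_words")), 1:3)
}
print(produce)
```

As in the question, the recoded columns stay character ("1", "2", ...) because my_function assigns into a character vector.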

Changing elements of a tibble in a function
