Question

I have a very ugly dataset that is a flat file of a relational database. Here is a minimal reproducible example:

df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)

I need to be able to identify the '.p' value for whichever 'col#' contains "c". My previous question on SO got me the first part: In R, find the column that contains a string in for each row. I am including it here for context.

tmp <- which(projectdata=='Transmission and Distribution of Electricity', arr.ind=TRUE)
cnt <- ave(tmp[,"row"], tmp[,"row"], FUN=seq_along)
maxnames <- paste0("max",sequence(max(cnt)))
projectdata[maxnames] <- NA
projectdata[maxnames][cbind(tmp[,"row"],cnt)] <- names(projectdata)[tmp[,"col"]]
rm(tmp, cnt, maxnames)
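
As a sketch only, here are the same steps adapted to the example df, with "c" as the search string. Swapping projectdata for df is an assumption made for illustration; it is not the code that was run on the real data.

```r
# Example data from the question. col3.p has 10 values, so the shorter
# columns are recycled and the data frame ends up with 10 rows.
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)

# Row/column positions of every cell equal to "c"
tmp <- which(df == "c", arr.ind = TRUE)
# Running count of matches within each row (1 for the first match, 2 for the next, ...)
cnt <- ave(tmp[, "row"], tmp[, "row"], FUN = seq_along)
# One new column per possible match: max1, max2, ...
maxnames <- paste0("max", sequence(max(cnt)))
df[maxnames] <- NA
# Store the name of the matching column for each row
df[maxnames][cbind(tmp[, "row"], cnt)] <- names(df)[tmp[, "col"]]
rm(tmp, cnt, maxnames)
```

In this example every row contains exactly one "c", so only max1 is created.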

Applied to the example with "c" as the search string, this produces a data frame that looks like this:

df
   col1 col1.p col2 col2.p col3 col3.p max1
1     a      1    a      6    c     11 col3
2     b      2    c      7    d     12 col2
3     c      3    l      8    e     13 col1
4     d      4    c      9    f     14 col2
5     c      5    l     10    g     15 col1
6     a      1    a      6    c     16 col3
7     b      2    c      7    d     17 col2
8     c      3    l      8    e     18 col1
9     d      4    c      9    f     19 col2
10    c      5    l     10    g     20 col1

When I tried to get the ".p" value that matches the column named in "max1", I kept getting errors. I thought the approach would be:

df %>%
   mutate(my.p = eval(as.name(paste0(max1,'.p'))))
Error: object 'col3.p' not found

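Clearly that didn't work, so I thought this might be similar to passing a column name into a function, where I need to use 'get'. That didn't work either.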

df %>%
   mutate(my.p = get(as.name(paste0(max1,'.p'))))
Error: invalid first argument
df %>%
   mutate(my.p = get(paste0(max1,'.p')))
Error: object 'col3.p' not found

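I found something that gets rid of this error, using data.table, from another but related question: http://codereply.com/answer/7y2ra3/dplyr-error-object-found-using-rle-mutate.html. However, it gives me the "col3.p" value for every row. That is the column named by max1 in the first row, df$max1[1].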

library('dplyr')
library('data.table') # must have the data.table package
df %>%
  tbl_dt(df) %>% 
  mutate(my.p = get(paste0(max1,'.p')))

Source: local data table [10 x 8]

   col1 col1.p col2 col2.p col3 col3.p max1 my.p
1     a      1    a      6    c     11 col3   11
2     b      2    c      7    d     12 col2   12
3     c      3    l      8    e     13 col1   13
4     d      4    c      9    f     14 col2   14
5     c      5    l     10    g     15 col1   15
6     a      1    a      6    c     16 col3   16
7     b      2    c      7    d     17 col2   17
8     c      3    l      8    e     18 col1   18
9     d      4    c      9    f     19 col2   19
10    c      5    l     10    g     20 col1   20

Using the lazyeval interp approach (from this SO question: How to pass dynamic column names in dplyr into custom function?) doesn't work for me. Perhaps I am implementing it incorrectly?

library(lazyeval)
library(dplyr)
df %>%
  mutate_(my.p = interp(~colp, colp = as.name(paste0(max1,'.p'))))

I get the error:

Error in paste0(max1, ".p") : object 'max1' not found

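Ideally, the new column my.p would equal the appropriate .p value, based on the column identified in max1.

I can do all of this with ifelse, but I am trying to do it with less code and make it applicable to the next ugly flat table.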
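
For reference, the ifelse approach mentioned above might look like the following sketch. It assumes max1 has already been filled in (it is built by hand here so the block is self-contained) and hard-codes the three column names, which is exactly what makes it awkward to reuse on a different table.

```r
# Example data from the question, with max1 added by hand for illustration
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)
df$max1 <- rep(c("col3", "col2", "col1", "col2", "col1"), 2)

# Nested ifelse: one branch per candidate column.
# Anything that is not col1 or col2 falls through to col3.p.
df$my.p <- ifelse(df$max1 == "col1", df$col1.p,
           ifelse(df$max1 == "col2", df$col2.p, df$col3.p))
```

Each additional col#/col#.p pair would need another nested branch, so the code grows with the width of the table.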

Answer 1

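We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by the sequence of rows, get the value of the column named by the paste output, and assign (:=) it to a new column ('my.p').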
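
The code itself did not survive in this copy of the answer. A minimal sketch consistent with the description, assuming df already has the max1 column (built by hand here so the block is self-contained) and that the data.table package is installed, would be:

```r
library(data.table)

# Example data from the question, with max1 added by hand for illustration
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)
df$max1 <- rep(c("col3", "col2", "col1", "col2", "col1"), 2)

# setDT() converts df to a data.table by reference. Grouping by row number
# means each group is a single row, so paste0(max1, ".p") is one column name
# and get() returns that column's value for the row. := adds my.p in place.
setDT(df)[, my.p := get(paste0(max1, ".p")), by = seq_len(nrow(df))]
```

Grouping by row is what keeps the lookup from using a single column name for the whole table, which is the behaviour described in the question.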

Passing a variable as a column name to dplyr?

1 answer: