Question

I have a data frame like this:

Hours                 work_place         overtime   
More than 48 hours    Farm          Overtime paid                 
Less than 48 horas    Factory       Overtime paid 
More than 48 hours    Office        Overtime paid                
Less than 48 horas    Farm          Overtime not paid 
More than 48 hours    Factory       Overtime paid                
Less than 48 horas    Office        Overtime paid

In a separate process, I created several objects. process$object1 looks like this:

process$object1

                        Dim1   Dim2   Dim3
 More than 48 hours       0.05  0.33  0.96
 Less than 48 horas      -0.02 -0.16 -0.47
 Farm                     0.14  1.51  0.29
 Factory                 -0.13  0.15  1.03
 Office                   0.01  2.05 -0.47
 Home                     0.00 -0.19 -0.14
 Overtime paid            0.03  0.04 -0.09
 Overtime not paid       -0.26 -0.32  0.76

I want to replace each value in the original data frame with the matching value from column 1 (Dim1) of process$object1, so that I end up with this:

  Hours2  work_place2  overtime2
    0.05         0.14       0.03
   -0.02        -0.13       0.03
    0.05         0.01       0.03
   -0.02         0.14      -0.26
    0.05        -0.13       0.03
   -0.02         0.01       0.03

Because the original data frame is very large, I'd like to do this with some kind of function in R. Any help is much appreciated.

Data in dput() format.

dat <-
structure(list(Hours = c("More than 48 hours", "Less than 48 horas", 
"More than 48 hours", "Less than 48 horas", "More than 48 hours", 
"Less than 48 horas"), work_place = c("Farm", "Factory", "Office", 
"Farm", "Factory", "Office"), overtime = c("Overtime paid", "Overtime paid", 
"Overtime paid", "Overtime not paid", "Overtime paid", "Overtime paid"
)), row.names = c(NA, -6L), class = "data.frame")

process <-
list(object1 = structure(list(Dim1 = c(0.05, -0.02, 0.14, -0.13, 
0.01, 0, 0.03, -0.26), Dim2 = c(0.33, -0.16, 1.51, 0.15, 2.05, 
-0.19, 0.04, -0.32), Dim3 = c(0.96, -0.47, 0.29, 1.03, -0.47, 
-0.14, -0.09, 0.76)), class = "data.frame", row.names = c("More than 48 hours", 
"Less than 48 horas", "Farm", "Factory", "Office", "Home", "Overtime paid", 
"Overtime not paid")))

result <-
structure(list(Hours2 = c(0.05, -0.02, 0.05, -0.02, 0.05, -0.02
), work_place2 = c(0.14, -0.13, 0.01, 0.14, -0.13, 0.01), overtime2 = c(0.03, 
0.03, 0.03, -0.26, 0.03, 0.03)), class = "data.frame", row.names = c(NA, 
-6L))

Answer 1

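# For each original column, look up its values by row name in Dim1
# and store the result in a new column with a "2" suffix
dat[c("Hours2", "work_place2", "overtime2")] <- lapply(
  X   = dat[c("Hours", "work_place", "overtime")],
  FUN = function(x) process[["object1"]][x, "Dim1"]
)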

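Because a data.frame is just a fancy list, you can assign new columns with a list of new vectors. Since process$object1 has row names, you can use name-based row subsetting inside lapply to do the lookup.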
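To make the two ideas above concrete, here is a small sketch. The objects `obj` and `dat` are hypothetical, trimmed-down stand-ins for `process$object1` (Dim1 only) and the question's data frame; `stringsAsFactors = FALSE` is set so the lookup keys are plain character strings.

```r
# Hypothetical stand-in for process$object1 (Dim1 only),
# with the category labels stored as row names
obj <- data.frame(
  Dim1 = c(0.05, -0.02, 0.14, -0.13, 0.01, 0.00, 0.03, -0.26),
  row.names = c("More than 48 hours", "Less than 48 horas", "Farm", "Factory",
                "Office", "Home", "Overtime paid", "Overtime not paid")
)
dat <- data.frame(work_place = c("Farm", "Factory", "Office", "Farm"),
                  stringsAsFactors = FALSE)

# A data.frame is a list whose elements are its columns
is.list(dat)   # TRUE

# A character row index selects rows by row name; a repeated name
# returns the same row again, in the order requested
obj[dat$work_place, "Dim1"]   # 0.14 -0.13 0.01 0.14

# Because dat is a list, assigning a list of vectors adds columns
dat["work_place2"] <- list(obj[dat$work_place, "Dim1"])
dat
```

The answer's `lapply` call does the same thing for several columns at once, producing one vector per column and assigning the whole list in a single step.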

Answer 2

The following will do what you want. Note that the result columns keep the original names (Hours rather than Hours2, and so on).

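fun <- function(x, DF, col){
    rn <- row.names(DF)     # category labels used as lookup keys
    inx <- match(x, rn)     # row position of each value of x (NA if absent)
    DF[inx, col]            # the requested column at those rows
}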

res <- lapply(dat, fun, process$object1, 1)
res <- do.call(cbind.data.frame, res)
res
#  Hours work_place overtime
#1  0.05       0.14     0.03
#2 -0.02      -0.13     0.03
#3  0.05       0.01     0.03
#4 -0.02       0.14    -0.26
#5  0.05      -0.13     0.03
#6 -0.02       0.01     0.03
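The core of `fun` is `match()`, which turns each label into a row position. The sketch below shows that step on its own, using plain vectors copied from process$object1; the value `"Garage"` is a made-up label added only to show what happens when a value has no matching row name.

```r
rn   <- c("More than 48 hours", "Less than 48 horas", "Farm", "Factory",
          "Office", "Home", "Overtime paid", "Overtime not paid")
dim1 <- c(0.05, -0.02, 0.14, -0.13, 0.01, 0.00, 0.03, -0.26)

# "Garage" is hypothetical: it does not appear in rn
x <- c("Farm", "Office", "Farm", "Garage")

# match() returns, for each element of x, the position of its first
# exact match in rn, or NA when there is none
idx <- match(x, rn)
idx         # 3 5 3 NA

# Indexing with those positions performs the lookup; NA stays NA
dim1[idx]   # 0.14 0.01 0.14 NA
```

Unmatched labels therefore show up as `NA` in the result rather than raising an error, which is worth checking for on a large data frame.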

The function above can be written as a one-liner:

fun <- function(x, DF, col) DF[match(x, row.names(DF)), col]

But I find the multi-line version more readable.
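If you want the column names from the question's `result` (Hours2, work_place2, overtime2), one option is to append the suffix after building `res`. This is a sketch on a hypothetical two-row `dat` and a Dim1-only `obj` standing in for process$object1; the suffix step is an addition, not part of the answer's code.

```r
fun <- function(x, DF, col) DF[match(x, row.names(DF)), col]

# Hypothetical stand-ins for process$object1 (Dim1 only) and the data
obj <- data.frame(
  Dim1 = c(0.05, -0.02, 0.14, -0.13, 0.01, 0.00, 0.03, -0.26),
  row.names = c("More than 48 hours", "Less than 48 horas", "Farm", "Factory",
                "Office", "Home", "Overtime paid", "Overtime not paid")
)
dat <- data.frame(
  Hours      = c("More than 48 hours", "Less than 48 horas"),
  work_place = c("Farm", "Office"),
  overtime   = c("Overtime paid", "Overtime not paid"),
  stringsAsFactors = FALSE
)

# Look up every column, then bind the resulting vectors into a data frame
res <- do.call(cbind.data.frame, lapply(dat, fun, obj, 1))

# Append "2" to each name to match the names in the question's result
names(res) <- paste0(names(res), "2")
res
```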

Efficient way to replace values in a data frame with content extracted from an object?
