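Error when using lapply?

Time: 2016-06-30 13:05:32

Tags: r dataframe lapply

I want to divide each column of my data frame "data" by each column of another data frame called "benchmark". However, I get different results when I use lapply than when I do the division manually. Where is the error in my code?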

The code I used is:

{{1}}

For the first two columns, this gives me the following result...

   div.A.B1  div.A.B2
1  0.7200000 0.8000000
2  0.7422680 0.8163265
3  0.7346939 0.8080808
4  0.7422680 0.8333333
5  0.7578947 0.8510638
6  0.7741935 0.8695652
7  0.7826087 0.8510638
8  0.7826087 0.8602151
9  0.7912088 0.8791209
10 0.8181818 0.8791209

...whereas dividing the first column of "data" by the first two columns of "benchmark" manually gives me:

        A.B1      A.B2
1  0.7200000 0.7200000
2  0.8247423 0.8163265
3  0.7653061 0.7575758
4  0.7525773 0.7604167
5  0.9473684 0.9574468
6  0.8709677 0.8804348
7  0.8804348 0.8617021
8  0.9347826 0.9247312
9  1.0989011 1.0989011
10 0.9090909 0.8791209

Some sample data for "data":

     A
1   72
2   80
3   75
4   73
5   90
6   81
7   81
8   86
9  100
10  80

And for "benchmark":

    B1  B2
1  100 100
2   97  98
3   98  99
4   97  96
5   95  94
6   93  92
7   92  94
8   92  93
9   91  91
10  88  91

5 answers:

Answer 0: (score: 2)

You can use outer:

data <- read.table(text = "     A1 A2 
                   1   72 11
                   2   80 20
                   3   75 15
                   4   73 17
                   5   90 13
                   6   81 18
                   7   81 22
                   8   86 30
                   9  100 20
                   10  80 22", header = TRUE)

benchmark <- read.table(text = "    B1  B2
                        1  100 100
                        2   97  98
                        3   98  99
                        4   97  96
                        5   95  94
                        6   93  92
                        7   92  94
                        8   92  93
                        9   91  91
                        10  88  91", header = TRUE)

res <- outer(seq_along(data), seq_along(benchmark), 
      function(i, j, DF1, DF2) DF1[,i] / DF2[, j], 
      DF1 = data, DF2 = benchmark)

names(res) <- outer(names(data), names(benchmark), paste, sep = ".")
#       A1.B1     A2.B1     A1.B2     A2.B2
#1  0.7200000 0.1100000 0.7200000 0.1100000
#2  0.8247423 0.2061856 0.8163265 0.2040816
#3  0.7653061 0.1530612 0.7575758 0.1515152
#4  0.7525773 0.1752577 0.7604167 0.1770833
#5  0.9473684 0.1368421 0.9574468 0.1382979
#6  0.8709677 0.1935484 0.8804348 0.1956522
#7  0.8804348 0.2391304 0.8617021 0.2340426
#8  0.9347826 0.3260870 0.9247312 0.3225806
#9  1.0989011 0.2197802 1.0989011 0.2197802
#10 0.9090909 0.2500000 0.8791209 0.2417582
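
One point worth spelling out: outer() does not call FUN once per pair. It calls it a single time, with the two index vectors already expanded to cover every combination, which is why the function above indexes whole sets of columns at once. The small sketch below (paste is used only for illustration, and `idx` is a name introduced here) shows the order in which the index pairs arrive, which corresponds to the A1.B1, A2.B1, A1.B2, A2.B2 column order in the output above:

```r
# Illustration only: show which (i, j) pairs outer() generates and in what order.
# The first index varies fastest, so the pairs are (1,1), (2,1), (1,2), (2,2).
idx <- outer(1:2, 1:2, function(i, j) paste(i, j, sep = ","))
as.vector(idx)
```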

Answer 1: (score: 2)

How about using plain division, along the lines of df1/df2? See the example:

#dummy data
df1 <- mtcars[1:5, 1, drop = FALSE]
df2 <- mtcars[1:5, 4:6]

df1; df2

#                   mpg
# Mazda RX4         21.0
# Mazda RX4 Wag     21.0
# Datsun 710        22.8
# Hornet 4 Drive    21.4
# Hornet Sportabout 18.7

#                    hp drat    wt
# Mazda RX4         110 3.90 2.620
# Mazda RX4 Wag     110 3.90 2.875
# Datsun 710         93 3.85 2.320
# Hornet 4 Drive    110 3.08 3.215
# Hornet Sportabout 175 3.15 3.440


df1$mpg/df2
#                          hp     drat       wt
# Mazda RX4         0.1909091 5.384615 8.015267
# Mazda RX4 Wag     0.1909091 5.384615 7.304348
# Datsun 710        0.2451613 5.922078 9.827586
# Hornet 4 Drive    0.1945455 6.948052 6.656299
# Hornet Sportabout 0.1068571 5.936508 5.436047
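
This works because the vector is recycled down each column of the data frame in turn, so every column of df2 ends up dividing the same vector. As a quick check, here is a minimal sketch that assumes the vector has as many elements as the data frame has rows (`ratio` is a name introduced here for illustration):

```r
df1 <- mtcars[1:5, 1, drop = FALSE]
df2 <- mtcars[1:5, 4:6]

ratio <- df1$mpg / df2

# Each result column should equal the vector divided by the matching column of df2
all.equal(ratio$hp, df1$mpg / df2$hp)
all.equal(ratio$wt, df1$mpg / df2$wt)
```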

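Answer 2: (score: 0)

I think you may want to try purrr, which has functions that let you map over several lists in parallel, and that is helpful in a situation like this one. Here you could use something like map2_df(data, benchmark, ~.x / .y)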
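
Note that a parallel map of this kind pairs the columns by position (first with first, second with second) rather than forming every combination. Base R's Map behaves the same way, so a minimal sketch without extra packages might look like the following. It uses the first three rows of the two-column sample data from the other answers, and `paired` is a name introduced here:

```r
data <- data.frame(A1 = c(72, 80, 75), A2 = c(11, 20, 15))
benchmark <- data.frame(B1 = c(100, 97, 98), B2 = c(100, 98, 99))

# Map walks over the columns of both data frames in parallel:
# A1 is divided by B1 and A2 by B2; A1/B2 and A2/B1 are not computed.
paired <- as.data.frame(Map(`/`, data, benchmark))
names(paired)
```

If every combination of columns is needed, one of the other approaches in this thread is the better fit.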

Answer 3: (score: 0)

You can try:

A <- data; B <- benchmark
matrix(apply(A, 2, function(x, y) apply(y, 2, function(z, x) x/z, x), B), nrow(A), ncol(A)*ncol(B), byrow = FALSE)
          [,1]      [,2]
 [1,] 0.7200000 0.7200000
 [2,] 0.8247423 0.8163265
 [3,] 0.7653061 0.7575758
 [4,] 0.7525773 0.7604167
 [5,] 0.9473684 0.9574468
 [6,] 0.8709677 0.8804348
 [7,] 0.8804348 0.8617021
 [8,] 0.9347826 0.9247312
 [9,] 1.0989011 1.0989011
[10,] 0.9090909 0.8791209

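The idea behind this is two nested apply calls; the matrix() function then reshapes the result appropriately. Alternatively, using Roland's data (note that the column order is A1B1, A1B2, A2B1, A2B2):

matrix(apply(data, 2, function(x,y) apply(y, 2, function(z,x) x/z, x), benchmark), nrow(data) , ncol(data)*ncol(benchmark), byrow = FALSE)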
           [,1]      [,2]      [,3]      [,4]
 [1,] 0.7200000 0.7200000 0.1100000 0.1100000
 [2,] 0.8247423 0.8163265 0.2061856 0.2040816
 [3,] 0.7653061 0.7575758 0.1530612 0.1515152
 [4,] 0.7525773 0.7604167 0.1752577 0.1770833
 [5,] 0.9473684 0.9574468 0.1368421 0.1382979
 [6,] 0.8709677 0.8804348 0.1935484 0.1956522
 [7,] 0.8804348 0.8617021 0.2391304 0.2340426
 [8,] 0.9347826 0.9247312 0.3260870 0.3225806
 [9,] 1.0989011 1.0989011 0.2197802 0.2197802
[10,] 0.9090909 0.8791209 0.2500000 0.2417582
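
Because the matrix comes back without column names, it can help to label it. The sketch below assumes the A1B1, A1B2, A2B1, A2B2 ordering described above and uses only the first three rows of the sample data; `m` and `labels` are names introduced here:

```r
data <- data.frame(A1 = c(72, 80, 75), A2 = c(11, 20, 15))
benchmark <- data.frame(B1 = c(100, 97, 98), B2 = c(100, 98, 99))

m <- matrix(apply(data, 2, function(x, y) apply(y, 2, function(z, x) x / z, x), benchmark),
            nrow(data), ncol(data) * ncol(benchmark), byrow = FALSE)

# The benchmark column varies fastest, so repeat each data name
# once per benchmark column and let the benchmark names recycle.
labels <- paste(rep(names(data), each = ncol(benchmark)), names(benchmark), sep = ".")
colnames(m) <- labels
m
```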

Or, combining this with zx8754's answer gives a list of the divisions that can be bound together with do.call:

do.call("cbind", apply(data, 2, function(x,y) x/y, benchmark))
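
Since the question started from lapply, the same idea can also be sketched with it directly: lapply iterates over the columns of a data frame, and dividing a vector by benchmark divides it by each benchmark column in turn. This is only a minimal sketch and not necessarily the fix for the asker's original code, which is not shown here. It uses the first three rows of the sample data, and `res` is a name introduced for illustration:

```r
data <- data.frame(A1 = c(72, 80, 75), A2 = c(11, 20, 15))
benchmark <- data.frame(B1 = c(100, 97, 98), B2 = c(100, 98, 99))

# For every column of data, divide it by all columns of benchmark,
# then bind the resulting data frames side by side.
res <- do.call(cbind, lapply(data, function(x) x / benchmark))
names(res)
```

The list names from lapply are carried into the column names by cbind, so each result column is labelled with both the data column and the benchmark column it came from.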

Answer 4: (score: 0)

Here is a solution using expand.grid:

e <- expand.grid(seq_len(ncol(data)), seq_len(ncol(benchmark)))

# e lists every combination of column indices to which the division is applied
#   Var1 Var2
# 1    1    1
# 2    2    1
# 3    1    2
# 4    2    2

r <- apply(e, 1, function(x) data[,x[1]]/benchmark[,x[2]])

# to make descriptive column names for r
colnames(r) <- apply(expand.grid(names(data), names(benchmark)), 1, paste, collapse="/")

         # A1/B1     A2/B1     A1/B2     A2/B2
 # [1,] 0.7200000 0.1100000 0.7200000 0.1100000
 # [2,] 0.8247423 0.2061856 0.8163265 0.2040816
 # [3,] 0.7653061 0.1530612 0.7575758 0.1515152
 # [4,] 0.7525773 0.1752577 0.7604167 0.1770833
 # [5,] 0.9473684 0.1368421 0.9574468 0.1382979
 # [6,] 0.8709677 0.1935484 0.8804348 0.1956522
 # [7,] 0.8804348 0.2391304 0.8617021 0.2340426
 # [8,] 0.9347826 0.3260870 0.9247312 0.3225806
 # [9,] 1.0989011 0.2197802 1.0989011 0.2197802
# [10,] 0.9090909 0.2500000 0.8791209 0.2417582

Data

data <- structure(list(A1 = c(72L, 80L, 75L, 73L, 90L, 81L, 81L, 86L, 
100L, 80L), A2 = c(11L, 20L, 15L, 17L, 13L, 18L, 22L, 30L, 20L, 
22L)), .Names = c("A1", "A2"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

benchmark <- structure(list(B1 = c(100L, 97L, 98L, 97L, 95L, 93L, 92L, 92L, 
91L, 88L), B2 = c(100L, 98L, 99L, 96L, 94L, 92L, 94L, 93L, 91L, 
91L)), .Names = c("B1", "B2"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))