计算两个数据集中一行中所有值组合之间的比率

时间:2017-01-25 13:44:00

标签: r

正如标题中提到的,我有两个数据集:

数据1:

structure(list(Name = structure(c(18L, 19L, 5L, 13L, 14L, 31L
), .Label = c("AMC Javelin", "Cadillac Fleetwood", "Camaro Z28", 
"Chrysler Imperial", "Datsun 710", "Dodge Challenger", "Duster 360", 
"Ferrari Dino", "Fiat 128", "Fiat X1-9", "Ford Pantera L", "Honda Civic", 
"Hornet 4 Drive", "Hornet Sportabout", "Lincoln Continental", 
"Lotus Europa", "Maserati Bora", "Mazda RX4", "Mazda RX4 Wag", 
"Merc 230", "Merc 240D", "Merc 280", "Merc 280C", "Merc 450SE", 
"Merc 450SL", "Merc 450SLC", "Pontiac Firebird", "Porsche 914-2", 
"Toyota Corolla", "Toyota Corona", "Valiant", "Volvo 142E"), class = "factor"), 
    mpg = c(145, 120, 150, 132, 110, 98), cyl = c(93, 116, 114, 
    156, 148, 167), disp = c(160, 160, 108, 258, 360, 225), hp = c(110, 
    110, 93, 110, 175, 105)), .Names = c("Name", "mpg", "cyl", 
"disp", "hp"), row.names = c(NA, 6L), class = "data.frame")

数据2:

structure(list(Name = structure(c(18L, 19L, 5L, 13L, 14L, 31L
), .Label = c("AMC Javelin", "Cadillac Fleetwood", "Camaro Z28", 
"Chrysler Imperial", "Datsun 710", "Dodge Challenger", "Duster 360", 
"Ferrari Dino", "Fiat 128", "Fiat X1-9", "Ford Pantera L", "Honda Civic", 
"Hornet 4 Drive", "Hornet Sportabout", "Lincoln Continental", 
"Lotus Europa", "Maserati Bora", "Mazda RX4", "Mazda RX4 Wag", 
"Merc 230", "Merc 240D", "Merc 280", "Merc 280C", "Merc 450SE", 
"Merc 450SL", "Merc 450SLC", "Pontiac Firebird", "Porsche 914-2", 
"Toyota Corolla", "Toyota Corona", "Valiant", "Volvo 142E"), class = "factor"), 
    mpg_1 = c(125, 133, 143, 141, 134, 238), cyl_1 = c(114, 153, 
    112, 136, 128, 155), disp_1 = c(113, 143, 144, 131, 431, 
    331), hp_1 = c(332, 221, 113, 331, 134, 151)), .Names = c("Name", 
"mpg_1", "cyl_1", "disp_1", "hp_1"), row.names = c(NA, 6L), class = "data.frame")

我想计算数据集中相应行之间的比率。行(4列)中的所有值都应该用于比率计算,并且应该在数据集之间计算比率。使用更简单的解释:

data1[1,2] / data2[1,2]
data1[1,2] / data2[1,3]
...
data1[1,3] / data2[1,2]
...

我希望将结果存储在单独的数据中,并使用易于标记的列来识别比率的计算方式。

2 个答案:

答案 0 :(得分:4)

以下是使用expand.gridrepmapply的方法。首先,我们使用expand.grid生成我们希望迭代的列的所有组合。然后,我们使用rep生成我们希望迭代的行。然后,我们将这两个值存储在data.frame中。使用mapply函数,我们遍历dat_iter的每一行,指定我们感兴趣的列和行索引。

cols <- expand.grid(2:5, 2:5)
rows <- rep(1:6, each = 16)
dat_iter <- data.frame(rows, cols)
res <- t(mapply(x = dat_iter$rows, y = dat_iter$Var1, z = dat_iter$Var2,
       FUN = function(x, y, z) c('ratio' = data1[x, y] / data2[x, z],
                                 'd1_name' = names(data1)[y],
                                 'd2_name' = names(data2)[z],
                                 'row' = x)))

res[1:5,]

     ratio             d1_name d2_name row
[1,] "1.16"            "mpg"   "mpg_1" "1"
[2,] "0.744"           "cyl"   "mpg_1" "1"
[3,] "1.28"            "disp"  "mpg_1" "1"
[4,] "0.88"            "hp"    "mpg_1" "1"
[5,] "1.2719298245614" "mpg"   "cyl_1" "1"

由于我们使用了mapply,您必须将第一列转换为数字。

答案 1 :(得分:4)

使用lapply,您可以执行以下操作。使用rbind,您可以获得格式较长且格式为cbind的结果。

长格式:

ratioLongDF = do.call(rbind,lapply(1:ncol(DF2[,-1]),function(x) 
         data.frame(DF1[,-1]/DF2[,-1][,x],divisor=colnames(DF2[,-1])[x] ) ) )


ratioLongDF
#         mpg       cyl      disp        hp divisor
#1  1.1600000 0.7440000 1.2800000 0.8800000   mpg_1
#2  0.9022556 0.8721805 1.2030075 0.8270677   mpg_1
#3  1.0489510 0.7972028 0.7552448 0.6503497   mpg_1
#4  0.9361702 1.1063830 1.8297872 0.7801418   mpg_1
#5  0.8208955 1.1044776 2.6865672 1.3059701   mpg_1
#6  0.4117647 0.7016807 0.9453782 0.4411765   mpg_1
#7  1.2719298 0.8157895 1.4035088 0.9649123   cyl_1
#8  0.7843137 0.7581699 1.0457516 0.7189542   cyl_1
#9  1.3392857 1.0178571 0.9642857 0.8303571   cyl_1
#10 0.9705882 1.1470588 1.8970588 0.8088235   cyl_1
#11 0.8593750 1.1562500 2.8125000 1.3671875   cyl_1
#12 0.6322581 1.0774194 1.4516129 0.6774194   cyl_1
#13 1.2831858 0.8230088 1.4159292 0.9734513  disp_1
#14 0.8391608 0.8111888 1.1188811 0.7692308  disp_1
#15 1.0416667 0.7916667 0.7500000 0.6458333  disp_1
#16 1.0076336 1.1908397 1.9694656 0.8396947  disp_1
#17 0.2552204 0.3433875 0.8352668 0.4060325  disp_1
#18 0.2960725 0.5045317 0.6797583 0.3172205  disp_1
#19 0.4367470 0.2801205 0.4819277 0.3313253    hp_1
#20 0.5429864 0.5248869 0.7239819 0.4977376    hp_1
#21 1.3274336 1.0088496 0.9557522 0.8230088    hp_1
#22 0.3987915 0.4712991 0.7794562 0.3323263    hp_1
#23 0.8208955 1.1044776 2.6865672 1.3059701    hp_1
#24 0.6490066 1.1059603 1.4900662 0.6953642    hp_1 

广泛格式:

ratioWideDF = do.call(cbind,lapply(1:ncol(DF2[,-1]),function(x) {
              DF = data.frame(DF1[,-1]/DF2[,-1][,x] );
              colnames(DF)=paste0(colnames(DF),"_",colnames(DF2[,-1])[x]); 
              return(DF)} ) )   

ratioWideDF[,1:8]
#  mpg_mpg_1 cyl_mpg_1 disp_mpg_1  hp_mpg_1 mpg_cyl_1 cyl_cyl_1 disp_cyl_1  hp_cyl_1
#1 1.1600000 0.7440000  1.2800000 0.8800000 1.2719298 0.8157895  1.4035088 0.9649123
#2 0.9022556 0.8721805  1.2030075 0.8270677 0.7843137 0.7581699  1.0457516 0.7189542
#3 1.0489510 0.7972028  0.7552448 0.6503497 1.3392857 1.0178571  0.9642857 0.8303571
#4 0.9361702 1.1063830  1.8297872 0.7801418 0.9705882 1.1470588  1.8970588 0.8088235
#5 0.8208955 1.1044776  2.6865672 1.3059701 0.8593750 1.1562500  2.8125000 1.3671875
#6 0.4117647 0.7016807  0.9453782 0.4411765 0.6322581 1.0774194  1.4516129 0.6774194