Custom normalization by group in R

Time: 2019-04-21 18:21:15

Tags: r normalization transformation

I have a data frame that looks like this:

# First block: group1 = 1
group1<-c(rep(1,12))
group2<-c(rep('Low',6), rep('High',6))
var <-c(1:6,1:6)
var1 <-c(2:13)
var2 <-c(20:31)
df1<-data.frame(group1,group2,var,var1,var2)
# Second block: same values, group1 = 2
group1<-c(rep(2,12))
group2<-c(rep('Low',6), rep('High',6))
var <-c(1:6,1:6)
var1 <-c(2:13)
var2 <-c(20:31)
df2<-data.frame(group1,group2,var,var1,var2)
df<-rbind(df1,df2)

   group1 group2 var var1 var2
1       1    Low   1    2   20
2       1    Low   2    3   21
3       1    Low   3    4   22
4       1    Low   4    5   23
5       1    Low   5    6   24
6       1    Low   6    7   25
7       1   High   1    8   26
8       1   High   2    9   27
9       1   High   3   10   28
10      1   High   4   11   29
11      1   High   5   12   30
12      1   High   6   13   31
13      2    Low   1    2   20
14      2    Low   2    3   21
15      2    Low   3    4   22
16      2    Low   4    5   23
17      2    Low   5    6   24
18      2    Low   6    7   25
19      2   High   1    8   26
20      2   High   2    9   27
21      2   High   3   10   28
22      2   High   4   11   29
23      2   High   5   12   30
24      2   High   6   13   31

I want to normalize my columns in the following way. For each combination of group1 and group2, I want to divide the var1 and var2 columns by their first element. This lets me build a common scale/index across the columns of interest. For example, for the combination group1=1 and group2=Low, the relevant elements of var1 should be transformed to 2/2,3/2,4/2,5/2,6/2,7/2 respectively; the same applies to the combination group1=1 and group2=High, and so on.

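I want to apply the above transformation to both var1 and var2. For the group1=1, group2=Low rows, for example, the expected result is 1, 1.5, 2, 2.5, 3, 3.5 for var1 and 1, 1.05, 1.1, 1.15, 1.2, 1.25 for var2.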

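Note: the numbers can be anything, usually positive real numbers, and because my data frame is very large, there is no way to know in advance which element I need to divide by for the transformation.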

2 Answers:

Answer 0: (score: 1)

After grouping by 'group1' and 'group2', use mutate_at to divide each selected column by its first element:

library(dplyr)
df %>%
   group_by(group1, group2) %>% 
   mutate_at(vars(var1, var2), list(tra = ~ ./first(.)))
# A tibble: 24 x 7
# Groups:   group1, group2 [4]
#   group1 group2   var  var1  var2 var1_tra var2_tra
#    <dbl> <fct>  <int> <int> <int>    <dbl>    <dbl>
# 1      1 Low        1     2    20     1        1   
# 2      1 Low        2     3    21     1.5      1.05
# 3      1 Low        3     4    22     2        1.1 
# 4      1 Low        4     5    23     2.5      1.15
# 5      1 Low        5     6    24     3        1.2 
# 6      1 Low        6     7    25     3.5      1.25
# 7      1 High       1     8    26     1        1   
# 8      1 High       2     9    27     1.12     1.04
# 9      1 High       3    10    28     1.25     1.08
#10      1 High       4    11    29     1.38     1.12
# … with 14 more rows
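
Since dplyr 1.0.0, mutate_at() has been superseded by across(), so the same grouping logic can also be written in the newer style. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; it rebuilds the question's data frame so that it runs on its own, and the name `res` is only illustrative.

```r
library(dplyr)

# Rebuild the example data frame from the question
df <- data.frame(
  group1 = rep(c(1, 2), each = 12),
  group2 = rep(rep(c("Low", "High"), each = 6), times = 2),
  var    = rep(1:6, times = 4),
  var1   = rep(2:13, times = 2),
  var2   = rep(20:31, times = 2)
)

# Within each (group1, group2) group, divide var1 and var2 by their first element
res <- df %>%
  group_by(group1, group2) %>%
  mutate(across(c(var1, var2), ~ .x / first(.x), .names = "{.col}_tra")) %>%
  ungroup()

head(res)
```

The `.names` argument reproduces the `_tra` suffix used above, and `ungroup()` drops the grouping so later operations are not silently applied per group.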

Or use data.table:

nm1 <- c("var1", "var2")
nm2 <- paste0(nm1, "_tra")
library(data.table)
setDT(df)[, (nm2) := lapply(.SD, function(x) x/first(x)), 
              by = .(group1, group2), .SDcols = nm1]
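
If adding a package is not an option, the same per-group division can be sketched in base R with ave(), which applies a function within each combination of the grouping vectors and returns the result in the original row order. This is only an illustration under the question's setup; the helper name `first_ratio` is hypothetical.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  group1 = rep(c(1, 2), each = 12),
  group2 = rep(rep(c("Low", "High"), each = 6), times = 2),
  var    = rep(1:6, times = 4),
  var1   = rep(2:13, times = 2),
  var2   = rep(20:31, times = 2)
)

# Hypothetical helper: divide a vector by its first element
first_ratio <- function(x) x / x[1]

# Apply the helper within each (group1, group2) combination
for (col in c("var1", "var2")) {
  df[[paste0(col, "_tra")]] <- ave(as.numeric(df[[col]]),
                                   df$group1, df$group2,
                                   FUN = first_ratio)
}

head(df, 8)
```

The columns are converted with as.numeric() before the call so that the ratios are stored as doubles rather than depending on how the integer input is coerced.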

Answer 1: (score: 0)

You can also use sqldf, as follows:

library(sqldf)
# Join every row to its group's minimum, then divide; "+ 0.0" forces floating-point division
result <- sqldf('select df.*, (df.var1 + 0.0) / scale.s_var1 as var1_tra, (df.var2 + 0.0) / scale.s_var2 as var2_tra
          from df join 
                  (select group1, group2, min(var1) as s_var1, min(var2) as s_var2 
                   from df
                   group by group1, group2) as scale 
                 on df.group1 = scale.group1 AND df.group2 = scale.group2 
          ')

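In the code above, we first find the minimum of var1 and var2 for each group with the following query (for this data the minimum is also the first element, because the values increase within each group):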

select group1, group2, min(var1) as s_var1, min(var2) as s_var2 
from df
group by group1, group2

This is used as a nested query and joined to the original data frame df on equal values of group1 and group2.
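
When the values are not sorted within a group, dividing by the minimum and dividing by the first element give different results, so the min()-based query should be treated as a shortcut for data like the example. A minimal base R sketch on a hypothetical data frame `d` (not part of the question) shows the difference:

```r
# Hypothetical data in which values are NOT in increasing order within each group
d <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(4, 2, 8, 10, 5, 20)
)

# Divide by the first element of each group (what the question asks for)
by_first <- ave(d$x, d$g, FUN = function(v) v / v[1])

# Divide by the minimum of each group (what the min()-based query computes)
by_min <- ave(d$x, d$g, FUN = function(v) v / min(v))

data.frame(d, by_first, by_min)
```

Here the first row of each group gets 1 under `by_first` but 2 under `by_min`, because the group minimum is not in the first position.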