Normalize data by subtracting the first row from each value in multiple columns

Time: 2020-10-02 22:37:24

Tags: r dplyr

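I tried to solve my problem by applying several solutions proposed on this forum, but I could not get any of them to work.

Basically, I have a data frame: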

 Concentration Salinity Light.Dark Distance Velocity In_Center Freezing Cruising Bursting Clockwise CounterClockwise
  <ord>         <fct>    <fct>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>     <dbl>            <dbl>
1 V             0.5      Dark         0.0612   0.0826   0.0638   0.0207    0.0124  0.00511   -0.0866         -0.0439 
2 L             0.5      Dark         0.0360   0.0282  -0.166   -0.00475   0.148  -0.0328    -0.0337          0.0615 
3 M             0.5      Dark        -0.144   -0.147    0.00761  0.0405   -0.191  -0.00586    0.0772         -0.0123 
4 H             0.5      Dark         0.0464   0.0362   0.0949  -0.0565    0.0306  0.0335     0.0431         -0.00527

I want to normalize the columns from Distance to CounterClockwise by subtracting the first row from every row.

I tried:

df_norm= df %>% 
  mutate_at(4:11, list(~ .- first(.)))

But it returns only zeros:

 Concentration Salinity Light.Dark Distance Velocity In_Center Freezing Cruising Bursting Clockwise CounterClockwise
  <ord>         <fct>    <fct>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>     <dbl>            <dbl>
1 V             0.5      Dark              0        0         0        0        0        0         0                0
2 L             0.5      Dark              0        0         0        0        0        0         0                0
3 M             0.5      Dark              0        0         0        0        0        0         0                0
4 H             0.5      Dark              0        0         0        0        0        0         0                0

I tried converting the tibble to a data frame with:

as.data.frame(df_norm)

But I got:

 Concentration Salinity Light.Dark Distance Velocity In_Center Freezing Cruising Bursting Clockwise CounterClockwise
1             V      0.5       Dark        0        0         0        0        0        0         0                0
2             L      0.5       Dark        0        0         0        0        0        0         0                0
3             M      0.5       Dark        0        0         0        0        0        0         0                0
4             H      0.5       Dark        0        0         0        0        0        0         0                0

Here is the dput output of my df:

structure(list(Concentration = structure(1:4, .Label = c("V", 
"L", "M", "H"), class = c("ordered", "factor")), Salinity = structure(c(1L, 
1L, 1L, 1L), .Label = c("0.5", "2", "6"), class = "factor"), 
    Light.Dark = structure(c(1L, 1L, 1L, 1L), .Label = c("Dark", 
    "ERROR", "Light"), class = "factor"), Distance = c(0.0611762417792624, 
    0.0359847599237893, -0.143596409795565, 0.0464354080925131
    ), Velocity = c(0.0825514600369596, 0.0282499048624341, -0.146998610001507, 
    0.0361972451021132), In_Center = c(0.06383139350079, -0.166302972291672, 
    0.00760502103176895, 0.0948665577591132), Freezing = c(0.0206958889309448, 
    -0.00474520212061713, 0.0405259034871347, -0.0564765902974621
    ), Cruising = c(0.0123684826368456, 0.148343102625951, -0.191335919657439, 
    0.0306243343946422), Bursting = c(0.00511229994076513, -0.0327935337663713, 
    -0.00586044139551122, 0.0335416752211175), Clockwise = c(-0.0865980448950217, 
    -0.0337007169788508, 0.077213035103443, 0.0430857267704295
    ), CounterClockwise = c(-0.0439324933628217, 0.0615054907079504, 
    -0.0123010981415901, -0.00527189920353861)), row.names = c(NA, 
-4L), groups = structure(list(Concentration = structure(1:4, .Label = c("V", 
"L", "M", "H"), class = c("ordered", "factor")), Salinity = structure(c(1L, 
1L, 1L, 1L), .Label = c("0.5", "2", "6"), class = "factor"), 
    .rows = structure(list(1L, 2L, 3L, 4L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Any idea what I am doing wrong?

Thank you very much for your help!

1 Answer:

Answer 0 (score: 0)

Based on the dput output, this is a grouped dataset with only one row per group:

df %>% 
    summarise(n = n())
# A tibble: 4 x 3
# Groups:   Concentration [4]
#  Concentration Salinity     n
#  <ord>         <fct>    <int>
#1 V             0.5          1
#2 L             0.5          1
#3 M             0.5          1
#4 H             0.5          1

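So each value is being subtracted from itself: within a one-row group, first(.) is that row's own value, which makes every difference 0.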
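The effect can be reproduced on a small example. This is only a minimal sketch: the `toy` tibble and its columns `id`, `x`, and `y` are hypothetical and stand in for the asker's data.

```r
library(dplyr)

# Hypothetical toy data (not the asker's df): one row per id
toy <- tibble(
  id = c("a", "b", "c"),
  x  = c(1, 3, 6),
  y  = c(10, 20, 40)
)

# Grouped with one row per group: first(.) is the row's own value,
# so every difference is 0
grouped <- toy %>%
  group_by(id) %>%
  mutate(across(c(x, y), ~ . - first(.)))

# After ungroup(), first(.) is the value in row 1 of the whole table
ungrouped <- toy %>%
  group_by(id) %>%
  ungroup() %>%
  mutate(across(c(x, y), ~ . - first(.)))

group_vars(grouped)  # lists the grouping columns, here "id"
grouped$x            # 0 0 0
ungrouped$x          # 0 2 5
ungrouped$y          # 0 10 30
```

Calling `group_vars()` on a data frame is a quick way to check whether leftover grouping from an earlier `group_by()` might explain a result like this.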

To do this across the whole dataset, ungroup first and then apply the code:

df %>%
     ungroup() %>%
     mutate(across(4:11, ~ . - first(.)))
     # Alternatively, select every numeric column instead of using positions:
     #  mutate(across(where(is.numeric), ~ . - first(.)))

# A tibble: 4 x 11
#  Concentration Salinity Light.Dark Distance Velocity In_Center Freezing Cruising Bursting Clockwise CounterClockwise
#  <ord>         <fct>    <fct>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>     <dbl>            <dbl>
#1 V             0.5      Dark         0        0         0        0        0        0         0                0     
#2 L             0.5      Dark        -0.0252  -0.0543   -0.230   -0.0254   0.136   -0.0379    0.0529           0.105 
#3 M             0.5      Dark        -0.205   -0.230    -0.0562   0.0198  -0.204   -0.0110    0.164            0.0316
#4 H             0.5      Dark        -0.0147  -0.0464    0.0310  -0.0772   0.0183   0.0284    0.130            0.0387
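Since the question describes the target as the columns "from Distance to CounterClockwise", the same step could also be written with a name range instead of positions 4:11, which avoids counting columns. The following is a sketch on hypothetical toy data (the `toy` tibble and its values are made up, with only three of the measurement columns), not the asker's actual df.

```r
library(dplyr)

# Hypothetical toy data reusing three of the question's column names
toy <- tibble(
  label            = c("a", "b", "c"),
  Distance         = c(1, 3, 6),
  Velocity         = c(2, 2, 5),
  CounterClockwise = c(-1, 0, 4)
)

toy_norm <- toy %>%
  ungroup() %>%  # no-op here, but drops any leftover grouping
  # Name range: every column from Distance through CounterClockwise
  mutate(across(Distance:CounterClockwise, ~ .x - first(.x)))

toy_norm$Distance          # 0 2 5
toy_norm$Velocity          # 0 0 3
toy_norm$CounterClockwise  # 0 1 5
```

Columns outside the range, such as `label` here, are left unchanged.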