How to add a constant to all numeric columns in R using dplyr

Time: 2019-04-11 19:02:12

Tags: r dplyr

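I have a data frame with an ID column and several numeric columns containing density measurements. To make the densities approximately normally distributed I need to take their logarithm, but because some density values are 0, I need to add 0.5 to all density measurements so that the log transform does not produce -Inf values. How can I do this with dplyr?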

Sample data:

  ID    `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
  <chr>       <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 IM_10          NA           608.              755.            51.0             868.             1066. 
2 IM_1…          NA            27.5              69.3            0.550            30.4              75.2
3 IM_1…          NA            19.6              17.0            1.03             53.2              42.0
4 IM_1…          NA           109.               89.0           47.7             725.              594. 
5 IM_1…          NA           219.              171.             0.501           531.              416. 
6 IM_1…          NA             4.00              0              0                 5.94              0  

I tried using

df1 <- df %>% group_by(ID) %>% 
  summarise_all(funs(mean(., na.rm=TRUE))) %>% 
  mutate_at(which(sapply(., is.numeric)), funs(sum(0.5)))

But this replaces every numeric column with 0.5 instead of adding 0.5 to the original densities.
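The likely reason is that `sum(0.5)` never refers to the column itself, so the expression evaluates to the constant 0.5 for every column. A minimal sketch of the difference, using a small made-up data frame (`toy` and its columns are hypothetical) and `across()`, which assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical toy data, not the original measurements
toy <- tibble(ID = c("a", "b"), x = c(0, 4), y = c(1.5, 0))

# The expression ignores the column, so every value becomes the constant 0.5
replaced <- toy %>% mutate(across(where(is.numeric), ~ sum(0.5)))

# The expression uses the column (.x), so 0.5 is added to each value
shifted <- toy %>% mutate(across(where(is.numeric), ~ .x + 0.5))
```

In `replaced` both numeric columns contain only 0.5, while `shifted` keeps the original values offset by 0.5.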

  ID    `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
  <chr>       <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 IM_10         0.5              0.5              0.5              0.5              0.5              0.5
2 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
3 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
4 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
5 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
6 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5

Any ideas on how to do this?

2 answers:

Answer 0: (score: 0)

I assume you want to summarise by ID and then add 0.5 to every value (that is not NA). This is how I would do it:

# Sample data
df <- structure(
  list(
    ID = c("IM_10", "IM_11", "IM_12", "IM_13", "IM_14", "IM_15"),
    Image_Tag = c(NA, NA, NA, NA, NA, NA),
    CD3_Global_Den = c(608, 27.5, 19.6, 109, 219, 4),
    CD8_Global_Den = c(755, 69.3, 17, 89, 171, 0),
    CD20_Global_De = c(51, 0.55, 1.03, 47.7, 0.501, 0),
    CD3_Tumour_Den = c(868, 30.4, 53.2, 725, 531, 5.94),
    CD8_Tumour_Den = c(1066, 75.2, 42, 594, 416, 0)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

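# Suggested code
library(hablar)  # provides mean_(), a mean that ignores NA values
library(dplyr)
options(pillar.sigfig = 6)  # print more significant digits in tibbles

df %>% group_by(ID) %>% 
  summarise_all(~mean_(.)) %>%    # average each column within each ID
  mutate_at(vars(-ID), ~. + 0.5)  # add 0.5 to every column except ID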

This gives the result:

# A tibble: 6 x 7
  ID    Image_Tag CD3_Global_Den CD8_Global_Den CD20_Global_De CD3_Tumour_Den CD8_Tumour_Den
  <chr>     <dbl>          <dbl>          <dbl>          <dbl>          <dbl>          <dbl>
1 IM_10        NA          608.5          755.5       51.5             868.5          1066.5
2 IM_11        NA           28             69.8        1.05             30.9            75.7
3 IM_12        NA           20.1           17.5        1.53             53.7            42.5
4 IM_13        NA          109.5           89.5       48.2             725.5           594.5
5 IM_14        NA          219.5          171.5        1.00100         531.5           416.5
6 IM_15        NA            4.5            0.5        0.5               6.44            0.5
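With dplyr 1.0.0 or later, the same steps can probably be written with `across()` instead of `summarise_all()` and `mutate_at()`, and the log transform from the question can follow in the same pipeline. The sketch below is an assumption-laden variant, not the answer's own code: it uses a cut-down hypothetical data frame (`dens`, with shortened column names) and swaps in base `mean(na.rm = TRUE)` for hablar's `mean_()`.

```r
library(dplyr)

# Hypothetical cut-down data: two rows per ID, one missing value
dens <- tibble(
  ID  = c("IM_10", "IM_10", "IM_15", "IM_15"),
  CD3 = c(600, 616, 4, 4),
  CD8 = c(755, NA, 0, 0)
)

log_dens <- dens %>%
  group_by(ID) %>%
  # average every numeric column within each ID, skipping NA values
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  # shift by 0.5 so that zeros stay finite, then take the natural log
  mutate(across(where(is.numeric), ~ log(.x + 0.5)))
```

One difference to keep in mind: for a column that is entirely NA within a group, `mean(na.rm = TRUE)` returns NaN, so the result for such columns may differ from what `mean_()` gives.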

Answer 1: (score: 0)

If you just want to add 1: df %>% map_if(is.numeric, ~ . + 1)
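Note that `map_if()` comes from purrr and returns a plain list rather than a data frame. If the data-frame structure should be kept, purrr's `modify_if()` is one option. A small sketch with hypothetical data (`toy`), adding the 0.5 from the question instead of 1:

```r
library(purrr)

# Hypothetical toy data, not the original measurements
toy <- data.frame(ID = c("a", "b"), x = c(0, 4), stringsAsFactors = FALSE)

# map_if() applies the function to numeric columns but returns a list
as_list <- map_if(toy, is.numeric, ~ .x + 0.5)

# modify_if() does the same and keeps the input's type (a data frame here)
as_frame <- modify_if(toy, is.numeric, ~ .x + 0.5)
```

`as_frame` can be passed straight on to further dplyr steps, whereas `as_list` would first need converting back to a data frame.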