How do I round only the numeric values in a data frame?

Time: 2019-06-19 23:26:09

Tags: r dataframe rounding significant-digits

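I am trying to round all the numeric values in a data frame.

The problem is that my data frame also contains strings, and they are not confined to any particular column or row. I would like to avoid writing a loop that visits every row-column cell and checks whether the value is numeric before rounding it.

Is there a function (or combination of functions) that would help me achieve this?

So far I have tried round_df() as well as various combinations of lapply() and apply() with anonymous functions. However, I can only get the rounding to depend on the first value in a column (i.e. if the first value is numeric, the whole column is treated as numeric and rounded).

This breaks in two ways: if the first value is a string, the whole column is left unrounded; in the opposite case, my code throws an error because it tries to round a string.

My function is: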

library(readxl)
library(knitr)
library(gplots)
library(doBy)
library(dplyr)
library(plyr)
library(printr)   
library(xtable)   
library(gmodels)
library(survival)
library(pander)
library(psych)
library(questionr)
library(DT)
library(data.table)
library(expss)
library(xtable)
options(xtable.floating = FALSE)
options(xtable.timestamp = "")
library(kableExtra)
library(magrittr)
library(Hmisc)
library(forestmangr)
library(summarytools)
library(gmodels)
library(stats)

summaryTable <- function(y, bygroup, digit, 
                         title="", caption_heading="", caption="", freq.tab, y.label="",
                         y.names="", boxplot) {
  if (freq.tab) {
    m = multi.fun(y)
  }
  else if (!missing(bygroup)) {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(describeBy(y, bygroup, mat = T)))
    m = select(m, y.label, n, mean, sd, min, median, max)
  }
  else {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(sumconti(y)))
  }
  if (!freq.tab) {
    m$y.label = y.names
  }
  m = round_df(m, digit, "signif")
  if (freq.tab) {
    colnames(m) = c(y.label, "Frequency", "%")
  }
  else if (missing(freq.tab) | !freq.tab) {
    colnames(m) = c(y.label, "n", "Mean", "Std", "Min", "Median", "Max")
  }
  if (!missing(boxplot)) {
    if (boxplot) {
      attach(m)
      layout(matrix(c(1, 1, 2, 1)), 2, 1)
       
      kable(m, align = "c", "latex", booktabs = T, caption=figTitle(x, title, y.label)) %>% 
        kable_styling(position = 'center', 
                      latex_options = c("striped", "repeat_header", "hold_position")) %>% 
        footnote(general = caption, general_title = caption_heading, footnote_as_chunk = T, 
                 title_format = c("italic", "underline"), threeparttable = T)
      
      boxplot(y ~ bygroup, main = figTitle(y, title, y.label), names = y.names, ylab = title, 
              xlab = y.label, col = c("red", "blue", "orange", "pink", 
                                      "green", "purple", "grey", "yellow"), border = "black", 
              horizontal = F, varwidth = T)
    }
  }
  kable(m, 
        align = "c", 
        "latex", 
        booktabs = T, 
        caption = figTitle(x, title, y.label)) %>% 
    kable_styling(position = 'center', 
                  latex_options = c("striped", "repeat_header", "hold_position")) %>% 
    footnote(general = caption, 
             general_title = caption_heading, 
             footnote_as_chunk = T, 
             title_format = c("italic", "underline"), 
             threeparttable = T)
}


figTitle = function(x, title, y.label) {
  if (y.label != "") {
    paste("Summary of", title, "by", y.label)
  }
  else if (title != "") {
    paste("Summary of", title)
  }
  else {
    paste("")
  }
}


output.pv <- function(pp, digit) {
  ifelse (pp < 0.001, paste("< 0.001"), signif(pp, digit))
}

After passing in data, the output looks like this:

[image: screenshot of the output, not reproduced here]

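2 Answers:

Answer 0 (score: 3)

The question does not include data, so we don't really know exactly what the problem is (please always provide a complete, minimal reproducible example). We have therefore split the answer into two sections, one for each likely interpretation, and provide test data for each. No packages are used.

Rounding only the numeric columns

If the problem is that you have a mix of numeric and character columns and only want to round the numeric ones, here are several approaches.

1) Determine which columns are numeric, giving the logical vector ok, and then round only those columns. We use the built-in Puromycin data set as an example.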

ok <- sapply(Puromycin, is.numeric)
replace(Puromycin, ok, round(Puromycin[ok], 1))

giving:

   conc rate     state
1   0.0   76   treated
2   0.0   47   treated
3   0.1   97   treated
4   0.1  107   treated
5   0.1  123   treated
6   0.1  139   treated
...etc...

1a) If you don't mind overwriting the input, the last line can also be written like this:

Puromycin[ok] <- round(Puromycin[ok], 1)

2) Another approach is to perform the numeric check inside lapply:

Round <- function(x, k) if (is.numeric(x)) round(x, k) else x
replace(Puromycin, TRUE, lapply(Puromycin, Round, 1))

2a) or, overwriting the input:

Puromycin[] <- lapply(Puromycin, Round, 1)
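
Since the question's code calls round_df(m, digit, "signif"), the same idea could be wrapped in a small helper that takes the rounding function as an argument. This is only a sketch: the name round_numeric_cols and its arguments are made up for illustration and do not come from any package.

```r
# Hypothetical helper (not from any package): apply a rounding function
# to the numeric columns of a data frame and leave other columns untouched.
round_numeric_cols <- function(df, digits, fun = round) {
  ok <- vapply(df, is.numeric, logical(1))   # TRUE for numeric columns
  if (any(ok)) {
    df[ok] <- lapply(df[ok], fun, digits)    # round only those columns
  }
  df
}

# Round to 1 decimal place
head(round_numeric_cols(Puromycin, 1))

# Round to 2 significant digits instead
head(round_numeric_cols(Puromycin, 2, fun = signif))
```

Passing signif instead of round switches from decimal places to significant digits without changing the column selection logic.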

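Rounding everything

If the problem is that all columns are supposed to be numeric but some are actually character or factor (although they represent numbers), then apply type.convert, using the data frame shown below as an example: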

# create test data having numeric, character and factor columns but
# all intended to represent numbers
DF <- structure(list(Time = c("0.1", "0.12", "0.3", "0.14", "0.5", 
"0.7"), demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98), Time2 = structure(c(1L, 
2L, 4L, 3L, 5L, 6L), .Label = c("0.1", "0.12", "0.14", "0.3", 
"0.5", "0.7"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

round(replace(DF, TRUE, lapply(DF, type.convert)), 1)
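
One caveat, as far as I know: recent versions of R warn when type.convert is called without an explicit as.is argument, and round() applied to a whole data frame fails if any column is still non-numeric after conversion. A more defensive variant is sketched below, under the assumption that columns which cannot be converted should simply be left alone. It passes as.is = TRUE and then rounds only the columns that turned out numeric. DF2 is a small made-up data frame used for illustration.

```r
# Made-up test data: two columns hold numbers as text or factor levels,
# one column holds real labels that cannot be converted.
DF2 <- data.frame(
  Time  = c("0.12", "0.3", "0.14"),
  Time2 = factor(c("0.12", "0.3", "0.14")),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Convert each column where possible; as.is = TRUE keeps text as character
# rather than turning it into a factor.
conv <- DF2
conv[] <- lapply(DF2, type.convert, as.is = TRUE)

# Round only the columns that are numeric after conversion.
ok <- sapply(conv, is.numeric)
conv[ok] <- round(conv[ok], 1)
conv
```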

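Answer 1 (score: 1)

To add one last possibility to the options above:

Suppose your character columns contain not only numbers but also genuine strings. Then the following approach may help.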

library(dplyr)
library(purrr)

# Data from the answer above, with an additional mixed column
DF <- structure(
  list(
    Time = c("0.1", "0.12", "0.3", "0.14", "0.5",
             "0.7"),
    demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98),
    Mix = c("3.38", "4.403", "a", "5.34", "c", "9.32"),
    Time2 = structure(
      c(1L,
        2L, 4L, 3L, 5L, 6L),
      .Label = c("0.1", "0.12", "0.14", "0.3",
                 "0.5", "0.7"),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-6L)
)

TBL <- as_tibble(DF)

# These are the functions we use.
# Note: as.double() warns ("NAs introduced by coercion") for non-numeric strings.
round_string_number <- function(x) {
  ifelse(!is.na(as.double(x)),
         as.character(round(as.double(x), digits = 1)),
         x)
}

round_string_factor <- compose(round_string_number, as.character)

# Here the recoding happens
TBL %>%
  mutate_if(is.numeric, ~ round(., digits = 1)) %>% 
  mutate_if(is.factor, round_string_factor) %>% 
  mutate_if(~!is.numeric(.), round_string_number)

This converts the following data

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <fct>
1 0.1     0.83 3.38  0.1  
2 0.12    1.03 4.403 0.12 
3 0.3     1.9  a     0.3  
4 0.14    1.6  5.34  0.14 
5 0.5     1.56 c     0.5  
6 0.7     1.98 9.32  0.7  

into this:

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <chr>
1 0.1      0.8 3.4   0.1  
2 0.1      1   4.4   0.1  
3 0.3      1.9 a     0.3  
4 0.1      1.6 5.3   0.1  
5 0.5      1.6 c     0.5  
6 0.7      2   9.3   0.7
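
The same cell-by-cell idea can also be sketched without dplyr or purrr. The helper round_cells below is a hypothetical name introduced here for illustration. It assumes factor columns may be turned into character columns, as in the output above, and it suppresses the coercion warning that as.numeric raises for entries such as "a".

```r
# Hypothetical base-R helper: round every value that parses as a number,
# keep everything else unchanged.
round_cells <- function(x, digits = 1) {
  if (is.numeric(x)) return(round(x, digits))
  x <- as.character(x)                      # factors become character
  num <- suppressWarnings(as.numeric(x))    # NA where the text is not a number
  ifelse(is.na(num), x, as.character(round(num, digits)))
}

# Small made-up example with a mixed column
DF3 <- data.frame(
  demand = c(0.83, 1.03, 1.98),
  Mix    = c("3.38", "a", "9.32"),
  stringsAsFactors = FALSE
)
DF3[] <- lapply(DF3, round_cells, digits = 1)
DF3
```

Numeric columns stay numeric, while mixed columns stay character with only their number-like entries rounded.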