Question

So far, my code to import a particular file looks like this:

df <- read_excel("File path", 
       col_types = c("numeric", "text", "numeric", 
         "numeric", "numeric", "numeric", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
         "numeric", "numeric", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text"), 
       skip = 8)

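How can I condense the "col_types" part specifically while keeping the same effect? I have tried sapply(df, as.numeric), but that converts every column to numeric, and I specifically need the second one to be text.

Note: I understand that columns other than the second are also "text"; this example is a midpoint of what I tried.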

Answer 1

Note that read_excel will guess your column types, but if that does not work for your spreadsheet:

1) rep  Use rep like this:

col_types <- rep(c("numeric", "text", "numeric", "text", "numeric", "text"),
                 c(1L, 1L, 4L, 67L, 6L, 13L))

# test - col_types_orig defined in Note at end
identical(col_types, col_types_orig) 
## [1] TRUE

2) rle  We could also use rle to compress and then inverse.rle to decompress:

r <- rle(col_types_orig)
col_types <- inverse.rle(r)

identical(inverse.rle(r), col_types_orig)
## [1] TRUE

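You can get r as R code using dput(r). (In fact, we obtained the arguments to rep in (1) by examining this dput output.)

3) Noting that col_types_orig has 92 elements, all of them text except for a few numerics, we could do this: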
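As a rough sketch of how (1) and (2) fit together: the run-length encoding stores the distinct values and their run lengths, and those two pieces are what rep needs. The vector is rebuilt here with rep only so that the snippet runs on its own.

```r
# Rebuild the 92-element vector so this snippet is self-contained
col_types_orig <- rep(c("numeric", "text", "numeric", "text", "numeric", "text"),
                      c(1L, 1L, 4L, 67L, 6L, 13L))

# Run-length encode: r$values holds each run's value, r$lengths its length
r <- rle(col_types_orig)

# Print R code that recreates r; the values and lengths can be read off it
dput(r)

# Feeding the two components back to rep reproduces the original vector
col_types <- rep(r$values, times = r$lengths)
identical(col_types, col_types_orig)
```

The last line is equivalent to calling inverse.rle(r); writing it with rep just makes the link to (1) explicit.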

length(col_types_orig)
## [1] 92

table(col_types_orig)
## col_types_orig
## numeric    text 
##      11      81 

which(col_types_orig == "numeric")
## [1]  1  3  4  5  6 74 75 76 77 78 79

col_types <- replace(rep("text", 92), c(1, 3:6, 74:79), "numeric")

identical(col_types, col_types_orig)
## [1] TRUE
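If you need this for several spreadsheets, the replace idea can be wrapped in a small helper. This is only a sketch: make_col_types and its argument names are made up for illustration and are not part of readxl.

```r
# Hypothetical helper (not from readxl): start with every column set to
# `default`, then overwrite the listed positions with `special`.
make_col_types <- function(n_cols, special_idx,
                           default = "text", special = "numeric") {
  # Guard against positions that fall outside the sheet
  stopifnot(all(special_idx >= 1), all(special_idx <= n_cols))
  replace(rep(default, n_cols), special_idx, special)
}

col_types <- make_col_types(92, c(1, 3:6, 74:79))

# Quick sanity checks on the result
length(col_types)
table(col_types)
which(col_types == "numeric")
```

The resulting vector can then be passed as col_types = col_types in the read_excel call from the question.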

Note:

col_types_orig <- c("numeric", "text", "numeric", 
  "numeric", "numeric", "numeric", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
  "numeric", "numeric", "text", "text", "text", "text", "text", "text", 
  "text", "text", "text", "text", "text", "text", "text")

Answer 2

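If you are willing to do the import in two steps, you can first read everything in as text and then use dplyr::mutate_at to convert the relevant columns to numeric: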

library(tidyverse)
library(readxl)

df <- read_excel("File path", col_types = "text", skip = 8) %>%
  mutate_at(c(1, 3:6, 74:79), as.numeric)
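If you would rather not depend on dplyr, the same two-step idea can be sketched in base R. The toy data frame below is invented for illustration and stands in for a sheet that was read with col_types = "text"; the column names and positions are assumptions.

```r
# Toy stand-in for a sheet imported entirely as text (made-up data)
df <- data.frame(id    = c("1", "2"),
                 name  = c("a", "b"),
                 value = c("3.5", "4"),
                 stringsAsFactors = FALSE)

# Positions of the columns that should become numeric
# (for the real file in the question this would be c(1, 3:6, 74:79))
num_cols <- c(1, 3)

# Convert only those columns; every other column stays character
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```

Unlike sapply(df, as.numeric), which converts every column, assigning back into df[num_cols] touches only the selected positions, so the second column remains text.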

How to condense column types on import in R
