Question

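When I read data into a tidyverse tibble, every column is reported as col_double(). What does col_double mean here?

When I inspect the tibble with the base R str() function, all the data has been read in with the correct types.

When I check the type of the tibble, it returns list.

This boils down to three questions:

What does col_double mean?
Is there a tidyverse alternative to str() for checking column data types?
What is the correct way to check whether an object is a tibble?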

Answer 1

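double is a fairly standard programming term for a double-precision floating-point number, which is the usual way to store (typically, but not necessarily, non-integer) numeric values. R mostly uses the term numeric instead, although typeof() does report "double" for such values; C uses the term double, and R is built on C. If you want more background, see the Wikipedia page: Double-precision floating-point format.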
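As a quick illustration in base R (a minimal sketch; the variable names are made up for the example), the storage type and the class of a number are reported by different functions:

```r
x <- 1.5   # a number with a fractional part
y <- 2     # a whole number typed without the L suffix
z <- 2L    # the L suffix asks for an integer

typeof(x)  # "double"
class(x)   # "numeric"
typeof(y)  # "double": whole numbers are stored as doubles unless marked with L
typeof(z)  # "integer"

# is.numeric() is TRUE for both doubles and integers
is.numeric(x)
is.numeric(z)
```

So "double" describes how the value is stored, while "numeric" is the more general label R shows in class() and str().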

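readr uses col_double() to (strictly) parse numeric columns. See the help page ?col_double for details, and the package vignette Introduction to readr for more. By default, readr guesses what each of your columns is and then applies the matching parser for that type, e.g. col_double() for numeric columns.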
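The guessing step can be seen on a small file. This is a sketch that assumes the readr package is installed; the file contents and the column names (id, height, name) are invented for the example. As far as I know, recent readr versions do not guess an integer type by default, so a column of whole numbers is also reported as col_double(), which may be why every column in the question shows that type.

```r
library(readr)

# Write a tiny CSV to a temporary file (contents invented for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c("id,height,name",
             "1,1.75,anna",
             "2,1.80,ben"), path)

# Let readr guess the column types; it prints the column specification
guessed <- read_csv(path)
spec(guessed)          # shows the parser chosen for each column

# Override the guess for one column with an explicit parser
explicit <- read_csv(path, col_types = cols(id = col_integer()))

typeof(guessed$id)     # type chosen by guessing
typeof(explicit$id)    # type after forcing col_integer()
```

Passing col_types is optional; it is only needed when the guessed type is not the one you want.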

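The best way to check whether something is a tibble is is_tibble(). You can also use class() or str() (which shows class information) and see whether tbl_df is among the classes.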
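A short sketch of these checks, assuming the tibble package is installed (the object names df and tbl are just for illustration). It also shows why checking the type of a tibble gives "list":

```r
library(tibble)

df  <- head(iris)        # a plain data.frame
tbl <- as_tibble(df)     # the same data as a tibble

is_tibble(tbl)           # TRUE
is_tibble(df)            # FALSE
inherits(tbl, "tbl_df")  # TRUE: same check via the class vector
class(tbl)               # "tbl_df" "tbl" "data.frame"

# typeof() reports the underlying storage, and data frames (tibbles included)
# are stored as lists of columns, so this is "list" for both objects
typeof(tbl)
typeof(df)
```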

tidyverse functions consistently use the term double (sometimes abbreviated dbl) instead of numeric. You can see this in tibble printing or in the glimpse method:

> as_tibble(head(iris))
# A tibble: 6 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 
6          5.4         3.9          1.7         0.4 setosa 

> glimpse(head(iris))
Observations: 6
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa

## str (from base R) uses `num` instead of `<dbl>`
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Answer 2

I suggest reading the documentation for reading delimited files from the readr package.

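When you use read_csv(), readr guesses the appropriate variable type for each column. The message "Parsed with column specification" lists the variables and their detected types. "double" is essentially a numeric variable that is not restricted to integers.
Tibbles always show the variable types when printed (for example, see the link above). One alternative is glimpse(), but you can still use str().
You can use is_tibble().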
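For the second point, here is a minimal sketch of a few ways to list column types, assuming the dplyr and tibble packages are installed. The helper names col_classes and col_types are made up for this example:

```r
library(tibble)
library(dplyr)   # glimpse() is exported by dplyr

tbl <- as_tibble(iris)

glimpse(tbl)     # one line per column, with <dbl>, <fct>, ...

# Base R alternatives that return the types as a named character vector
col_classes <- vapply(tbl, function(col) class(col)[1], character(1))
col_types   <- vapply(tbl, typeof, character(1))

col_classes   # "numeric" for the four measurements, "factor" for Species
col_types     # "double" for the measurements; a factor is stored as "integer"
```

glimpse() is convenient for reading by eye, while the vapply() versions are easier to use in code that needs to act on the types.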

Why does tidyverse show all columns as col_double() when reading data into a tibble?
