Question

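In a data frame, I would like to be able to separate the columns that hold strings/characters from the columns that hold numeric types.

Here is my data: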

test=data.frame(col1=sample(1:20,10),col2=sample(31:50,10),
col3=sample(101:150,10),col4=sample(c('a','b','c'),10,replace=T))

It looks like this:

   col1 col2 col3 col4
1     2   41  132    c
2    11   47  141    b
3    13   39  135    a
4    12   31  117    b
5    19   42  106    a
6     8   50  118    a
7    14   33  149    a
8     6   48  148    b
9    16   37  150    b
10    9   34  140    a

Now here is the odd part: if I check the type of an entry in the column that contains characters, R says it is an integer:

> typeof(test[1,4])
[1] "integer"

If I do this:

> apply(test,2,typeof)
       col1        col2        col3        col4 
"character" "character" "character" "character"

R says they are all characters. In addition,

> lapply(test,typeof)
[1] "integer" "integer" "integer" "integer"

Again, what is going on, and is there a good way to distinguish the columns that hold characters from the columns that hold integers?

Answer 1

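apply works on arrays and matrices, not on data frames.

To process a data frame, it first converts it to a matrix.

Your data frame has a factor column, so the conversion to a matrix turns everything into character, and it does so without bothering to tell you.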
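The silent coercion can be seen by calling as.matrix directly. This is a minimal sketch on a toy data frame; the name `df` and its columns are made up for illustration and are not the asker's data. `stringsAsFactors = TRUE` is set explicitly because R 4.0.0 and later no longer turn strings into factors by default.

```r
# Toy data frame (hypothetical; stands in for `test` from the question)
df <- data.frame(num = 1:3, fac = c("a", "b", "a"), stringsAsFactors = TRUE)

m <- as.matrix(df)   # the conversion apply() performs before looping
typeof(m)            # "character": a matrix holds a single type
m[, "num"]           # the numbers are now strings

col_types <- apply(df, 2, typeof)   # every column reports "character"
col_types
```

No warning is raised at any point, which is why the result in the question looks so surprising.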

As you found, sapply is the way to go, and class is probably what you want to look at. There are also mode, typeof and storage.mode, depending on what you want to know:

> test$col5=letters[1:10]  # really character, not a factor
> test$col3=test$col3*pi # let's get some decimals in there


> sapply(test, mode)
       col1        col2        col3        col4        col5 
  "numeric"   "numeric"   "numeric"   "numeric" "character" 
> sapply(test, class)
       col1        col2        col3        col4        col5 
  "integer"   "integer"   "numeric"    "factor" "character" 
> sapply(test, typeof)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character" 
> sapply(test, storage.mode)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character"

Answer 2

OK, I figured out the answer to my own question, sorry:

sapply(test,class)

Answer 3

col4 is a factor:

str(test)
#'data.frame':  10 obs. of  4 variables:
#$ col1: int  11 14 8 19 10 12 7 18 3 16
#$ col2: int  46 39 35 38 42 37 34 32 41 31
#$ col3: int  113 147 138 118 132 139 131 119 108 111
#$ col4: Factor w/ 3 levels "a","b","c": 1 3 2 3 2 3 3 3 1 3

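Internally, a factor is an integer vector (which is what typeof reports) with the class factor and a levels attribute. apply coerces the data.frame to a matrix. Since a matrix can hold only one data type, everything is coerced to character before typeof is applied.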
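The internal layout of a factor can be inspected directly. A small sketch, where the vector `f` is a made-up example rather than the asker's col4:

```r
f <- factor(c("a", "c", "b", "c"))

typeof(f)        # "integer": the underlying storage
class(f)         # "factor": the class attribute
levels(f)        # "a" "b" "c"
unclass(f)       # integer codes 1 3 2 3, with the levels attribute attached
as.character(f)  # back to the original strings
```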

Use class to distinguish the data types, and lapply (or sapply) to loop over the columns.
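One possible way to do the actual split is sketched below. It uses is.numeric as the per-column test, which is one option among several, and the data frame `df` and the names `numeric_part` and `other_part` are hypothetical. `stringsAsFactors = TRUE` is set so that the string column becomes a factor, as in the question.

```r
# Hypothetical data frame with integer, double and factor columns
df <- data.frame(col1 = 1:4,
                 col2 = c(2.5, 3.5, 4.5, 5.5),
                 col4 = c("a", "b", "a", "c"),
                 stringsAsFactors = TRUE)

col_classes <- sapply(df, class)   # one class per column
is_num <- sapply(df, is.numeric)   # TRUE for integer and double; FALSE for factors

numeric_part <- df[, is_num, drop = FALSE]    # col1, col2
other_part   <- df[, !is_num, drop = FALSE]   # col4
```

`drop = FALSE` keeps each result a data frame even when only one column is selected.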

Answer 4

In data.frame(col4 = sample(c('a','b','c'), 10, replace = T)), col4 is a factor.

apply(test, 2, typeof): because length(dim(test)) == 2L, apply first calls as.matrix(test).

Is this coercion? Why does R tell me these are the same data type?
