Question

I have read a set of data from an Excel file into the variable "proportions". I tried read.xls (package gdata), read.xlsx (package xlsx) and XLConnect, but the results were the same in every case.

I need to analyze the 27th column of "proportions".

 > class(proportions$X.27)
 [1] "character"

As you can see, the class of this column is initially character.

My problem is that when I subset the data in a particular way into a new variable called "proportions1", the column class gets changed:

 proportions1<-data.frame(proportions$X.5, proportions$Lab.data, proportions$X.26, proportions$X.27, proportions$X.28, proportions$X.29, proportions$X.30)
 > class(proportions1$X.27)
 [1] "NULL"
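A likely explanation for the "NULL" result, sketched below with a small invented stand-in for the spreadsheet (the values are hypothetical): when data.frame() receives expressions such as proportions$X.27, it derives each column name from the expression, so the new column is called "proportions.X.27" rather than "X.27", and proportions1$X.27 finds nothing. Selecting the columns by name keeps the original names.

```r
# Hypothetical stand-in for the spreadsheet data; values are invented
proportions <- data.frame(X.26 = c("1.2", "3.4"),
                          X.27 = c("10.8", "5.3"),
                          stringsAsFactors = FALSE)

# data.frame() names each column after the expression passed in,
# with "$" turned into "." by make.names()
# (stringsAsFactors = FALSE here only to isolate the naming effect)
p1 <- data.frame(proportions$X.26, proportions$X.27, stringsAsFactors = FALSE)
names(p1)          # "proportions.X.26" "proportions.X.27"
class(p1$X.27)     # "NULL": no column called X.27 exists in p1

# Selecting columns by name keeps the original names and classes
p2 <- proportions[, c("X.26", "X.27")]
class(p2$X.27)     # "character"
```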

When I reset the column names to something meaningful, the columns' class changes to "factor":

 > colnames(proportions1) <- c("barcode1", "WBC", "Neutrophils", "Lymphocytes", "Eosinophils", "Monocytes", "Basophils")
 > class(proportions1$Neutrophils)
 [1] "factor"
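One possible reading, assuming an R version older than 4.0.0: the factor conversion most likely happens when data.frame() builds proportions1, because stringsAsFactors defaulted to TRUE before R 4.0.0. Renaming with colnames() changes only the names, never a column's class. The sketch uses a hypothetical vector x and column name val.

```r
x <- c("10.8", "5.3", "2.8")

# stringsAsFactors = TRUE was data.frame()'s default before R 4.0.0
d_old <- data.frame(val = x, stringsAsFactors = TRUE)
class(d_old$val)           # "factor"

# FALSE (the default from R 4.0.0 on) keeps the strings as character
d_new <- data.frame(val = x, stringsAsFactors = FALSE)
class(d_new$val)           # "character"

# Renaming does not touch the column's class
colnames(d_new) <- "Neutrophils"
class(d_new$Neutrophils)   # "character"
```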

Later I have to convert these column values to numeric so that I can plot them. I did it like this:

    as.numeric(as.character(proportions1$Neutrophils))  # and likewise for the other columns

However, this makes me lose up to 90% of my data. The loss varies between the analyzed columns: in some columns I lose 10% of the values, in others 90%.
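For context, the as.character() step in the conversion above is needed on a factor: as.numeric() applied directly to a factor returns the internal level codes, not the values. The sketch below (with an invented factor f) shows this, plus the alternative as.numeric(levels(f))[f] suggested in the R FAQ, which converts each level only once. Either way, as.numeric(as.character(...)) itself should not drop valid numbers; NAs appear only for strings that do not parse as numbers.

```r
f <- factor(c("10.8", "5.3", "2.8"))
levels(f)                     # "10.8" "2.8" "5.3" (sorted as text)

as.numeric(f)                 # 1 3 2: level codes, not the values
as.numeric(as.character(f))   # 10.8 5.3 2.8
as.numeric(levels(f))[f]      # 10.8 5.3 2.8, converting each level once
```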

When I compare the original "proportions" data:

 head(proportions$X.27)
 [1]  "10.8"            "5.3"             "3.9"            "2.8" ......

with the converted data, I get the following (note that the values appear rearranged):

 > head(as.numeric(as.character(proportions1$Neutrophils)))
 [1]   NA   NA  2.3 14.9   NA   NA
 Warning message:
 In head(as.numeric(as.character(proportions1$Neutrophils))) :
   NAs introduced by coercion
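A minimal diagnostic sketch for finding out which strings fail to convert. The vector vals and its contents are hypothetical; with the real data it would be something like as.character(proportions1$Neutrophils). The decimal-comma cleanup at the end is only an assumed example of one possible cause, to be used only if the failing strings actually look like that.

```r
# Hypothetical column contents; the real file's problem values are unknown
vals <- c("10.8", "5.3", "14,9", "n/a", "<0.1", NA)

# Convert without the coercion warning, then keep the inputs that were
# present but could not be parsed as numbers
num <- suppressWarnings(as.numeric(vals))
bad <- vals[is.na(num) & !is.na(vals)]
bad                 # holds "14,9", "n/a" and "<0.1"
table(bad)          # how often each unparseable string occurs

# Only if the culprits are decimal commas (an assumption here):
# replace the comma with a point before converting
cleaned <- suppressWarnings(as.numeric(sub(",", ".", vals, fixed = TRUE)))
cleaned             # 10.8 5.3 14.9 NA NA NA
```

Looking at table(bad) on the real column should show whether the NAs come from a single recurring pattern (placeholders, units, comparison signs, decimal commas) or from scattered entries.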

What can I do to avoid these NA values?

Data frame column class changes unintentionally, causing NA values downstream
