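Removing character data from a numeric data frame in R

Time: 2016-09-22 12:18:22

Tags: r dataframe subset

I have a data frame in which the header has been recycled several times, so it looks like this: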

var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4

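Most of the variables hold numeric values; however, some hold characters, so converting the whole df to numeric won't help me. How can I subset the data frame to remove the recurring headers? In the end, I would like to have this: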

var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'

3 answers:

Answer 0: (score: 3)

You can try this:

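# Convert the first three columns to integer; repeated header strings become NA (with a warning)
df[,1:3] <- sapply(df[,1:3], function(x) as.integer(as.character(x)))
# Keep only the rows without any NA, which drops the repeated header rows
df <- df[complete.cases(df),]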
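A self-contained sketch of the approach above. The sample `df` is a hypothetical stand-in for the data in the question, and `lapply` with `suppressWarnings` is used here instead of `sapply` purely as a stylistic choice. Keep in mind that `complete.cases()` drops every row containing any `NA`, so rows with genuinely missing values would be removed too.

```r
# Hypothetical sample data: header rows repeated inside the data
df <- data.frame(
  var1 = c("1", "1", "var1", "1", "var1"),
  var2 = c("1", "1", "var2", "1", "var2"),
  var3 = c("1", "1", "var3", "1", "var3"),
  var4 = c("ch", "ch", "var4", "ch", "var4"),
  stringsAsFactors = FALSE
)

# Convert the numeric columns; header strings cannot be parsed and become NA
df[, 1:3] <- suppressWarnings(
  lapply(df[, 1:3], function(x) as.integer(as.character(x)))
)

# Keep only rows without NA, i.e. drop the repeated header rows
df <- df[complete.cases(df), ]
str(df)
```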

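Answer 1: (score: 2)

The extra headers cause all of your data to be read in as factors (or as character when stringsAsFactors=FALSE, which is the default from R 4.0.0 onward):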

dd <- read.table(text="var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4
   1       1       1     'ch'
   1       1       1     'ch'
   1       1       1     'ch'
var1    var2    var3    var4", header=TRUE)

Convert all columns except the last one to numeric (ignore the warnings):

dd[,1:3] <- lapply(dd[,1:3],
                    function(x) as.numeric(as.character(x)))

Drop the rows in which the first three columns are all NA:

dd <- dd[apply(dd[,1:3],1,function(x)!all(is.na(x))),]
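After these steps the factor column can still carry the unused level `var4`, and the row names keep their original, now non-contiguous, numbers. A minimal sketch of an optional tidy-up, assuming the data were read as factors (`stringsAsFactors = TRUE` is set explicitly here so the example behaves the same on any R version; the shortened sample text is my own):

```r
# Shortened version of the example data, read with factor columns
dd <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
1 1 1 'ch'
var1 var2 var3 var4")

# Convert the first three columns to numeric; header strings become NA
dd[, 1:3] <- suppressWarnings(
  lapply(dd[, 1:3], function(x) as.numeric(as.character(x)))
)

# Drop the rows in which the first three columns are all NA
dd <- dd[apply(dd[, 1:3], 1, function(x) !all(is.na(x))), ]

# Optional tidy-up: remove unused factor levels and renumber the rows
dd <- droplevels(dd)
rownames(dd) <- NULL
str(dd)
```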

Answer 2: (score: 0)

How about this:

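# Indices of rows in which a cell equals its own column name (repeated header rows)
rpts <- unique(unlist(lapply(seq_along(d), function(i) which(names(d)[i] == d[[i]]))))
# Drop those rows; the length check avoids d[-integer(0), ] returning zero rows
if (length(rpts) > 0) d <- d[-rpts, ]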

The first line finds the rows in which a column contains its own column name (i.e. names(d)). The second line removes those rows (i.e. rpts).

Data

d <- structure(list(var1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 
2L), .Label = c("1", "var1"), class = "factor"), var2 = structure(c(1L, 
1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("1", "var2"), class = "factor"), 
    var3 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("1", 
    "var3"), class = "factor"), var4 = structure(c(1L, 1L, 1L, 
    2L, 1L, 1L, 1L, 2L), .Label = c("ch", "var4"), class = "factor")), .Names = c("var1", 
"var2", "var3", "var4"), class = "data.frame", row.names = c(NA, 
-8L))
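A variation on the same idea, offered only as a sketch and not as part of the answer: flag a row as a repeated header when every cell equals its column name, remove those rows, and then let `type.convert()` re-detect the column types. The data frame below is rebuilt in a shorter form with the same values as the `d` above, and `is_header` is a name introduced here for illustration.

```r
# Same values as the example data, built more compactly as factor columns
d <- data.frame(
  var1 = c("1", "1", "1", "var1", "1", "1", "1", "var1"),
  var2 = c("1", "1", "1", "var2", "1", "1", "1", "var2"),
  var3 = c("1", "1", "1", "var3", "1", "1", "1", "var3"),
  var4 = c("ch", "ch", "ch", "var4", "ch", "ch", "ch", "var4"),
  stringsAsFactors = TRUE
)

# Work on character values so factor levels do not get in the way
d[] <- lapply(d, as.character)

# A row is a repeated header when every cell equals its column name
is_header <- apply(d, 1, function(r) all(r == names(d)))
d <- d[!is_header, , drop = FALSE]

# Re-detect each column's type, much as read.table would
d[] <- lapply(d, type.convert, as.is = TRUE)
rownames(d) <- NULL
str(d)
```

Requiring a match in every column, instead of in any column, is slightly stricter than the answer's code and avoids dropping a data row that happens to contain one column name as a legitimate value.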