I have a data frame whose header row has been repeated several times inside the data, so it looks like this:
var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
Most of the variables hold numeric values; however, some hold characters, so converting the whole df to numeric won't help me. How can I subset the data frame to remove the recurring header rows, so that in the end only the data rows remain under the original header?
Answer 0 (score: 3)
You can try this:
df[,1:3] <- sapply(df[,1:3], function(x) as.integer(as.character(x)))
df <- df[complete.cases(df),]
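Answer 1 (score: 2)
Having the extra header rows makes read.table read every column as a factor (or as character with stringsAsFactors=FALSE, which is the default since R 4.0.0):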
dd <- read.table(text="var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4", header=TRUE)
Convert all columns except the last one to numeric (ignore the coercion warnings):
dd[,1:3] <- lapply(dd[,1:3],
function(x) as.numeric(as.character(x)))
Drop the rows in which the first three columns are all NA:
dd <- dd[apply(dd[,1:3],1,function(x)!all(is.na(x))),]
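The steps above can be put together in one runnable sketch. The helper name drop_header_rows is hypothetical (not from the answer), the sample data is shortened, and suppressWarnings is used only to silence the expected "NAs introduced by coercion" warnings:

```r
# Hypothetical helper: drop rows whose supposedly numeric columns all fail to parse.
drop_header_rows <- function(df, num_cols) {
  # Go through character first so factor codes are not mistaken for values
  df[num_cols] <- lapply(df[num_cols], function(x) {
    suppressWarnings(as.numeric(as.character(x)))
  })
  # Keep a row if at least one of the numeric columns parsed
  keep <- rowSums(!is.na(df[num_cols])) > 0
  df[keep, , drop = FALSE]
}

# Shortened sample data with two repeated header rows
dd <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "var1 var2 var3 var4
1 1 1 'ch'
1 1 1 'ch'
var1 var2 var3 var4
1 1 1 'ch'
var1 var2 var3 var4")

clean <- drop_header_rows(dd, 1:3)
str(clean)
```

Note that this relies on knowing in advance which columns should be numeric; a genuine data row whose numeric columns are all missing would be dropped as well.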
Answer 2 (score: 0)
How about this:
rpts <- unique(as.vector(sapply(1:ncol(d), function(i) which(names(d)[i]==d[,i]))))
d <- d[-1*rpts,]
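The first line finds, for each column, the rows whose value equals that column's own name (i.e. names(d)) and collects their indices. The second line removes those rows (i.e. rpts).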
Data
d <- structure(list(var1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L), .Label = c("1", "var1"), class = "factor"), var2 = structure(c(1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("1", "var2"), class = "factor"),
var3 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("1",
"var3"), class = "factor"), var4 = structure(c(1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L), .Label = c("ch", "var4"), class = "factor")), .Names = c("var1",
"var2", "var3", "var4"), class = "data.frame", row.names = c(NA,
-8L))
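As a variation on the same idea, here is a minimal sketch that flags the repeated headers with a logical mask on the first column and then re-infers the column types. The names is_header and d_clean and the small data frame are made up for illustration, and the sketch assumes a repeated header always shows up in the first column:

```r
# Small illustrative data frame with two repeated header rows (all character)
d <- data.frame(
  var1 = c("1", "1", "var1", "1", "var1"),
  var2 = c("1", "1", "var2", "1", "var2"),
  var3 = c("1", "1", "var3", "1", "var3"),
  var4 = c("ch", "ch", "var4", "ch", "var4"),
  stringsAsFactors = FALSE
)

# TRUE for rows where the first column repeats its own column name
is_header <- as.character(d[[1]]) == names(d)[1]

# Keep only the data rows
d_clean <- d[!is_header, , drop = FALSE]

# Re-infer each column's type now that the header text is gone
d_clean[] <- lapply(d_clean, function(x) type.convert(as.character(x), as.is = TRUE))
rownames(d_clean) <- NULL

str(d_clean)
```

A logical mask also behaves sensibly when no repeated header is present: negative indexing with an empty index, as in d[-integer(0), ], returns zero rows rather than the full data frame.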