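I am new to R. I am working with a dataset in which the missing values had already been replaced with "?" before I received the data. I am looking for a way to delete the rows that contain it. The "?" values are not confined to one place; they appear throughout the data.
I have already tried "Delete rows containing specific strings in R", but it did not work for me. My code so far is included below.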
library(randomForest)
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'))
names <- names(heart)
nrow(heart)
ncol(heart)
names(heart)
colnames(heart)[colnames(heart)=="X11"] <- "survival"
colnames(heart)[colnames(heart)=="X0"] <- "alive"
colnames(heart)[colnames(heart)=="X71"] <- "attackAge"
colnames(heart)[colnames(heart)=="X0.1"] <- "pericardialEffusion"
colnames(heart)[colnames(heart)=="X0.260"] <- "fractionalShortening"
colnames(heart)[colnames(heart)=="X9"] <- "epss"
colnames(heart)[colnames(heart)=="X4.600"] <- "lvdd"
colnames(heart)[colnames(heart)=="X14"] <- "wallMotionScore"
colnames(heart)[colnames(heart)=="X1"] <- "wallMotionIndex"
colnames(heart)[colnames(heart)=="X1.1"] <- "mult"
colnames(heart)[colnames(heart)=="name"] <- "patientName"
colnames(heart)[colnames(heart)=="X1.2"] <- "group"
colnames(heart)[colnames(heart)=="X0.2"] <- "aliveAfterYear"
names(heart)
Answer 0 (score: 2)
library(randomForest)
# na.strings = "?" makes read.csv treat every "?" as a missing value (NA)
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'), na.strings = "?")
names <- names(heart)
nrow(heart)
ncol(heart)
names(heart)
colnames(heart)[colnames(heart)=="X11"] <- "survival"
colnames(heart)[colnames(heart)=="X0"] <- "alive"
colnames(heart)[colnames(heart)=="X71"] <- "attackAge"
colnames(heart)[colnames(heart)=="X0.1"] <- "pericardialEffusion"
colnames(heart)[colnames(heart)=="X0.260"] <- "fractionalShortening"
colnames(heart)[colnames(heart)=="X9"] <- "epss"
colnames(heart)[colnames(heart)=="X4.600"] <- "lvdd"
colnames(heart)[colnames(heart)=="X14"] <- "wallMotionScore"
colnames(heart)[colnames(heart)=="X1"] <- "wallMotionIndex"
colnames(heart)[colnames(heart)=="X1.1"] <- "mult"
colnames(heart)[colnames(heart)=="name"] <- "patientName"
colnames(heart)[colnames(heart)=="X1.2"] <- "group"
colnames(heart)[colnames(heart)=="X0.2"] <- "aliveAfterYear"
names(heart)
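# Drop every row that contains at least one NA
heart1 <- na.omit(heart)
When importing the file, you can set na.strings to "?" so that those values are read as NA. na.omit then removes every row that contains an NA.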
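As a small self-contained illustration of this approach, the sketch below reads a few made-up records from an inline string instead of the UCI URL. The three column names and all of the values are assumptions chosen for the example, and header = FALSE is used because the sample has no header row.

```r
# Made-up sample in the same style as the data: "?" marks a missing value.
csv_text <- "11,0,71\n19,?,72\n?,1,55\n57,0,60"

# na.strings = "?" converts every "?" to NA while the data are being read.
sample_df <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("survival", "alive", "attackAge"),
  na.strings = "?"
)

# Because "?" never reaches the data frame, the columns stay numeric.
str(sample_df)

# na.omit() drops every row that has an NA in any column.
clean_df <- na.omit(sample_df)
nrow(clean_df)
```

The X11, X0, X71 style names in the question suggest that the real file may also lack a header row, in which case its first record is being consumed as column names. If that is so, passing header = FALSE together with col.names, as in the sketch, would keep that record.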
Answer 1 (score: 1)
I think this does what you are asking for.
# Set stringsAsFactors = FALSE so the columns are read as character vectors
# rather than factors, which keeps the comparison with "?" straightforward.
# header = FALSE: the file appears to have no header row (the X11, X0, ...
# names in the question are the values of its first record).
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'), header = FALSE, stringsAsFactors = FALSE)
# Simpler way to assign column names to the data frame
colnames(heart) <- c("survival", "alive", "attackAge", "pericardialEffusion",
                     "fractionalShortening", "epss", "lvdd", "wallMotionScore",
                     "wallMotionIndex", "mult", "patientName",
                     "group", "aliveAfterYear")
# A data frame can be traversed like a matrix, using the row and column
# indices as coordinates
for (r in seq_len(nrow(heart))) {
  for (j in seq_len(ncol(heart))) {
    # Replace "?" in this cell with NA, which is R's default missing value
    heart[r, j] <- ifelse(heart[r, j] == "?", NA, heart[r, j])
  }
}
# Drop the rows that contain an NA
heart <- na.omit(heart)
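The same replacement can also be written without an explicit loop. The sketch below is one possible vectorized variant, shown on a made-up inline sample (the column names and values are assumptions for illustration); it additionally converts the columns back to numbers, since any column that contained "?" is read as character.

```r
# Made-up sample read as text, so "?" survives as a literal string.
sample_df <- read.csv(
  text = "11,0,71\n19,?,72\n?,1,55\n57,0,60",
  header = FALSE,
  col.names = c("survival", "alive", "attackAge"),
  stringsAsFactors = FALSE
)

# Comparing the whole data frame with "?" gives a logical matrix, which can
# be used to overwrite every matching cell in one step.
sample_df[sample_df == "?"] <- NA

# Columns that contained "?" were read as character; type.convert() turns
# them back into numbers where possible.
sample_df <- type.convert(sample_df, as.is = TRUE)

# Drop the rows that contain an NA.
clean_df <- na.omit(sample_df)
```

On a data frame of this size the loop and the vectorized form should give the same rows; the vectorized form mainly saves typing and tends to be faster on larger data.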
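Answer 2 (score: 0)
Some libraries let you specify, when reading a CSV file, which strings should be treated as missing values. I use the readr library most often. You can then simply use na.omit and similar functions.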
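library(readr)
library(dplyr)
# na = c("", "?") tells read_csv to treat empty fields and "?" as NA.
# name_repair = "minimal" (available in readr >= 2.0) keeps the raw header
# values, because readr does not generate the X-prefixed names of read.csv.
heart <- read_csv(
  'http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data',
  na = c("", "?"),
  name_repair = "minimal"
)
# make.names(unique = TRUE) reproduces the read.csv-style names (X11, X0, ...)
# that the recode keys below expect; unmatched names are left unchanged.
colnames(heart) <- recode(
  make.names(colnames(heart), unique = TRUE),
  "X11" = "survival",
  "X0" = "alive",
  "X71" = "attackAge",
  "X0.1" = "pericardialEffusion",
  "X0.260" = "fractionalShortening",
  "X9" = "epss",
  "X4.600" = "lvdd",
  "X14" = "wallMotionScore",
  "X1" = "wallMotionIndex",
  "X1.1" = "mult",
  "name" = "patientName",
  "X1.2" = "group",
  "X0.2" = "aliveAfterYear"
)
heart
# Drop every row that contains at least one NA
heart <- na.omit(heart)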
(Also, you can use the recode function from the dplyr package to save some typing, but your solution for renaming the columns works fine.)
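One of the functions similar to na.omit is base R's complete.cases, which can be handy when only some columns should decide whether a row is dropped. The sketch below uses a made-up data frame (the column names and values are assumptions for illustration) in which the "?" values have already been read as NA.

```r
# Made-up data in which "?" has already been converted to NA.
sample_df <- data.frame(
  survival = c(11, 19, NA, 57),
  alive = c(0, NA, 1, 0),
  attackAge = c(71, 72, 55, 60)
)

# complete.cases() returns TRUE for rows with no NA in the columns it is given.
keep_all <- complete.cases(sample_df)
keep_survival <- complete.cases(sample_df[, "survival", drop = FALSE])

# Rows with no NA anywhere: the same rows that na.omit(sample_df) keeps.
all_complete <- sample_df[keep_all, ]

# Rows where only 'survival' must be present; an NA in 'alive' is tolerated.
survival_known <- sample_df[keep_survival, ]
```

Restricting the check to selected columns keeps more of the data when a missing value in an unused column does not matter for the analysis.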