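I have a large sample dataset that includes a descriptor of whether each sample is viable. It looks (roughly) like this, where 'desc' is the description column and 'blank' indicates that the sample is not viable: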
desc x y z
1 blank 4.529976 5.297952 5.581013
2 blank 5.906855 4.557389 4.901660
3 sample 4.322014 4.798248 4.995959
4 sample 3.997565 5.975604 7.160871
5 blank 4.898922 7.666193 5.551385
6 blank 5.667884 5.195825 5.232072
7 blank 5.524773 6.726074 4.767475
8 sample 4.382937 5.926217 5.203737
9 sample 4.976908 3.079191 4.614121
10 blank 4.572954 4.772373 6.077195
I want to use an if/else statement to set the rows with unusable data to NA. The final dataset should look like this:
desc x y z
1 blank NA NA NA
2 blank NA NA NA
3 sample 4.322014 4.798248 4.995959
4 sample 3.997565 5.975604 7.160871
5 blank NA NA NA
6 blank NA NA NA
7 blank NA NA NA
8 sample 4.382937 5.926217 5.203737
9 sample 4.976908 3.079191 4.614121
10 blank NA NA NA
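I tried a for loop, but I can't get it to change all the columns in a single loop. My real dataset has 40 columns, so I'd rather not handle each one in a separate loop! Here is the code that changes one column at a time: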
for(i in 1:length(desc)){
if(dat$desc[i] =="blank"){
dat$x[i] <- NA
}
else {
dat$x[i] <- dat$x[i]
}
}
I generated the sample data with this script:
desc <- c("blank", "blank", "sample", "sample", "blank", "blank", "blank", "sample", "sample", "blank")
x <- rnorm(10, mean=5, sd=1)
y <- rnorm(10, mean=5, sd=1)
z <- rnorm(10, mean=5, sd=1)
dat <- data.frame(desc,x,y,z)
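Sorry if this is a basic question; I've been searching forums all morning and couldn't find a solution.
Any help is much appreciated!
Answer 0 (score: 3)
For your example dataset, this will work.
Option 1, name the columns to change: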
dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA
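In your real data with 40 columns, if you just want to set the last 39 columns to NA, the following may be simpler than naming every column to change.
Option 2, select the columns with a range: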
dat[which(dat$desc == "blank"), 2:40] <- NA
Option 3, exclude column 1:
dat[which(dat$desc == "blank"), -1] <- NA
Option 4, exclude a named column:
dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA
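As you can see, there are many ways to do this kind of operation (and this is far from a complete list); understanding how each option works will help you understand the language better.
Answer 1 (score: 2)
Starting from your initial loop approach, I came up with this: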
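A minimal sketch of the indexing idea above, rebuilding the sample data from the question. The seed and the names is_blank and num_cols are illustrative and not part of the original. It uses a logical row index directly instead of which(), and picks every numeric column rather than naming or counting columns:

```r
set.seed(1)  # illustrative seed so the sketch is reproducible
dat <- data.frame(
  desc = c("blank", "blank", "sample", "sample", "blank",
           "blank", "blank", "sample", "sample", "blank"),
  x = rnorm(10, mean = 5, sd = 1),
  y = rnorm(10, mean = 5, sd = 1),
  z = rnorm(10, mean = 5, sd = 1)
)

# Logical vector marking the rows to blank out
is_blank <- dat$desc == "blank"

# Logical vector marking the numeric columns (desc is skipped automatically)
num_cols <- vapply(dat, is.numeric, logical(1))

# One assignment covers every selected row and column
dat[is_blank, num_cols] <- NA
dat
```

The which() wrapper in the options above still earns its place when desc itself may contain NA: a logical index containing NA is rejected in a data frame assignment, whereas which() drops those positions.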
for(i in 1:nrow(dat)){
if(dat[i, 1] =="blank"){
dat[i, 2:4] <- NA
}
else {
dat[i,length(dat)] <- dat[i, length(dat)]
}
}
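I tested it with your data and it ran successfully. Hope this is useful for anyone dealing with loops over conditional rows and columns.
Answer 2 (score: 1)
You can use dplyr and a custom function to change values under certain conditions.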
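The loop above hard-codes columns 2:4, which would need editing for the 40-column data. A hedged sketch of the same loop with the columns derived from the names instead (the small data frame and the name value_cols are made up for illustration):

```r
# Small illustrative data frame (not the question's data)
dat <- data.frame(
  desc = c("blank", "sample", "blank", "sample"),
  x = c(1.5, 2.5, 3.5, 4.5),
  y = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Every column except the descriptor, however many there are
value_cols <- setdiff(names(dat), "desc")

# seq_len() is safe even when the data frame has zero rows
for (i in seq_len(nrow(dat))) {
  if (dat$desc[i] == "blank") {
    dat[i, value_cols] <- NA
  }
  # no else branch needed: non-blank rows are simply left alone
}
dat
```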
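The idea described in this answer (dplyr plus a custom function) might look roughly like the sketch below. This is not the answerer's original code; the helper name blank_to_na, its arguments, and the small data frame are hypothetical, and it assumes dplyr 1.0.0 or later for across():

```r
library(dplyr)

# Small illustrative data frame (not the question's data)
dat <- data.frame(
  desc = c("blank", "sample", "sample", "blank"),
  x = c(4.5, 4.3, 4.0, 4.9),
  y = c(5.3, 4.8, 6.0, 7.7)
)

# Hypothetical helper: set every numeric column to NA in the flagged rows
blank_to_na <- function(data, flag_col = "desc", flag = "blank") {
  is_blank <- data[[flag_col]] == flag
  data %>%
    mutate(across(where(is.numeric), function(col) replace(col, is_blank, NA)))
}

result <- blank_to_na(dat)
result
```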
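Answer 3 (score: 1)
Here is an option using set from data.table. It should be faster because the overhead of [.data.table is avoided. We convert the 'data.frame' to a 'data.table' (setDT(df1)), loop over the column names of 'df1' (excluding the 'desc' column), and assign NA to the elements where the logical condition in 'i' is met. (df1 here presumably stands for the data frame called dat in the question.)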
library(data.table)
setDT(df1)
for(j in names(df1)[-1]){
set(df1, i= which(df1[["desc"]]=="blank"), j= j, value= NA)
}
df1
# desc x y z
# 1: blank NA NA NA
# 2: blank NA NA NA
# 3: sample 4.322014 4.798248 4.995959
# 4: sample 3.997565 5.975604 7.160871
# 5: blank NA NA NA
# 6: blank NA NA NA
# 7: blank NA NA NA
# 8: sample 4.382937 5.926217 5.203737
# 9: sample 4.976908 3.079191 4.614121
#10: blank NA NA NA
Or another option (based on @dww's comment). Note that setting the key sorts the rows by desc:
setDT(df1, key = "desc")["blank", names(df1)[-1] := NA][]
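If the original row order matters, a variant without a key may be preferable. A minimal sketch, assuming all value columns are of type double (hence NA_real_); the small table and the name cols are illustrative:

```r
library(data.table)

# Small illustrative table (not the question's data)
df1 <- data.table(
  desc = c("blank", "sample", "blank", "sample"),
  x = c(1.1, 2.2, 3.3, 4.4),
  y = c(5.5, 6.6, 7.7, 8.8)
)

# Every column except the descriptor
cols <- setdiff(names(df1), "desc")

# Update by reference; rows keep their original order because no key is set
df1[desc == "blank", (cols) := NA_real_]
df1
```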
Answer 4 (score: 0)
This should work. Honestly, though, if the data is unusable, why not remove those rows entirely?
library(dplyr)
blanks =
dat %>%
filter(desc == "blank") %>%
select(desc)
dat %>%
filter(desc == "sample") %>%
bind_rows(blanks)
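Note that the bind_rows() approach above places the blank rows after the sample rows. If dropping the unusable rows is acceptable, as this answer suggests, a one-step sketch would be the following (the small data frame and the name samples_only are illustrative):

```r
library(dplyr)

# Small illustrative data frame (not the question's data)
dat <- data.frame(
  desc = c("blank", "sample", "sample", "blank"),
  x = c(4.5, 4.3, 4.0, 4.9),
  y = c(5.3, 4.8, 6.0, 7.7)
)

# Keep only the rows that are not flagged as blank
samples_only <- dat %>%
  filter(desc != "blank")
samples_only
```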
Answer 5 (score: 0)
Here is another dplyr solution with a small custom function and mutate_each().
library(dplyr)
f <- function(x) if_else(dat$desc == "blank", NA_real_, x)
dat %>%
mutate_each(funs(f), -desc)
#> desc x y z
#> 1 blank NA NA NA
#> 2 blank NA NA NA
#> 3 sample 3.624941 6.430955 5.486632
#> 4 sample 3.236359 4.935453 4.319202
#> 5 blank NA NA NA
#> 6 blank NA NA NA
#> 7 blank NA NA NA
#> 8 sample 5.058725 6.751650 4.750529
#> 9 sample 5.837206 4.323562 4.914780
#> 10 blank NA NA NA
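mutate_each() and funs() have since been deprecated in dplyr. A sketch of the same idea with across(), assuming dplyr 1.0.0 or later and all value columns being double; the seed and the name result are illustrative:

```r
library(dplyr)

set.seed(42)  # illustrative seed so the sketch is reproducible
dat <- data.frame(
  desc = c("blank", "blank", "sample", "sample", "blank",
           "blank", "blank", "sample", "sample", "blank"),
  x = rnorm(10, mean = 5, sd = 1),
  y = rnorm(10, mean = 5, sd = 1),
  z = rnorm(10, mean = 5, sd = 1)
)

# desc is looked up inside the data frame, so no dat$ prefix is needed
result <- dat %>%
  mutate(across(-desc, ~ if_else(desc == "blank", NA_real_, .x)))
result
```

Because the condition refers to the desc column inside the pipeline rather than to dat$desc, this version also works when the data frame has been filtered or reordered earlier in the pipe.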