I have a df with 9 variables plus a Date column.
> df
Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1 2017-01-01 0.6 0.5 1 0.7 0.5 0 0.8 0.6 1
2 2017-01-02 0.9 0.6 1 1.0 0.7 1 1.0 0.7 1
3 2017-01-03 0.1 0.2 1 0.2 0.2 0 0.3 0.2 1
4 2017-01-04 0.2 0.6 1 0.2 0.6 1 0.3 0.7 1
5 2017-01-05 0.4 0.3 1 0.5 0.3 1 0.6 0.4 1
6 2017-01-06 0.6 0.3 1 0.6 0.3 1 0.7 0.4 1
7 2017-01-07 0.6 0.1 1 0.6 0.2 1 0.6 0.2 0
8 2017-01-08 0.9 0.9 1 0.9 1.0 1 1.0 1.0 0
9 2017-01-09 0.1 0.7 1 0.2 0.7 0 0.2 0.8 1
10 2017-01-10 0.2 0.6 1 0.3 0.6 1 0.3 0.7 1
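When the 'a' variable with the same number is not 1, I want to replace the 'x' and 'y' values with NA (the 'a' value becomes NA as well). The result should look like this: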
Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1 2017-01-01 0.6 0.5 1 NA NA NA 0.8 0.6 1
2 2017-01-02 0.9 0.6 1 1.0 0.7 1 1.0 0.7 1
3 2017-01-03 0.1 0.2 1 NA NA NA 0.3 0.2 1
4 2017-01-04 0.2 0.6 1 0.2 0.6 1 0.3 0.7 1
5 2017-01-05 0.4 0.3 1 0.5 0.3 1 0.6 0.4 1
6 2017-01-06 0.6 0.3 1 0.6 0.3 1 0.7 0.4 1
7 2017-01-07 0.6 0.1 1 0.6 0.2 1 NA NA NA
8 2017-01-08 0.9 0.9 1 0.9 1.0 1 NA NA NA
9 2017-01-09 0.1 0.7 1 NA NA NA 0.2 0.8 1
10 2017-01-10 0.2 0.6 1 0.3 0.6 1 0.3 0.7 1
I have managed to do this with the following code.
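library(stringr)  # str_sub(), str_detect()
library(plyr)     # join_all(); load before dplyr
library(dplyr)    # filter()
mynames=unique(str_sub(names(df),2,3))[-1]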
index<-lapply(mynames,function(x) str_detect(names(df),paste0(c("Date",x),collapse="|")))
dataList<-lapply(index, function(x) setNames(df[,x],nm=c("Date","V1","V2","A")))
subList<-lapply(dataList,function(x) filter(x,A>0.999))
df0=join_all(subList,by="Date")
I am wondering whether there is a more elegant way.
The code to build df is:
library(lubridate)  # provides ymd()
n=10
x01=round(runif(n),1)
x02=round((x01+runif(n)/10),1)
x03=round((x02+runif(n)/10),1)
y01=round(runif(n),1)
y02=round((y01+runif(n)/10),1)
y03=round((y02+runif(n)/10),1)
a01=rbinom(n,1,0.8)
a02=rbinom(n,1,0.8)
a03=rbinom(n,1,0.8)
Date=seq(ymd("2017-01-01"),ymd("2017-01-10"),by="day")
df=data.frame(Date,x01,x02,x03,y01,y02,y03,a01,a02,a03)
Many thanks.
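Answer 0 (score: 3)
Set the output dfout to the input df, then determine the column numbers of the x, y and a columns (xcols, ycols, acols). Then, in each of those sets, set to NA the elements whose corresponding a value is not 1. The code below does this and prints the result: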
dfout <- df
xcols <- grep("^x", names(df))
ycols <- grep("^y", names(df))
acols <- grep("^a", names(df))
dfout[xcols][df[acols] != 1] <- NA
dfout[ycols][df[acols] != 1] <- NA
dfout[acols][df[acols] != 1] <- NA
dfout
Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1 2017-01-01 0.6 0.5 1 NA NA NA 0.8 0.6 1
2 2017-01-02 0.9 0.6 1 1.0 0.7 1 1.0 0.7 1
3 2017-01-03 0.1 0.2 1 NA NA NA 0.3 0.2 1
4 2017-01-04 0.2 0.6 1 0.2 0.6 1 0.3 0.7 1
5 2017-01-05 0.4 0.3 1 0.5 0.3 1 0.6 0.4 1
6 2017-01-06 0.6 0.3 1 0.6 0.3 1 0.7 0.4 1
7 2017-01-07 0.6 0.1 1 0.6 0.2 1 NA NA NA
8 2017-01-08 0.9 0.9 1 0.9 1.0 1 NA NA NA
9 2017-01-09 0.1 0.7 1 NA NA NA 0.2 0.8 1
10 2017-01-10 0.2 0.6 1 0.3 0.6 1 0.3 0.7 1
Note: the input is df, as shown in the question.
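The matrix indexing above pairs columns by position, so it appears to rely on the x, y and a columns sharing the same suffix order. As a hedged alternative, here is a minimal sketch that pairs columns by name instead. The three-row example data is copied from the first rows of the question, and the helper variables suffixes, bad and cols are illustrative names, not part of the original answer.

```r
# Example input: the first three rows of the question's df (Date kept as text)
df <- data.frame(
  Date = c("2017-01-01", "2017-01-02", "2017-01-03"),
  x01 = c(0.6, 0.9, 0.1), y01 = c(0.5, 0.6, 0.2), a01 = c(1, 1, 1),
  x02 = c(0.7, 1.0, 0.2), y02 = c(0.5, 0.7, 0.2), a02 = c(0, 1, 0),
  x03 = c(0.8, 1.0, 0.3), y03 = c(0.6, 0.7, 0.2), a03 = c(1, 1, 1)
)

dfout <- df
# Suffixes ("01", "02", ...) are read off the names of the a columns
suffixes <- sub("^a", "", grep("^a", names(df), value = TRUE))
for (s in suffixes) {
  bad  <- df[[paste0("a", s)]] != 1      # rows where this group's flag is not 1
  cols <- paste0(c("x", "y", "a"), s)    # the three columns of this group
  dfout[bad, cols] <- NA                 # blank the whole group in those rows
}
dfout
```

Because each group is looked up by its suffix, this variant should give the same result whether the columns are stored as x01, y01, a01, ... or as x01, x02, x03, y01, ...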
Answer 1 (score: 2)
"Old school" solution
Use grep to get the column numbers for the letters x, y and a.
df.names <- names(df)
a.cols <- grep('^a', df.names)
x.cols <- grep('^x', df.names)
y.cols <- grep('^y', df.names)
For each 'a' column, find the rows where its value is not equal to 1 and set the corresponding 'x' and 'y' entries to NA.
# for each a column, modify the corresponding x and y
for (i in seq_along(a.cols)) {
  # get indexes of non-1 entries in 'a' cols
  a.index <- df[, a.cols[i]] != 1
  # change the corresponding entries in 'x' and 'y' cols
  df[, x.cols[i]][a.index] <- NA
  df[, y.cols[i]][a.index] <- NA
}
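A quick way to convince yourself the loop did what was intended is to compare the NA pattern of the result against the original a columns. The sketch below is only an illustration: it rebuilds a three-row sample of the question's data, reruns the loop, and checks the outcome. The names orig and flagged are assumptions introduced here. Note that this loop leaves the a columns themselves unchanged, unlike the expected output in the question.

```r
# Example input: the first three rows of the question's df (Date kept as text)
orig <- data.frame(
  Date = c("2017-01-01", "2017-01-02", "2017-01-03"),
  x01 = c(0.6, 0.9, 0.1), y01 = c(0.5, 0.6, 0.2), a01 = c(1, 1, 1),
  x02 = c(0.7, 1.0, 0.2), y02 = c(0.5, 0.7, 0.2), a02 = c(0, 1, 0),
  x03 = c(0.8, 1.0, 0.3), y03 = c(0.6, 0.7, 0.2), a03 = c(1, 1, 1)
)
df <- orig

a.cols <- grep('^a', names(df))
x.cols <- grep('^x', names(df))
y.cols <- grep('^y', names(df))

for (i in seq_along(a.cols)) {
  a.index <- df[, a.cols[i]] != 1
  df[, x.cols[i]][a.index] <- NA
  df[, y.cols[i]][a.index] <- NA
}

# x and y should be NA exactly where the paired a column is not 1
flagged <- as.matrix(orig[a.cols]) != 1
stopifnot(
  all(is.na(as.matrix(df[x.cols])) == flagged),
  all(is.na(as.matrix(df[y.cols])) == flagged),
  identical(df[a.cols], orig[a.cols])  # the a columns are left as they were
)
```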
Answer 2 (score: 1)
A solution using dplyr and tidyr. It takes several gather and spread calls to reshape the data.
library(dplyr)
library(tidyr)
df2 <- df %>%
  gather(Cols, Values, -Date) %>%
  extract(Cols, into = c("Letter", "Number"), regex = "([A-Za-z])([0-9]*)") %>%
  spread(Letter, Values) %>%
  mutate(a = ifelse(a != 1, NA, a)) %>%
  # formula lambda instead of funs(), which is deprecated since dplyr 0.8
  mutate_at(vars(x, y), ~ ifelse(is.na(a), NA, .)) %>%
  gather(Letter, Values, -Date, -Number) %>%
  unite(Cols, Letter, Number, sep = "") %>%
  spread(Cols, Values) %>%
  select(names(df))
df2
# Date x01 y01 a01 x02 y02 a02 x03 y03 a03
# 1 2017-01-01 0.6 0.5 1 NA NA NA 0.8 0.6 1
# 2 2017-01-02 0.9 0.6 1 1.0 0.7 1 1.0 0.7 1
# 3 2017-01-03 0.1 0.2 1 NA NA NA 0.3 0.2 1
# 4 2017-01-04 0.2 0.6 1 0.2 0.6 1 0.3 0.7 1
# 5 2017-01-05 0.4 0.3 1 0.5 0.3 1 0.6 0.4 1
# 6 2017-01-06 0.6 0.3 1 0.6 0.3 1 0.7 0.4 1
# 7 2017-01-07 0.6 0.1 1 0.6 0.2 1 NA NA NA
# 8 2017-01-08 0.9 0.9 1 0.9 1.0 1 NA NA NA
# 9 2017-01-09 0.1 0.7 1 NA NA NA 0.2 0.8 1
# 10 2017-01-10 0.2 0.6 1 0.3 0.6 1 0.3 0.7 1
Data
df <- read.table(text = "Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1 '2017-01-01' 0.6 0.5 1 0.7 0.5 0 0.8 0.6 1
2 '2017-01-02' 0.9 0.6 1 1.0 0.7 1 1.0 0.7 1
3 '2017-01-03' 0.1 0.2 1 0.2 0.2 0 0.3 0.2 1
4 '2017-01-04' 0.2 0.6 1 0.2 0.6 1 0.3 0.7 1
5 '2017-01-05' 0.4 0.3 1 0.5 0.3 1 0.6 0.4 1
6 '2017-01-06' 0.6 0.3 1 0.6 0.3 1 0.7 0.4 1
7 '2017-01-07' 0.6 0.1 1 0.6 0.2 1 0.6 0.2 0
8 '2017-01-08' 0.9 0.9 1 0.9 1.0 1 1.0 1.0 0
9 '2017-01-09' 0.1 0.7 1 0.2 0.7 0 0.2 0.8 1
10 '2017-01-10' 0.2 0.6 1 0.3 0.6 1 0.3 0.7 1",
header = TRUE, stringsAsFactors = FALSE)
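The same reshape can probably be written with fewer steps using the newer tidyr verbs. The following is a sketch rather than part of the original answer, and it assumes tidyr 1.0 or later and dplyr 1.0 or later. The ".value" sentinel in pivot_longer splits each name into its letter (which becomes a column) and its number. The helper column keep and the result name df3 are illustrative, and the sample data is the first three rows of the question's df.

```r
library(dplyr)
library(tidyr)

# Example input: the first three rows of the question's df (Date kept as text)
df <- data.frame(
  Date = c("2017-01-01", "2017-01-02", "2017-01-03"),
  x01 = c(0.6, 0.9, 0.1), y01 = c(0.5, 0.6, 0.2), a01 = c(1, 1, 1),
  x02 = c(0.7, 1.0, 0.2), y02 = c(0.5, 0.7, 0.2), a02 = c(0, 1, 0),
  x03 = c(0.8, 1.0, 0.3), y03 = c(0.6, 0.7, 0.2), a03 = c(1, 1, 1)
)

df3 <- df %>%
  # one row per Date and group number, with columns x, y and a
  pivot_longer(-Date, names_to = c(".value", "Number"),
               names_pattern = "([a-z])([0-9]+)") %>%
  # record the flag first so that blanking a does not affect x and y
  mutate(keep = a == 1) %>%
  mutate(across(c(x, y, a), ~ replace(.x, !keep, NA))) %>%
  select(-keep) %>%
  # back to wide form: names are rebuilt as letter + number, e.g. "x01"
  pivot_wider(names_from = Number, values_from = c(x, y, a), names_sep = "") %>%
  select(all_of(names(df)))
df3
```

The flag is stored in keep before any column is blanked, so the result does not depend on the order in which across() processes x, y and a.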