Change column values to NA based on a condition in other columns

Time: 2018-02-03 18:00:53

Tags: r dataframe

I have a df with a Date column and 9 variables.

> df
         Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1  2017-01-01 0.6 0.5   1 0.7 0.5   0 0.8 0.6   1
2  2017-01-02 0.9 0.6   1 1.0 0.7   1 1.0 0.7   1
3  2017-01-03 0.1 0.2   1 0.2 0.2   0 0.3 0.2   1
4  2017-01-04 0.2 0.6   1 0.2 0.6   1 0.3 0.7   1
5  2017-01-05 0.4 0.3   1 0.5 0.3   1 0.6 0.4   1
6  2017-01-06 0.6 0.3   1 0.6 0.3   1 0.7 0.4   1
7  2017-01-07 0.6 0.1   1 0.6 0.2   1 0.6 0.2   0
8  2017-01-08 0.9 0.9   1 0.9 1.0   1 1.0 1.0   0
9  2017-01-09 0.1 0.7   1 0.2 0.7   0 0.2 0.8   1
10 2017-01-10 0.2 0.6   1 0.3 0.6   1 0.3 0.7   1  

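When the 'a' variable with the same number is not 1, I want to replace the values of 'x' and 'y' (and 'a' itself) with NA. So the result would look like this: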

         Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1  2017-01-01 0.6 0.5   1  NA  NA  NA 0.8 0.6   1
2  2017-01-02 0.9 0.6   1 1.0 0.7   1 1.0 0.7   1
3  2017-01-03 0.1 0.2   1  NA  NA  NA 0.3 0.2   1
4  2017-01-04 0.2 0.6   1 0.2 0.6   1 0.3 0.7   1
5  2017-01-05 0.4 0.3   1 0.5 0.3   1 0.6 0.4   1
6  2017-01-06 0.6 0.3   1 0.6 0.3   1 0.7 0.4   1
7  2017-01-07 0.6 0.1   1 0.6 0.2   1  NA  NA  NA
8  2017-01-08 0.9 0.9   1 0.9 1.0   1  NA  NA  NA
9  2017-01-09 0.1 0.7   1  NA  NA  NA 0.2 0.8   1
10 2017-01-10 0.2 0.6   1 0.3 0.6   1 0.3 0.7   1

I have managed to do this with the following code.

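library(stringr)  # str_sub(), str_detect()
library(plyr)     # join_all()
library(dplyr)    # filter(); loaded after plyr so dplyr's verbs are not masked

mynames=unique(str_sub(names(df),2,3))[-1]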
index<-lapply(mynames,function(x) str_detect(names(df),paste0(c("Date",x),collapse="|")))
dataList<-lapply(index, function(x) setNames(df[,x],nm=c("Date","V1","V2","A")))
subList<-lapply(dataList,function(x) filter(x,A>0.999))
df0=join_all(subList,by="Date")

I wonder if there is a more elegant way.

The code to build df is

library(lubridate)  # ymd()

n=10
x01=round(runif(n),1)
x02=round((x01+runif(n)/10),1)
x03=round((x02+runif(n)/10),1)
y01=round(runif(n),1)
y02=round((y01+runif(n)/10),1)
y03=round((y02+runif(n)/10),1)
a01=rbinom(n,1,0.8)
a02=rbinom(n,1,0.8)
a03=rbinom(n,1,0.8)
Date=seq(ymd("2017-01-01"),ymd("2017-01-10"),by="day")
df=data.frame(Date,x01,x02,x03,y01,y02,y03,a01,a02,a03)

Thank you very much.

3 answers:

Answer 0 (score: 3)

Set the output dfout to the input df, then determine the column numbers of the x, y and a columns (xcols, ycols, acols). Then, for each of those sets of columns, set the elements whose corresponding a value is not 1 to NA.

dfout <- df

xcols <- grep("^x", names(df))
ycols <- grep("^y", names(df))
acols <- grep("^a", names(df))

dfout[xcols][df[acols] != 1] <- NA
dfout[ycols][df[acols] != 1] <- NA
dfout[acols][df[acols] != 1] <- NA

dfout

giving:

         Date x01 y01 a01 x02 y02 a02 x03 y03 a03
1  2017-01-01 0.6 0.5   1  NA  NA  NA 0.8 0.6   1
2  2017-01-02 0.9 0.6   1 1.0 0.7   1 1.0 0.7   1
3  2017-01-03 0.1 0.2   1  NA  NA  NA 0.3 0.2   1
4  2017-01-04 0.2 0.6   1 0.2 0.6   1 0.3 0.7   1
5  2017-01-05 0.4 0.3   1 0.5 0.3   1 0.6 0.4   1
6  2017-01-06 0.6 0.3   1 0.6 0.3   1 0.7 0.4   1
7  2017-01-07 0.6 0.1   1 0.6 0.2   1  NA  NA  NA
8  2017-01-08 0.9 0.9   1 0.9 1.0   1  NA  NA  NA
9  2017-01-09 0.1 0.7   1  NA  NA  NA 0.2 0.8   1
10 2017-01-10 0.2 0.6   1 0.3 0.6   1 0.3 0.7   1

Note

The input df is assumed to be the data frame printed in the question.
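To see why one assignment per letter is enough in this answer, it may help to look at the mask on its own. The following is a minimal sketch on a small made-up data frame (toy and its values are illustrative and not taken from the question). Comparing a block of columns with 1 yields a logical matrix, and indexing a same-shaped block of columns with that matrix selects the matching cells.

```r
# Toy data, made up for illustration: two x/y/a groups, three rows.
toy <- data.frame(
  x01 = c(0.1, 0.2, 0.3), y01 = c(0.4, 0.5, 0.6), a01 = c(1, 0, 1),
  x02 = c(0.7, 0.8, 0.9), y02 = c(1.0, 1.1, 1.2), a02 = c(1, 1, 0)
)

acols <- grep("^a", names(toy))
xcols <- grep("^x", names(toy))

# Logical matrix with one column per 'a' column; TRUE marks cells to blank.
mask <- toy[acols] != 1
print(mask)

# Cell (i, j) of the mask selects cell (i, j) of the x block, so the
# j-th 'a' column controls the j-th 'x' column.
out <- toy
out[xcols][mask] <- NA
print(out[xcols])
```

Because the pairing is positional, this relies on the x, y and a columns appearing in the same order of suffixes (01, 02, ...), as they do in the question's df.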

Answer 1 (score: 2)

"Old school" solution

Use grep to get the column numbers based on the letters x, y and a.

df.names <- names(df)
a.cols <- grep('^a', df.names)
x.cols <- grep('^x', df.names)
y.cols <- grep('^y', df.names)

For each 'a' column, index the rows of the corresponding 'x' and 'y' columns where the 'a' value is not equal to 1, and set them to NA.

# for each a column, modify the corresponding x and y   
for (i in seq_along(a.cols)) {
    # get indexes of non-1 entries in 'a' cols
    a.index <- df[,a.cols[i]]!=1
    # change the corresponding entries in 'x' and 'y' cols
    df[,x.cols[i]][a.index] = NA 
    df[,y.cols[i]][a.index] = NA 
}
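The loop above pairs columns by position, so it depends on the x, y and a columns being listed in matching order. As a hedged variation, the sketch below pairs them by their numeric suffix instead. The toy data frame, the deliberately shuffled column order, and the choice to treat a missing 'a' as "not 1" are all assumptions made for illustration.

```r
# Toy data, made up for illustration; note the shuffled column order.
toy <- data.frame(
  a02 = c(1, NA, 0), x01 = c(0.1, 0.2, 0.3), y02 = c(1.0, 1.1, 1.2),
  a01 = c(1, 0, 1), y01 = c(0.4, 0.5, 0.6), x02 = c(0.7, 0.8, 0.9)
)

# Suffixes ("01", "02", ...) taken from the names of the 'a' columns.
suffixes <- sub("^a", "", grep("^a", names(toy), value = TRUE))

for (s in suffixes) {
  a <- toy[[paste0("a", s)]]
  # Row numbers where 'a' is missing or not 1 (missing counts as "not 1"
  # here, which is an assumption about the desired behaviour).
  bad <- which(is.na(a) | a != 1)
  # Blank out x, y and a for this suffix in those rows.
  toy[bad, paste0(c("x", "y", "a"), s)] <- NA
}
print(toy)
```

Using row numbers from which() rather than a raw logical vector avoids the error R raises when a logical index containing NA is used in a data frame assignment.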

Answer 2 (score: 1)

A solution using dplyr and tidyr. It takes several gather and spread calls to process the data.

library(dplyr)
library(tidyr)

df2 <- df %>%
  gather(Cols, Values, -Date) %>% 
  extract(Cols, into = c("Letter", "Number"), regex = "([A-Za-z])([0-9]*)") %>%
  spread(Letter, Values) %>%
  mutate(a = ifelse(a != 1, NA, a)) %>%
  mutate_at(vars(x, y), funs(ifelse(is.na(a), NA, .))) %>%
  gather(Letter, Values, -Date, -Number) %>%
  unite(Cols, Letter, Number, sep = "") %>%
  spread(Cols, Values) %>%
  select(names(df))
df2
#          Date x01 y01 a01 x02 y02 a02 x03 y03 a03
# 1  2017-01-01 0.6 0.5   1  NA  NA  NA 0.8 0.6   1
# 2  2017-01-02 0.9 0.6   1 1.0 0.7   1 1.0 0.7   1
# 3  2017-01-03 0.1 0.2   1  NA  NA  NA 0.3 0.2   1
# 4  2017-01-04 0.2 0.6   1 0.2 0.6   1 0.3 0.7   1
# 5  2017-01-05 0.4 0.3   1 0.5 0.3   1 0.6 0.4   1
# 6  2017-01-06 0.6 0.3   1 0.6 0.3   1 0.7 0.4   1
# 7  2017-01-07 0.6 0.1   1 0.6 0.2   1  NA  NA  NA
# 8  2017-01-08 0.9 0.9   1 0.9 1.0   1  NA  NA  NA
# 9  2017-01-09 0.1 0.7   1  NA  NA  NA 0.2 0.8   1
# 10 2017-01-10 0.2 0.6   1 0.3 0.6   1 0.3 0.7   1

Data

df <- read.table(text = "Date x01 y01 a01 x02 y02 a02 x03 y03 a03
                 1  '2017-01-01' 0.6 0.5   1 0.7 0.5   0 0.8 0.6   1
                 2  '2017-01-02' 0.9 0.6   1 1.0 0.7   1 1.0 0.7   1
                 3  '2017-01-03' 0.1 0.2   1 0.2 0.2   0 0.3 0.2   1
                 4  '2017-01-04' 0.2 0.6   1 0.2 0.6   1 0.3 0.7   1
                 5  '2017-01-05' 0.4 0.3   1 0.5 0.3   1 0.6 0.4   1
                 6  '2017-01-06' 0.6 0.3   1 0.6 0.3   1 0.7 0.4   1
                 7  '2017-01-07' 0.6 0.1   1 0.6 0.2   1 0.6 0.2   0
                 8  '2017-01-08' 0.9 0.9   1 0.9 1.0   1 1.0 1.0   0
                 9  '2017-01-09' 0.1 0.7   1 0.2 0.7   0 0.2 0.8   1
                 10 '2017-01-10' 0.2 0.6   1 0.3 0.6   1 0.3 0.7   1",
                 header = TRUE, stringsAsFactors = FALSE)
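
In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider, and funs is deprecated in dplyr. The sketch below shows how the same reshape-and-mask idea might look with those newer functions, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later. The toy data frame and the helper column bad are made up for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Toy data, made up for illustration: two x/y/a groups, three dates.
toy <- data.frame(
  Date = as.Date("2017-01-01") + 0:2,
  x01 = c(0.1, 0.2, 0.3), y01 = c(0.4, 0.5, 0.6), a01 = c(1, 0, 1),
  x02 = c(0.7, 0.8, 0.9), y02 = c(1.0, 1.1, 1.2), a02 = c(1, 1, 0)
)

toy2 <- toy %>%
  # One row per Date and Number, with separate columns x, y and a.
  pivot_longer(-Date,
               names_to = c(".value", "Number"),
               names_pattern = "([a-z])([0-9]+)") %>%
  # Flag rows to blank first, so the mask does not change while a is edited.
  mutate(bad = is.na(a) | a != 1) %>%
  mutate(across(c(x, y, a), ~ replace(.x, bad, NA))) %>%
  select(-bad) %>%
  # Back to wide form; names are rebuilt as letter followed by number.
  pivot_wider(names_from = Number, values_from = c(x, y, a), names_sep = "") %>%
  # Restore the original column order.
  select(all_of(names(toy)))
print(toy2)
```

The ".value" entry in names_to keeps x, y and a as separate columns in the long form, which removes the need for the second gather/spread round trip.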