Creating columns with specific conditions in R

Time: 2017-07-19 04:04:32

Tags: r loops if-statement lapply create-table

I have this data:

OPENING CLOSE 
2007     2008   
2009     2010    
2004      NA   

I want to create these columns:

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
2007     2008                     1     1
2005     2008         1     1     1     1                                   
2004      NA    1     1     1     1     1     1     1

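These columns could be created one at a time with if statements, but I would like to use a loop or lapply instead.

In addition, I want to create further columns (S~) based on a condition.

If a year's column (e.g. Y2007) is 1 and the column for the third year counting back (Y2005, i.e. two columns earlier) is also 1, the new column (S2007) is 1; otherwise it is 0.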

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 | S2007 S2008 S2009
2007     2008                     1     1               |   0     0     0
2005     2008         1     1     1     1               |   1     1     0
2004      NA    1     1     1     1     1     1     1   |   1     1     1
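
To make the rule concrete, here is a minimal sketch for a single row, assuming the year range 2004:2010 and an offset of two columns as in the Y2007/Y2005 example. The names `years`, `y`, `target` and `s` are made up for illustration only.

```r
# Hypothetical illustration for one row: OPENING = 2005, CLOSE = 2008
years <- 2004:2010

# 0/1 indicator for every year in the range
y <- as.integer(years >= 2005 & years <= 2008)
names(y) <- paste0("Y", years)

# S columns are wanted for 2007 to 2009; each compares a year with the year two columns earlier
target <- 2007:2009
s <- as.integer(y[paste0("Y", target)] == 1 & y[paste0("Y", target - 2)] == 1)
names(s) <- paste0("S", target)
s
```

For this row the sketch gives S2007 = 1, S2008 = 1 and S2009 = 0, which is the second row of the table above.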

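How can I write a script for this in R?

2 Answers:

Answer 0: (score: 1)

A solution using the tidyverse. dt3 is the first desired output and dt5 is the second. There is no need for loops here.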

# Create example data frame
dt <- read.table(text = "OPENING CLOSE
2007     2008
2005     2008
2004      NA",
                 header = TRUE, stringsAsFactors = FALSE)

# Load package
library(tidyverse)

dt2 <- dt %>%
  mutate(ID = 1:n(), EndYear = ifelse(is.na(CLOSE), 2010, CLOSE)) %>%
  # Create year range list
  mutate(YearRange = map2(OPENING, EndYear, `:`)) %>%
  # Unnest the list column
  unnest(YearRange) %>%
  mutate(YearRange = paste0("Y", YearRange)) %>%
  mutate(Value = 1) %>%
  # Spread based on YearRange and Value
  spread(YearRange, Value)

# Desired output 1  
dt3 <- dt2 %>%  
  arrange(ID) %>%
  select(-ID, -EndYear)

dt4 <- dt2 %>%
  gather(YearRange, Value, Y2004:Y2010) %>%
  arrange(ID, YearRange) %>%
  group_by(ID) %>%
  # Set the lag here: a lag of 2 compares each year with the year two columns earlier
  # (e.g. Y2007 with Y2005), as in the question's example
  mutate(Value2 = lag(Value, 2)) %>%
  # Evaluate the condition between the current year and the lagged year
  mutate(Value3 = ifelse(Value %in% 1 & Value2 %in% 1, 1, 0)) %>%
  mutate(YearRange = sub("Y", "S", YearRange)) %>%
  select(ID, YearRange, Value3) %>%
  # Filter for S2007 to S2009
  filter(YearRange %in% paste0("S", 2007:2009)) %>%
  spread(YearRange, Value3)

# Desired output 2
dt5 <- dt2 %>%
  left_join(dt4, by = "ID") %>%
  arrange(ID) %>%
  select(-ID, -EndYear)
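
In current tidyr, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is only a sketch of how the same result might be obtained with pivot_wider(), not a rewrite of the code above. It assumes tidyr 1.1.0 or later (for `names_sort` and a scalar `values_fill`). The names `last_year`, `target`, `wide` and `result` are hypothetical. Unlike dt3, it fills years outside the range with 0 rather than NA.

```r
library(dplyr)
library(tidyr)

dt <- read.table(text = "OPENING CLOSE
2007 2008
2005 2008
2004 NA", header = TRUE)

last_year <- 2010L  # assumed end of the window for rows where CLOSE is NA

wide <- dt %>%
  mutate(ID = row_number(),
         EndYear = ifelse(is.na(CLOSE), last_year, CLOSE),
         # List column holding every year from OPENING to EndYear
         Year = Map(seq, OPENING, EndYear)) %>%
  unnest(Year) %>%
  mutate(Value = 1L) %>%
  # One Y column per year; years outside a row's range become 0
  pivot_wider(names_from = Year, values_from = Value,
              names_prefix = "Y", names_sort = TRUE, values_fill = 0L)

# S columns: 1 when the year and the year two columns earlier are both 1
target <- 2007:2009
for (yr in target) {
  wide[[paste0("S", yr)]] <-
    as.integer(wide[[paste0("Y", yr)]] == 1 & wide[[paste0("Y", yr - 2)]] == 1)
}

result <- wide %>% select(-ID, -EndYear)
```

The short for loop over `target` only builds column names. Each S column is still computed in one vectorised step across all rows.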

Answer 1: (score: 0)

Base R version:

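# Assumption: dat holds the question's original data (the answer does not show how it was created)
dat <- read.table(text = "OPENING CLOSE
2007 2008
2009 2010
2004 NA", header = TRUE)

# Full range of years covered by the data
rng <- range(unlist(dat), na.rm=TRUE)
rng <- rng[1]:rng[2]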

# For each row, flag the years between OPENING and CLOSE (an NA CLOSE counts as still open)
dat[paste0("Y",rng)] <- t(mapply(
  function(op,cl,rn) rn >= op & (rn <= cl | is.na(cl)),
  dat[["OPENING"]],
  dat[["CLOSE"]],
  list(rng)
))

#  OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
#1    2007  2008 FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
#2    2009  2010 FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
#3    2004    NA  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
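
The output above is logical and stops at the Y columns. As a possible extension, the sketch below builds 0/1 columns with sapply and then derives the S columns in base R. It uses the rows from the question's desired-output table (2007-2008, 2005-2008, 2004-NA). The names `years`, `target`, `y`, `s` and `result` are assumptions made for this illustration.

```r
dat <- read.table(text = "OPENING CLOSE
2007 2008
2005 2008
2004 NA", header = TRUE)

years <- 2004:2010   # assumed overall year range
target <- 2007:2009  # years for which S columns are wanted

# One 0/1 column per year; an NA CLOSE is treated as still open
y <- sapply(years, function(yr) {
  as.integer(dat$OPENING <= yr & (is.na(dat$CLOSE) | yr <= dat$CLOSE))
})
colnames(y) <- paste0("Y", years)

# S is 1 when the year and the year two columns earlier are both 1
s <- (y[, paste0("Y", target)] == 1 & y[, paste0("Y", target - 2)] == 1) * 1L
colnames(s) <- paste0("S", target)

result <- cbind(dat, y, s)
result
```

Because the S columns come from two column subsets of the same matrix, changing the offset (here 2) or the target years only requires editing `target` and the subtraction.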