Replace values in the previous row based on a condition

Time: 2018-12-03 12:41:28

Tags: r substring str-replace data-manipulation

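I want to find the rows whose ID column does not start with 00 and append that ID value to the end of the Description column in the previous row.

The remaining values of such a row should then overwrite the previous row's values, from the Name column onward. How can I do this in R?

The dummy data is available here: https://docs.google.com/spreadsheets/d/1SbmaM8hXck-z5nsNfDMbhwijvAGPkPPBgQ_eY4JAMC8/edit?usp=sharing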

ID      Year    Description  Name   User       Factor_1  Factor_2   Factor_3
0011    2016    blue colour  AA     James      Xfac      NA         NA
is nice XXX     XLM          Yfac   different  Yfac      NA         NA
0024    2017    red colour   DD     Mark       Zfac      NA         NA
is good YYY     STM          Lfac   unique     Zfac      NA         NA

What I would like to have:

ID      Year    Description          Name   User  Factor_1   Factor_2   Factor_3
0011    2016    blue colour is nice  XXX    XLM   Yfac       different  Yfac
0024    2017    red colour is good   YYY    STM   Lfac       unique     Zfac
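
For anyone who wants to reproduce this without the spreadsheet, the sample above can be typed in by hand. This is only a sketch: it assumes every column is read as character (the continuation rows put text such as "XXX" into Year), and the object names df and expected are chosen here for illustration.

```r
# Input rows as shown in the question; all columns kept as character
# (assumption: the real data may use other classes).
df <- data.frame(
  ID          = c("0011", "is nice", "0024", "is good"),
  Year        = c("2016", "XXX", "2017", "YYY"),
  Description = c("blue colour", "XLM", "red colour", "STM"),
  Name        = c("AA", "Yfac", "DD", "Lfac"),
  User        = c("James", "different", "Mark", "unique"),
  Factor_1    = c("Xfac", "Yfac", "Zfac", "Zfac"),
  Factor_2    = NA_character_,
  Factor_3    = NA_character_,
  stringsAsFactors = FALSE
)

# Desired result, for comparison with whatever a solution returns
expected <- data.frame(
  ID          = c("0011", "0024"),
  Year        = c("2016", "2017"),
  Description = c("blue colour is nice", "red colour is good"),
  Name        = c("XXX", "YYY"),
  User        = c("XLM", "STM"),
  Factor_1    = c("Yfac", "Lfac"),
  Factor_2    = c("different", "unique"),
  Factor_3    = c("Yfac", "Zfac"),
  stringsAsFactors = FALSE
)
```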

3 answers:

Answer 0 (score: 1)

Here is a dplyr solution:

library(dplyr)

df %>% 
  bind_cols(df %>% rename_all(function(x) paste0(x, "_dummy"))) %>%
  mutate(
    Description = ifelse(substr(lead(ID), 1, 2) != "00", 
                         paste(Description, lead(ID)), Description),
    Name = lead(Year_dummy),
    User = lead(Description_dummy),
    Factor_1 = lead(Name_dummy),
    Factor_2 = lead(User_dummy),
    Factor_3 = lead(Factor_1_dummy)
  ) %>% select(-ends_with("dummy")) %>%
  filter(substr(ID, 1, 2) == "00")

Output:

    ID Year       Description Name User Factor_1  Factor_2 Factor_3
1 0011 2016 blue colour is nice  XXX  XLM     Yfac different     Yfac
2 0024 2017  red colour is good  YYY  STM     Lfac    unique     Zfac
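
The key step in the pipeline above is lead(), which shifts a column up by one position so that each row can see the values of the row below it. A minimal base R sketch of that idea follows. The helper shift_up() is hypothetical and not part of dplyr; it only mimics lead() with its default offset of 1.

```r
# Hypothetical stand-in for dplyr::lead() with an offset of 1:
# drop the first element and pad the end with NA.
shift_up <- function(x) c(x[-1], NA)

id   <- c("0011", "is nice", "0024", "is good")
year <- c("2016", "XXX", "2017", "YYY")

# For each row, the ID and Year values of the row directly below it
next_id   <- shift_up(id)
next_year <- shift_up(year)

# Rows whose ID starts with "00" are the ones to keep
keep <- substr(id, 1, 2) == "00"

# Each kept row paired with the values pulled up from its continuation row
data.frame(ID = id[keep], next_ID = next_id[keep], Name = next_year[keep])
```

Pulling values up first and filtering afterwards is what lets the continuation rows be dropped without losing their contents.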

If you are dealing with a large number of columns, a combination of dplyr and base R can do this:

library(dplyr)

df_combo <- cbind(df, df)

df$Description <- ifelse(substr(lead(df$ID), 1, 2) != "00", 
                               paste(df$Description, lead(df$ID)), df$Description)

for (i in (ncol(df) + 4):ncol(df_combo)) {

  df_combo[[i]] <- lead(df_combo[[i - ncol(df) - 2]])

}

df_combo <- subset(df_combo, substr(ID, 1, 2) == "00")

df_descr <- subset(df, substr(ID, 1, 2) == "00")

df_final <- df_combo[, (ncol(df) + 1):ncol(df_combo)]

df_final$Description <- df_descr$Description

rm(df_descr, df_combo)

Output:

     ID Year       Description Name User Factor_1  Factor_2 Factor_3
1: 0011 2016 blue colour is nice  XXX  XLM     Yfac different     Yfac
2: 0024 2017  red colour is good  YYY  STM     Lfac    unique     Zfac

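Answer 1 (score: 1)

In the first part you need to paste the descriptions together.
In the second part you also need to shift the variables, because you want "XXX" and "YYY" to end up in the Name column.

Also, in Vivek's answer all the wrong rows are pasted onto all the "correct" rows. That works in your example, but not if you have several correct rows followed by a wrong row. Using booleans (TRUE/FALSE) can sometimes work well, but in this case I think you want integer indices, because they make it easier to refer to "the previous row". That gives me this code: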

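# Integer positions of the continuation rows (ID does not start with "00")
rmlines <- which(substr(df$ID, 1, 2) != "00")
# Append each continuation row's ID text to the previous row's Description
df$Description[rmlines-1] <- paste(df$Description[rmlines-1], df[rmlines,1], sep=" ")
# Move Year..Factor_1 of the continuation row into Name..Factor_3 of the previous row
df[rmlines-1, 4:8] <- df[rmlines, 2:6]
# Drop the continuation rows
df <- df[-rmlines,]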
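
To see why integer positions help, here is a small sketch with an extra "correct" row (0030) that has no continuation line. The data are made up for illustration and cut down to four columns, and the names d and cont are chosen here rather than taken from the question.

```r
# Made-up data: row 3 ("0030") has no continuation row after it
d <- data.frame(
  ID          = c("0011", "is nice", "0030", "0024", "is good"),
  Year        = c("2016", "XXX", "2018", "2017", "YYY"),
  Description = c("blue colour", "XLM", "green colour", "red colour", "STM"),
  Name        = c("AA", "Yfac", "BB", "DD", "Lfac"),
  stringsAsFactors = FALSE
)

# Positions of continuation rows; each one belongs to the row just above it
cont <- which(substr(d$ID, 1, 2) != "00")

# Guard: with no continuation rows, d[-cont, ] would select zero rows
if (length(cont) > 0) {
  d$Description[cont - 1] <- paste(d$Description[cont - 1], d$ID[cont])
  d$Name[cont - 1] <- d$Year[cont]   # same Year -> Name shift as above
  d <- d[-cont, ]
}

d
```

Because cont - 1 points at exactly the row above each continuation row, the standalone row 0030 is left untouched. Pasting a vector of all continuation values onto all correct rows would misalign here, since there are three correct rows but only two continuation rows.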

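But there is one more issue to consider: what classes are your columns?
When I tried this, I read everything in as character, which means values can be moved between columns freely. In your data some columns may be factors or other classes, so you may need to change the classes. I think the easiest way is to convert everything to character first, and afterwards convert each column back to the class you want it to have in the end.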

# To change everything to character:
df <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
# And to assign the right classes, you need to decide case-by-case:
df$Year <- as.integer(df$Year)
df$Factor_1 <- as.factor(df$Factor_1) # Optionally provide levels
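
A quick way to check what you are working with before and after the conversion is to list the class of every column. The sketch below uses a tiny made-up data frame d with one factor column to show the round trip through character. Which classes you want at the end is an assumption here and depends on your data.

```r
# Made-up example: one character column and one factor column
d <- data.frame(
  Year     = c("2016", "2017"),
  Factor_1 = factor(c("Yfac", "Lfac")),
  stringsAsFactors = FALSE
)
sapply(d, class)   # inspect the starting classes

# Convert every column to character so values can be moved between columns;
# d[] keeps the data frame structure while replacing its columns
d[] <- lapply(d, as.character)
sapply(d, class)   # every column is now character

# Then restore the classes you actually want, column by column
d$Year     <- as.integer(d$Year)
d$Factor_1 <- factor(d$Factor_1)
sapply(d, class)
```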

Answer 2 (score: 0)

Use -

bools <- !substr(df$ID,1,2)=="00"
values <- df[bools,1]
df <- df[!bools,]
df$Description <- paste(df[substr(df$ID,1,2)=="00","Description"],values,sep=" ")
df

Output

    ID Year         Description Name  User Factor_1 Factor_2
1 0011 2016 blue colour is nice   AA James     Xfac       NA
3 0024 2017  red colour is good   DD  Mark     Zfac       NA
  Factor_3
1       NA
3       NA