Update column values and delete rows in R

Time: 2017-07-03 11:51:40

Tags: r

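I am reading a CSV file and trying to update the values of a column named 'added' based on a condition. When two consecutive rows have the same bug_id and bug_when, and the added column of row i has the value "RESOLVED", the added value of row i + 1 should be replaced by the concatenation of the 'added' values of rows i and i + 1, and row i should be deleted. I tried, but it does not work properly. The file contains the following information: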

bug_id  bug_when            field       added
1141327 2015-03-09 16:21:30 Status      RESOLVED
1141327 2015-03-09 16:21:30 Resolution  DUPLICATE
1142623 2015-03-24 18:15:22 Status      RESOLVED
1142623 2015-03-24 18:15:22 Resolution  FIXED
1143179 2015-07-30 09:37:56 Status      RESOLVED
1143179 2015-07-30 09:37:56 Resolution  FIXED

Here is my code:

dataframe <- read.csv("prototype.csv", header = TRUE)
start <- 1
end <- nrow(dataframe)-1

for(i in start:end)
{
  if(dataframe$bug_id[i]==dataframe$bug_id[i+1] & dataframe$bug_when[i]==dataframe$bug_when[i+1])
  {
    if(dataframe$added[i]=="RESOLVED")
    {
      df <- paste(dataframe$added[i],"-",dataframe$added[i+1])
      dataframe$added[i+1] <- df
      dataframe <- dataframe[!(dataframe[i,])]
    }

  }

}

Any suggestions would be highly appreciated. Expected result:

bug_id  bug_when            field       added
1141327 2015-03-09 16:21:30 Resolution  RESOLVED-DUPLICATE
1142623 2015-03-24 18:15:22 Resolution  RESOLVED-FIXED
1143179 2015-07-30 09:37:56 Resolution  RESOLVED-FIXED

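2 answers:

Answer 0 (score: 0)

Here is how to do it with dplyr. Within each group of bug_id and bug_when, whenever the previous row's added value (at t-1) is "RESOLVED", the two added strings are concatenated with paste. Then filter keeps only the rows whose field is "Resolution".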

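library(dplyr)

df %>%
  group_by(bug_id, bug_when) %>%
  # lag(added) is the previous row's value within the group (NA on the first row)
  mutate(added = ifelse(!is.na(lag(added)) & lag(added) == "RESOLVED",
                        paste(lag(added), added, sep = "-"),
                        added)) %>%
  # keep only the Resolution rows, which now carry the combined value
  filter(field == "Resolution")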

   bug_id            bug_when      field              added
    <int>               <chr>      <chr>              <chr>
1 1141327 2015-03-09 16:21:30 Resolution RESOLVED-DUPLICATE
2 1142623 2015-03-24 18:15:22 Resolution     RESOLVED-FIXED
3 1143179 2015-07-30 09:37:56 Resolution     RESOLVED-FIXED

Data

df <- read.table(text="bug_id  bug_when            field       added
1141327 '2015-03-09 16:21:30' Status      RESOLVED
1141327 '2015-03-09 16:21:30' Resolution  DUPLICATE
1142623 '2015-03-24 18:15:22' Status      RESOLVED
1142623 '2015-03-24 18:15:22' Resolution  FIXED
1143179 '2015-07-30 09:37:56' Status      RESOLVED
1143179 '2015-07-30 09:37:56' Resolution  FIXED",
                 header=TRUE,stringsAsFactors=FALSE)
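
The same lag-and-paste idea can also be sketched in base R, without dplyr. This is only a sketch: it assumes the rows are already ordered so that each Status row sits directly above its matching Resolution row, as in the sample data, and the names prev, same_key, merge_here, drop and result are made up for illustration.

```r
df <- read.table(text = "bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Status RESOLVED
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED
1143179 '2015-07-30 09:37:56' Status RESOLVED
1143179 '2015-07-30 09:37:56' Resolution FIXED",
                 header = TRUE, stringsAsFactors = FALSE)

n <- nrow(df)

# Value of `added` on the previous row (NA for the first row), i.e. a manual lag
prev <- c(NA, df$added[-n])

# TRUE where a row has the same bug_id and bug_when as the row above it
same_key <- c(FALSE,
              df$bug_id[-1] == df$bug_id[-n] & df$bug_when[-1] == df$bug_when[-n])

# Rows that should absorb the "RESOLVED" value from the row above
merge_here <- same_key & !is.na(prev) & prev == "RESOLVED"
df$added[merge_here] <- paste(prev[merge_here], df$added[merge_here], sep = "-")

# The row directly above each merged row is the one to delete
drop <- c(merge_here[-1], FALSE)
result <- df[!drop, ]
result
```

Unlike the filter on field, this version removes exactly the rows that were merged into their successor, so a Status row with no matching Resolution row would be kept.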

Answer 1 (score: 0)

I think you want to combine aggregate and paste, like this:

df <- read.table(text="bug_id  bug_when            field       added
1141327 '2015-03-09 16:21:30' Status      RESOLVED
1141327 '2015-03-09 16:21:30' Resolution  DUPLICATE
1142623 '2015-03-24 18:15:22' Status      RESOLVED
1142623 '2015-03-24 18:15:22' Resolution  FIXED
1143179 '2015-07-30 09:37:56' Status      RESOLVED
1143179 '2015-07-30 09:37:56' Resolution  FIXED",stringsAsFactors = FALSE,header=TRUE)

df2 <- aggregate(added ~ bug_id + bug_when, df,paste,collapse = "-")
df2$field <- "Resolution"

#    bug_id            bug_when              added      field
# 1 1141327 2015-03-09 16:21:30 RESOLVED-DUPLICATE Resolution
# 2 1142623 2015-03-24 18:15:22     RESOLVED-FIXED Resolution
# 3 1143179 2015-07-30 09:37:56     RESOLVED-FIXED Resolution
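
For comparison, the loop from the question can also be made to work. The sketch below is one possible repair, not the only one. It reads the sample data inline instead of "prototype.csv", and the names drop and same_key are made up for illustration. Three things differ from the original loop: rows are only marked inside the loop and removed once afterwards (deleting inside the loop shifts the indices), paste uses sep = "-" so no spaces are inserted, and the strings are read as characters rather than factors.

```r
# Inline copy of the sample data; with a real file, something like
# read.csv("prototype.csv", stringsAsFactors = FALSE) would be used instead
# (stringsAsFactors = FALSE matters before R 4.0.0, where TRUE was the default).
dataframe <- read.table(text = "bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Status RESOLVED
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED
1143179 '2015-07-30 09:37:56' Status RESOLVED
1143179 '2015-07-30 09:37:56' Resolution FIXED",
                        header = TRUE, stringsAsFactors = FALSE)

# TRUE marks a row to be removed after the loop has finished
drop <- logical(nrow(dataframe))

for (i in seq_len(max(nrow(dataframe) - 1, 0))) {
  # && is the scalar "and", which is what an if() condition needs
  same_key <- dataframe$bug_id[i] == dataframe$bug_id[i + 1] &&
    dataframe$bug_when[i] == dataframe$bug_when[i + 1]

  if (same_key && dataframe$added[i] == "RESOLVED") {
    # Combine the two values on row i + 1 and mark row i for deletion
    dataframe$added[i + 1] <- paste(dataframe$added[i],
                                    dataframe$added[i + 1], sep = "-")
    drop[i] <- TRUE
  }
}

# Remove all marked rows in one step
dataframe <- dataframe[!drop, ]
dataframe
```

Note that aggregate pastes every added value in a group together regardless of whether the first one is "RESOLVED", while this loop only merges when that condition holds.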