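I am reading a CSV file and trying to update the values of a column named 'added' based on a condition. When two consecutive rows have the same bug_id and bug_when, and the 'added' column of row i has the value "RESOLVED", the 'added' value of row (i + 1) should be updated by concatenating the 'added' values of rows i and i + 1, and row i should then be deleted. I tried, but it is not working properly. The file contains the following information: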
bug_id bug_when field added
1141327 2015-03-09 16:21:30 Status RESOLVED
1141327 2015-03-09 16:21:30 Resolution DUPLICATE
1142623 2015-03-24 18:15:22 Status RESOLVED
1142623 2015-03-24 18:15:22 Resolution FIXED
1143179 2015-07-30 09:37:56 Status RESOLVED
1143179 2015-07-30 09:37:56 Resolution FIXED
Here is my code:
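# stringsAsFactors = FALSE keeps 'added' as character so new strings can be assigned to it
dataframe <- read.csv("prototype.csv", header = TRUE, stringsAsFactors = FALSE)
drop <- logical(nrow(dataframe))  # rows to delete once the loop has finished
for (i in seq_len(nrow(dataframe) - 1))
{
  # rows i and i + 1 share bug_id and bug_when
  if (dataframe$bug_id[i] == dataframe$bug_id[i + 1] &&
      dataframe$bug_when[i] == dataframe$bug_when[i + 1])
  {
    if (dataframe$added[i] == "RESOLVED")
    {
      # e.g. "RESOLVED" and "DUPLICATE" become "RESOLVED-DUPLICATE"
      dataframe$added[i + 1] <- paste(dataframe$added[i], dataframe$added[i + 1], sep = "-")
      # mark row i instead of deleting it now, which would shift the row indices
      drop[i] <- TRUE
    }
  }
}
dataframe <- dataframe[!drop, ]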
Any suggestions would be highly appreciated. Desired result:
bug_id bug_when field added
1141327 2015-03-09 16:21:30 Resolution RESOLVED-DUPLICATE
1142623 2015-03-24 18:15:22 Resolution RESOLVED-FIXED
1143179 2015-07-30 09:37:56 Resolution RESOLVED-FIXED
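The rule described in the question can also be expressed without an explicit loop. The sketch below is a minimal, hypothetical helper (`merge_resolved` is not from the question), assuming `added` is a character column (not a factor) and that comparisons should use the original values rather than values already updated by an earlier merge:

```r
# Hypothetical helper (not from the question): vectorised version of the merge rule.
# Assumes 'added' is character, not factor.
merge_resolved <- function(df) {
  n <- nrow(df)
  keep <- rep(TRUE, n)  # rows that survive the merge
  if (n >= 2) {
    i <- seq_len(n - 1)
    # row i is merged into row i + 1 when both share bug_id and bug_when
    # and row i's 'added' value is "RESOLVED" (uses the original values)
    hit <- which(df$bug_id[i] == df$bug_id[i + 1] &
                 df$bug_when[i] == df$bug_when[i + 1] &
                 df$added[i] == "RESOLVED")
    df$added[hit + 1] <- paste(df$added[hit], df$added[hit + 1], sep = "-")
    keep[hit] <- FALSE  # drop row i only after every merge is computed
  }
  df[keep, , drop = FALSE]
}

# Sample data from the question, with bug_when stored as character
bugs <- data.frame(
  bug_id   = c(1141327, 1141327, 1142623, 1142623, 1143179, 1143179),
  bug_when = c("2015-03-09 16:21:30", "2015-03-09 16:21:30",
               "2015-03-24 18:15:22", "2015-03-24 18:15:22",
               "2015-07-30 09:37:56", "2015-07-30 09:37:56"),
  field    = rep(c("Status", "Resolution"), 3),
  added    = c("RESOLVED", "DUPLICATE", "RESOLVED", "FIXED", "RESOLVED", "FIXED"),
  stringsAsFactors = FALSE
)
result <- merge_resolved(bugs)
print(result)
```

A logical `keep` vector is used instead of `df[-hit, ]` because negative indexing with an empty `hit` would return zero rows rather than the whole data frame.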
Answer 0 (score: 0)
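Here is how to do it with dplyr. Basically, whenever the previous row (t-1) within the same bug_id/bug_when group has "RESOLVED" in added, the two added strings are concatenated with paste. Then filter keeps only the rows whose field is "Resolution".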
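library(dplyr)
df %>%
  group_by(bug_id, bug_when) %>%
  # when the previous row in the group is "RESOLVED", prepend it to the current value
  mutate(added = ifelse(lag(added) == "RESOLVED" & !is.na(lag(added)),
                        paste(lag(added), added, sep = "-"),
                        added)) %>%
  # keep only the merged "Resolution" rows
  filter(field == "Resolution")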
bug_id bug_when field added
<int> <chr> <chr> <chr>
1 1141327 2015-03-09 16:21:30 Resolution RESOLVED-DUPLICATE
2 1142623 2015-03-24 18:15:22 Resolution RESOLVED-FIXED
3 1143179 2015-07-30 09:37:56 Resolution RESOLVED-FIXED
Data
df <- read.table(text="bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Status RESOLVED
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED
1143179 '2015-07-30 09:37:56' Status RESOLVED
1143179 '2015-07-30 09:37:56' Resolution FIXED",
header=TRUE,stringsAsFactors=FALSE)
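For readers without dplyr, the same "look at the previous row within the group" idea can be sketched in base R. This is a swapped-in equivalent, not the answerer's code: `ave()` applies a hypothetical lag function `c(NA, head(x, -1))` within each bug_id/bug_when group (a pasted key is used as the single grouping variable):

```r
# Base-R sketch of the dplyr answer (illustration only)
df <- read.table(text = "bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Status RESOLVED
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED
1143179 '2015-07-30 09:37:56' Status RESOLVED
1143179 '2015-07-30 09:37:56' Resolution FIXED",
header = TRUE, stringsAsFactors = FALSE)

# one grouping key per (bug_id, bug_when) pair
key <- paste(df$bug_id, df$bug_when)
# previous 'added' value within each group, like dplyr::lag(); NA for the first row
prev <- ave(df$added, key, FUN = function(x) c(NA, head(x, -1)))

# concatenate only where the previous value is "RESOLVED"
is_res <- !is.na(prev) & prev == "RESOLVED"
df$added[is_res] <- paste(prev[is_res], df$added[is_res], sep = "-")

# keep only the merged "Resolution" rows
out <- df[df$field == "Resolution", ]
print(out)
```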
Answer 1 (score: 0)
I think you want to combine aggregate and paste, as follows:
df <- read.table(text="bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Status RESOLVED
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED
1143179 '2015-07-30 09:37:56' Status RESOLVED
1143179 '2015-07-30 09:37:56' Resolution FIXED",stringsAsFactors = FALSE,header=TRUE)
df2 <- aggregate(added ~ bug_id + bug_when, df,paste,collapse = "-")
df2$field <- "Resolution"
# bug_id bug_when added field
# 1 1141327 2015-03-09 16:21:30 RESOLVED-DUPLICATE Resolution
# 2 1142623 2015-03-24 18:15:22 RESOLVED-FIXED Resolution
# 3 1143179 2015-07-30 09:37:56 RESOLVED-FIXED Resolution
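`aggregate(..., paste, collapse = "-")` joins the values in the order the rows appear, so the result depends on the "Status" row preceding the "Resolution" row. A minimal sketch, assuming that order is not guaranteed in the real file, sorts the "RESOLVED" row first within each bug before aggregating and then reorders the columns to match the desired output (the shuffled sample data is made up for illustration):

```r
# Made-up sample where one bug has its rows in the opposite order
df <- read.table(text = "bug_id bug_when field added
1141327 '2015-03-09 16:21:30' Resolution DUPLICATE
1141327 '2015-03-09 16:21:30' Status RESOLVED
1142623 '2015-03-24 18:15:22' Status RESOLVED
1142623 '2015-03-24 18:15:22' Resolution FIXED",
header = TRUE, stringsAsFactors = FALSE)

# put the "RESOLVED" row first within each bug (FALSE sorts before TRUE)
df <- df[order(df$bug_id, df$bug_when, df$added != "RESOLVED"), ]

df2 <- aggregate(added ~ bug_id + bug_when, df, paste, collapse = "-")
df2$field <- "Resolution"
# reorder columns to match the desired output in the question
df2 <- df2[, c("bug_id", "bug_when", "field", "added")]
print(df2)
```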