Add a column to a data frame based on changes in another column in R

Asked: 2013-11-28 13:09:35

Tags: r

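My data frame looks like the one below. The build changes once or twice a week. Whenever it changes, I need to mark that in the ggplot chart, for example by adding a scatter plot that uses the same x-axis (date). Let me know if there is a better idea.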

To do this, I want to add one more column to this data frame.

              date                                  build Runtime
    2   2013-07-16                  build-2013-07-09-1332 672.918
    4   2013-07-17                  build-2013-07-15-0510 696.924
    6   2013-07-18                  build-2013-07-15-0510 736.720
    8   2013-07-19                  build-2013-07-18-1644 693.206
    10  2013-07-20                  build-2013-07-18-1644 699.332
    12  2013-07-24                  build-2013-07-22-0510 712.388
    14  2013-07-25                  build-2013-07-22-0510 711.573
    16  2013-07-26                  build-2013-07-22-0510 715.223
    18  2013-07-27                  build-2013-07-22-0510 715.180
    20  2013-07-31                  build-2013-07-29-0510 717.888
    22  2013-08-01                  build-2013-07-29-0510 716.315
    24  2013-08-02                  build-2013-07-29-0510 719.216
    26  2013-08-03                  build-2013-07-29-0510 716.073
    28  2013-08-07                  build-2013-08-05-0510 717.566

I added another column called BuildChange, as shown below, using this awk command:

    awk 'BEGIN{CDB=""}{if($3 != CDB){print $2","$3","$4","1}else{print $2","$3","$4","0}CDB=$3;}' q.txt


              date                                  build Runtime BuildChange
    2   2013-07-16                  build-2013-07-09-1332 672.918 5
    4   2013-07-17                  build-2013-07-15-0510 696.924 5
    6   2013-07-18                  build-2013-07-15-0510 736.720
    8   2013-07-19                  build-2013-07-18-1644 693.206 5
    10  2013-07-20                  build-2013-07-18-1644 699.332
    12  2013-07-24                  build-2013-07-22-0510 712.388 5
    14  2013-07-25                  build-2013-07-22-0510 711.573
    16  2013-07-26                  build-2013-07-22-0510 715.223
    18  2013-07-27                  build-2013-07-22-0510 715.180
    20  2013-07-31                  build-2013-07-29-0510 717.888 5
    22  2013-08-01                  build-2013-07-29-0510 716.315
    24  2013-08-02                  build-2013-07-29-0510 719.216
    26  2013-08-03                  build-2013-07-29-0510 716.073
    28  2013-08-07                  build-2013-08-05-0510 717.566 5

I want to do the same thing in R with a for loop. Is there a better way to add one more column and show the build changes in the chart?

Resulting graph, but I want it without the top and right axes.

The dput() of my data frame:

    structure(list(date = structure(1:28, .Label = c("2013-07-16",
    "2013-07-17", "2013-07-18", "2013-07-19", "2013-07-20", "2013-07-24",
    "2013-07-25", "2013-07-26", "2013-07-27", "2013-07-31", "2013-08-01",
    "2013-08-02", "2013-08-03", "2013-08-07", "2013-08-08", "2013-08-09",
    "2013-08-10", "2013-08-14", "2013-08-15", "2013-08-16", "2013-08-17",
    "2013-08-21", "2013-08-22", "2013-08-23", "2013-08-24", "2013-08-28",
    "2013-08-29", "2013-08-30", "2013-08-31", "2013-09-04", "2013-09-05",
    "2013-09-06", "2013-09-07", "2013-09-11", "2013-09-12", "2013-09-13",
    "2013-09-18", "2013-09-19", "2013-09-20", "2013-09-21", "2013-09-25",
    "2013-09-26", "2013-09-27", "2013-09-28", "2013-10-02", "2013-10-03",
    "2013-10-04", "2013-10-05", "2013-10-09", "2013-10-10", "2013-10-11",
    "2013-10-12", "2013-10-16", "2013-10-17", "2013-10-18", "2013-10-19",
    "2013-10-23", "2013-10-24", "2013-10-25", "2013-10-26", "2013-10-30",
    "2013-10-31", "2013-11-01", "2013-11-02", "2013-11-06", "2013-11-07",
    "2013-11-08", "2013-11-09"), class = "factor"), build = structure(c(1L,
    2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
    7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L), .Label = c("build-2013-07-09-1332",
    "build-2013-07-15-0510", "build-2013-07-18-1644",
    "build-2013-07-22-0510", "build-2013-07-29-0510",
    "build-2013-08-05-0510", "build-2013-08-13-1329",
    "build-2013-08-20-0510", "build-2013-08-27-0510",
    "build-2013-09-03-1340", "build-2013-09-10-1326",
    "build-2013-09-17-0510", "build-2013-09-26-0510",
    "build-2013-10-08-1359", "build-2013-10-14-0510",
    "build-2013-10-18-1437", "build-2013-10-18-1437-PLUS-11259-11737",
    "build-2013-10-28-0510", "build-2013-11-04-0510"
    ), class = "factor"), Runtime = c(672.918, 696.924, 736.72, 693.206,
    699.332, 712.388, 711.573, 715.223, 715.18, 717.888, 716.315,
    719.216, 716.073, 717.566, 723.644, 720.374, 726.145, 710.658,
    715.002, 718.742, 727.297, 711.684, 714.743, 715.815, 726.467,
    742.33, 746.352, 749.55)), .Names = c("date", "build", "Runtime"
    ), row.names = c(2L, 4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L,
    22L, 24L, 26L, 28L, 30L, 32L, 34L, 36L, 38L, 40L, 42L, 44L, 46L,
    48L, 50L, 52L, 54L, 55L), class = "data.frame")

4 answers:

Answer 0 (score: 1)

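How about this: each square shows the average runtime for that build, placed at the first date the build appears. Note that no new column is needed.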

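    library(plyr)
    library(ggplot2)

    # The dates are stored as a factor; convert them to Date for a continuous x-axis
    df1$date <- as.Date(df1$date)

    # One row per build: the first date it appears and its mean runtime
    buildMeans <- ddply(df1, .(build), summarize,
                        firstdate = min(date), avruntime = mean(Runtime))

    ggplot(data = df1) +
      geom_line(aes(date, Runtime)) +
      geom_point(data = buildMeans, aes(firstdate, avruntime),
                 shape = 22, size = 5, fill = "red")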

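For readers without plyr, the per-build summary can also be sketched in base R. This is only an illustration under stated assumptions: the data frame below is a hypothetical stand-in built from the first rows of the question's table, the rows are assumed to be sorted by date, and `build_summary` is a name introduced here.

```r
# Hypothetical stand-in for the question's data (first rows only)
df1 <- data.frame(
  date = as.Date(c("2013-07-16", "2013-07-17", "2013-07-18",
                   "2013-07-19", "2013-07-20")),
  build = c("build-2013-07-09-1332", "build-2013-07-15-0510",
            "build-2013-07-15-0510", "build-2013-07-18-1644",
            "build-2013-07-18-1644"),
  Runtime = c(672.918, 696.924, 736.720, 693.206, 699.332)
)

# Mean runtime per build, named by build
avruntime <- tapply(df1$Runtime, df1$build, mean)

# First row of each build (assumes df1 is sorted by date)
first_rows <- df1[!duplicated(df1$build), ]

build_summary <- data.frame(
  build = first_rows$build,
  firstdate = first_rows$date,
  avruntime = as.numeric(avruntime[as.character(first_rows$build)])
)
print(build_summary)
```

A data frame shaped like `build_summary` could then be passed as the `data` argument of `geom_point()` in place of the `ddply()` result.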

Answer 1 (score: 1)

If df is your data frame, something along these lines should get you started.

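    library(ggplot2)

    # Identify changes in build: 1 on the first row of each new build, NA elsewhere.
    # diff() is not meaningful on a factor, so compare the integer codes instead.
    df$buildchange <- c(1, as.integer(diff(as.integer(factor(df$build))) != 0))
    df[df$buildchange == 0, "buildchange"] <- NA

    # Plot the runtime as a line and mark the rows where the build changed
    p1 <- ggplot(data = df, aes(x = date)) +
      geom_line(aes(y = Runtime, group = 1, colour = "Runtime")) +
      geom_point(aes(y = Runtime * buildchange, colour = "Build Change"),
                 size = 5) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))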

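Since the question asks about a for loop, here is a minimal sketch that compares a loop with a vectorized comparison. The `builds` vector is a hypothetical excerpt of the question's build column, and `change_loop` and `change_vec` are names introduced for illustration.

```r
# Hypothetical excerpt of the build column from the question
builds <- c("build-2013-07-09-1332", "build-2013-07-15-0510",
            "build-2013-07-15-0510", "build-2013-07-18-1644",
            "build-2013-07-18-1644", "build-2013-07-22-0510")

# Loop version: compare each build with the previous one
change_loop <- integer(length(builds))
previous <- ""
for (i in seq_along(builds)) {
  change_loop[i] <- as.integer(builds[i] != previous)
  previous <- builds[i]
}

# Vectorized version: the first row always counts as a change,
# every other row is compared with the row before it
change_vec <- as.integer(c(TRUE, builds[-1] != builds[-length(builds)]))

print(change_vec)
```

Both versions flag the same rows; the vectorized form avoids the explicit loop and is usually the more idiomatic choice in R.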

Answer 2 (score: 1)

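I would put a tick mark on the x-axis every time a new build is released. From the original plot you can extract the minimum of the y range and place all the ticks at that minimum, so the code is generic and keeps working when the runtime drops below the current low. With the original data in Dat, here is the code: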

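    library(ggplot2)
    Dat$date <- as.Date(Dat$date)  # the dput() stores date as a factor
    p <- ggplot(Dat, aes(date, Runtime)) + geom_line()
    # Extract the date part (YYYY-MM-DD) from each build name
    buildElements <- strsplit(as.character(Dat$build), split = "-")
    pasteBE <- function(x) paste(x[2], x[3], x[4], sep = "-")
    # ggplot2 3.x stores the panel's y range in $layout$panel_params
    # (older releases used $panel$ranges)
    Dat2 <- data.frame(
      newBuild = as.Date(unique(sapply(buildElements, pasteBE))),
      yMin = ggplot_build(p)$layout$panel_params[[1]]$y.range[1])
    p + geom_point(data = Dat2, aes(newBuild, yMin), col = "red", size = 2)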
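A different way to get ticks along the bottom of the panel, without reading the y range out of the built plot, is `geom_rug()`. This is a swapped-in technique, not what the answer does, and the sketch below rests on assumptions: `Dat` is a hypothetical stand-in for the question's data, and the build date is taken from characters 7 to 16 of the build name.

```r
# Hypothetical stand-in for the question's data (first rows only)
Dat <- data.frame(
  date = as.Date(c("2013-07-16", "2013-07-17", "2013-07-18",
                   "2013-07-19", "2013-07-20")),
  build = c("build-2013-07-09-1332", "build-2013-07-15-0510",
            "build-2013-07-15-0510", "build-2013-07-18-1644",
            "build-2013-07-18-1644"),
  Runtime = c(672.918, 696.924, 736.720, 693.206, 699.332)
)

# "build-YYYY-MM-DD-hhmm": the date occupies characters 7 to 16
Dat2 <- data.frame(newBuild = unique(as.Date(substr(Dat$build, 7, 16))))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Dat, aes(date, Runtime)) +
    geom_line() +
    # Rug ticks on the bottom side; inherit.aes = FALSE because
    # Dat2 has no Runtime column
    geom_rug(data = Dat2, aes(x = newBuild), sides = "b",
             colour = "red", inherit.aes = FALSE)
  print(p)
}
```

The rug ticks stay attached to the bottom edge however the y-axis is scaled, which is the property the answer achieves by looking up the y range.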

Answer 3 (score: 0)

Never mind the dput(); it does not have the BuildChange column.

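I think you want something like this? The missing BuildChange values are imported as NA, so they are not drawn as points. Then I just compute the maximum runtime and nudge the points above the graph so that the y-axis scales nicely. Hacky, but fine... it's lunchtime.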

    # I added an id identifier to the first column of q.txt
    DF <- read.table('q.txt', header = TRUE, fill = TRUE,
                     colClasses = c('numeric', 'Date', 'factor', 'numeric', 'numeric'))
    library(ggplot2)
    theme_set(theme_bw())
    m <- max(DF$Runtime)
    # Runtime as a line, with the build-change points nudged above the maximum
    ggplot(DF, aes(date, Runtime)) +
      geom_line() +
      geom_point(aes(y = BuildChange + m + 5))

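The question also asks for a graph without the top and right axes. `theme_bw()` draws a full border around the panel, whereas `theme_classic()` draws only the left and bottom axis lines. A minimal sketch, assuming a hypothetical data frame `DF` with a BuildChange column that is NA where the build did not change:

```r
# Hypothetical stand-in data: BuildChange is NA where the build did not change
DF <- data.frame(
  date = as.Date(c("2013-07-16", "2013-07-17", "2013-07-18",
                   "2013-07-19", "2013-07-20")),
  Runtime = c(672.918, 696.924, 736.720, 693.206, 699.332),
  BuildChange = c(1, 1, NA, 1, NA)
)

# Keep only the rows where the build changed
changes <- DF[!is.na(DF$BuildChange), ]

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(DF, aes(date, Runtime)) +
    geom_line() +
    # Mark build changes directly on the runtime line
    geom_point(data = changes, colour = "red", size = 3) +
    # Left and bottom axis lines only, no panel border
    theme_classic()
  print(p)
}
```

Plotting the change rows as their own layer also avoids multiplying Runtime by a flag column or offsetting the points above the data.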