Why don't my geom_line() lines break at the correct colours?

Posted: 2015-05-18 15:57:27

Tags: r ggplot2 linegraph

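I have run into a small problem with the geom_line() function. My data consist of frame-by-frame manual video assessments of certain behaviours, made by trained observers. This gives thousands of data points per observer. For each observer the data is essentially a vector of 0s and 1s, where 1 means the behaviour of interest was shown and 0 means it was not.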

After playing around, I came up with the following:

# a dataset from a manual videoanalysis with frame by frame behaviour assessment in binary. 0 = no, 1 = yes.
data1<-read.csv("ObserversBehaviour.csv", ",", header=T)

# my solution of giving each observer his own line, without having to transform the entire set
Obsy0 <- rep(0,4528)
Obsy1 <- rep(1,4528)
Obsy2 <- rep(2,4528)
Obsy3 <- rep(3,4528)
Obsy4 <- rep(4,4528)
Obsy5 <- rep(5,4528)
Obsy6 <- rep(6,4528)
Obsy7 <- rep(7,4528)
Obsy8 <- rep(8,4528)
Obsy9 <- rep(9,4528)
Obsy10 <- rep(10,4528)

ObsData <- data.frame(data1,Obsy0,Obsy1,Obsy2,Obsy3,Obsy4,Obsy5,Obsy6,Obsy7,Obsy8,Obsy9,Obsy10)

#vector giving each observer a number
Obsall <- c(0:10)

#The list of individual frames of video M01 (4528 in total)
Framerange <- ObsData[["Frames.M01"]]

ylabels <- c("Observer0","Observer1","Observer2","Observer3","Observer4","Observer5","Observer6","Observer7","Observer8","Observer9","Observer10")

#Ob<n>value is the 1 or 0 assessment
#had to use as.factor() because for some reason my 0s and 1s are seen as continuous
GraphObserve <- ggplot(ObsData, ylim=range(Obsall), xlim=max(Framerange), aes(x=Framerange)) +
geom_point(aes(x=Frames.M01, y = Obsy0, colour = as.factor(Ob0value), size=as.factor(Ob0value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy1, colour = as.factor(Ob1value), size=as.factor(Ob1value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy2, colour = as.factor(Ob2value), size=as.factor(Ob2value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy3, colour = as.factor(Ob3value), size=as.factor(Ob3value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy4, colour = as.factor(Ob4freeze.0.no.1.yes), size=as.factor(Ob4freeze.0.no.1.yes)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy5, colour = as.factor(Ob5value), size=as.factor(Ob5value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy6, colour = as.factor(Ob6value), size=as.factor(Ob6value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy7, colour = as.factor(Ob7value), size=as.factor(Ob7value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy8, colour = as.factor(Ob8value), size=as.factor(Ob8value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy9, colour = as.factor(Ob9value), size=as.factor(Ob9value)), shape=15) +
geom_point(aes(x=Frames.M01, y = Obsy10, colour = as.factor(Ob10value), size=as.factor(Ob10value)), shape=15) +

scale_colour_manual(breaks = c(0, 1),
  labels = c("No","Yes"),
  values = c("green4","red"),
  name="Assessment")+
#needed to let the wanted behaviour stand out, so I changed pointsize
scale_size_manual(breaks = c(0, 1), values=c(1,2), guide="none")+
scale_y_discrete(limit=Obsall, labels=ylabels, expand=c(0,0))+
scale_x_continuous(expand=c(0,0),breaks = round(seq(min(0), max(Framerange), by = 200),5000))+
expand_limits(y=c(1,-.5))

update_labels(GraphObserve,list(x="Frames (M01)",y ="Observers"))

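This gave me a reasonable graph with nicely coloured dots for every data point. However, the points overlap and are still small, so this is not the way I want to go. Note that here I used geom_point() rather than geom_line(). The graph does show every colour break I want.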

Next, I changed every geom_point() line to geom_line() and kept everything else the same. (scale_size_manual() became redundant.)

geom_line(aes(x=Framerange, y=Obsy0, colour=as.factor(Ob0value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy1, colour=as.factor(Ob1value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy2, colour=as.factor(Ob2value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy3, colour=as.factor(Ob3value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy4, colour=as.factor(Ob4value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy5, colour=as.factor(Ob5value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy6, colour=as.factor(Ob6value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy7, colour=as.factor(Ob7value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy8, colour=as.factor(Ob8value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy9, colour=as.factor(Ob9value)),size=14) +
geom_line(aes(x=Framerange, y=Obsy10, colour=as.factor(Ob10value)),size=14) +

I thought this would work, but it didn't.

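Instead of switching colour at every change between 0 and 1 in the file, the colour seems to switch only at the first and the last occurrence in the dataset.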

Graph from the script above: http://imgur.com/2baseCa,bJa2Ab7#0
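A likely explanation can be sketched in base R with a made-up vector. Here `value` and `frame` are hypothetical stand-ins for one Ob<n>value column and Frames.M01. Mapping colour = as.factor(...) splits each geom_line() layer into only two groups, "0" and "1". geom_line() then joins the points of each group in x order at the same height. Each group therefore becomes one long bar running from its first to its last frame. That would match the symptom of colours switching only at the first and last occurrence.

```r
# Made-up 0/1 assessments for one observer (stand-in for an Ob<n>value column)
value <- c(0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
frame <- seq_along(value)

# colour = as.factor(value) splits the layer into two groups, "0" and "1".
# Each group is joined into one horizontal bar from its first to its last frame:
spans <- lapply(split(frame, factor(value)), range)
spans
# The "1" bar spans frames 3 to 8, so it covers the 0s at frames 5 to 7.

# What the plot was meant to show: five separate runs, not two groups
runs <- rle(value)
runs$lengths
runs$values
```

If this is the cause, the fix is to draw one piece per run (or per frame) rather than one piece per colour.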

I can't find the error in my code, and I can't find a solution on the web either. Can anyone here help me with this?

UPDATE

For a clearer overview, I have put links to the resulting graphs below the corresponding posts.

Following the suggestion to put my data into "long" format, I used the following script:

data1<-read.csv("ObserversBehaviour.csv", ",", header=T)

Frames<-data1[["Frames.M01"]]
Obs<-paste0("Observer",0:10)
Obsy <- sort(rep(0:10,4528),decreasing=F)
Obsvalue <- stack(data1[,c(Obs)])
ObsData2 <- expand.grid(Frames=data1[["Frames.M01"]],Obs=paste0("Observer",0:10))  
ObsData2$Observer = Obsy
ObsData2$Assessment = Obsvalue$values

ggplot(ObsData2, aes(Frames, Observer, colour=Assessment)) +
  geom_line(show_guide=T) +
  scale_y_discrete(limit=0:10, labels=Obs, expand=c(0,0))+
  scale_x_continuous(expand=c(0,0),breaks = round(seq(min(0), max(Frames), by = 200),5000))+
  expand_limits(y=c(1,.5)) +
  #The manual colorcoding actually failed, since it keeps returning this error "Continuous value supplied to discrete scale".
  scale_color_manual(breaks = c(0,1),
                 labels = c("No","Yes"),
                 values = c("green4","red"),
                 name="Assessment")
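Here is a hedged reading of the error and the new symptoms, shown on a made-up two-observer table (all values hypothetical). Assessment is read from the CSV as an integer column. So colour = Assessment gives a continuous scale, which is exactly what "Continuous value supplied to discrete scale" complains about. Converting it with factor() gives a discrete scale instead. There is a second effect: when the plot has no discrete variable at all, geom_line() draws every row as a single path sorted by Frames. That path jumps between observers at every frame, which could explain lines appearing to overwrite each other.

```r
# Made-up long table shaped like ObsData2: 2 observers x 4 frames
ObsData2 <- data.frame(
  Frames     = rep(1:4, times = 2),
  Observer   = rep(0:1, each = 4),
  Assessment = c(0L, 1L, 1L, 0L,   1L, 0L, 0L, 1L)
)

# An integer column maps to a continuous colour scale, which
# scale_color_manual() rejects; a factor maps to a discrete one.
ObsData2$AssessmentF <- factor(ObsData2$Assessment, levels = c(0, 1),
                               labels = c("No", "Yes"))
is.numeric(ObsData2$Assessment)
is.factor(ObsData2$AssessmentF)

# With no discrete variable to group on, all rows form one path ordered by x,
# so consecutive points hop between observers:
hops <- ObsData2$Observer[order(ObsData2$Frames)]
hops

# Assumed fix (not run here): keep each observer on its own path with `group`
# ggplot(ObsData2, aes(Frames, Observer, colour = AssessmentF, group = Observer)) +
#   geom_line() +
#   scale_colour_manual(values = c(No = "green4", Yes = "red"), name = "Assessment")
```

Setting group explicitly replaces the default grouping by colour. The factor colour should then not bring back the first-to-last bars, and geom_line() colours each segment along the path. This has not been tested against the original data.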

Although the colour now does change according to the value of the behaviour assessment, new problems have appeared.

The values of Observer5-10 are all replaced by the values of Observer10.

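By changing a few parameters, I found that changing the line size brings the values back to normal. However, the values of Observer10 then disappear completely.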

Graphs from the new script: http://imgur.com/AiKeXLc,kPgIKKZ#1 (the second image is the first graph)

On top of these problems, I can't change the colours manually, even when I try as.factor() or as.discrete() on my values. I don't know what else I can try now.

As an R beginner, I'm probably missing something obvious here.

UPDATE

Output of dput(head(ObsData2)):
## structure(list(Frames = 1:6, Obs = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Observer0", "Observer1", "Observer2", "Observer3", 
## "Observer4", "Observer5", "Observer6", "Observer7", "Observer8", 
## "Observer9", "Observer10"), class = "factor"), Observer = c(0L, 
## 0L, 0L, 0L, 0L, 0L), Assessment = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Frames", 
## "Obs", "Observer", "Assessment"), out.attrs = structure(list(
##     dim = structure(c(4528, 11), .Names = c("Frames", "Obs")), 
##     dimnames = structure(list(Frames = c("Frames=   1", "Frames=   2", 
##     "Frames=   3", "Frames=   4", "Frames=   5", "Frames=   6", 
##     "Frames=   7", "Frames=   8", "Frames=   9", "Frames=  10", 
##     "Frames=  11", "Frames=  12", "Frames=  13", "Frames=  14", 
##     "Frames=  15", "Frames=  16", "Frames=  17", "Frames=  18", 
##     "Frames=  19", "Frames=  20", "Frames=  21", "Frames=  22", 
##     "Frames=  23", "Frames=  24", "Frames=  25", "Frames=  26", 
##     "Frames=  27", "Frames=  28", "Frames=  29", "Frames=  30", 
##     "Frames=  31", "Frames=  32", "Frames=  33", "Frames=  34", 
##     "Frames=  35", "Frames=  36", "Frames=  37", "Frames=  38", 
##     "Frames=  39", "Frames=  40", "Frames=  41", "Frames=  42", 
##     "Frames=  43", "Frames=  44", "Frames=  45", "Frames=  46", 
##     "Frames=  47", "Frames=  48", "Frames=  49", "Frames=  50", 
##     "Frames=  51", "Frames=  52", "Frames=  53", "Frames=  54", 
##     "Frames=  55", "Frames=  56", "Frames=  57", "Frames=  58", 
##     "Frames=  59", "Frames=  60", "Frames=  61", "Frames=  62", 
##     "Frames=  63", "Frames=  64", "Frames=  65", "Frames=  66", 
##     "Frames=  67", "Frames=  68", "Frames=  69", "Frames=  70", 
##     "Frames=  71", "Frames=  72", "Frames=  73", "Frames=  74", 
# Long patch of "Frames= <75-4502>"  omitted due to space saving 
##     "Frames=4503", "Frames=4504", "Frames=4505", "Frames=4506", 
##     "Frames=4507", "Frames=4508", "Frames=4509", "Frames=4510", 
##     "Frames=4511", "Frames=4512", "Frames=4513", "Frames=4514", 
##     "Frames=4515", "Frames=4516", "Frames=4517", "Frames=4518", 
##     "Frames=4519", "Frames=4520", "Frames=4521", "Frames=4522", 
##     "Frames=4523", "Frames=4524", "Frames=4525", "Frames=4526", 
##     "Frames=4527", "Frames=4528"), Obs = c("Obs=Observer0", "Obs=Observer1", 
##     "Obs=Observer2", "Obs=Observer3", "Obs=Observer4", "Obs=Observer5", 
##     "Obs=Observer6", "Obs=Observer7", "Obs=Observer8", "Obs=Observer9", 
##     "Obs=Observer10")), .Names = c("Frames", "Obs"))), .Names = c("dim", 
## "dimnames")), row.names = c(NA, 6L), class = "data.frame")

2 answers:

Answer 0 (score: 3)

This will be easier if you put your data into "long" format. Here is an example with fake data:

## Create fake data in long format
ObsData = expand.grid(Frames=1:4258, Obs=paste0("Observer",0:10))

# Add y values
set.seed(10)
ObsData$y = cumsum(rnorm(4258*11))

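In a long-format data frame, all observers are "stacked" into a single factor variable (Obs) with 11 categories, one per observer. You can then use it as the grouping variable for the colour aesthetic in ggplot.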

## Plot with a different color for each observer
ggplot(ObsData, aes(Frames, y, colour=Obs)) +
         geom_line()

This plots each observer in ggplot's default colours. You can change them by adding scale_colour_manual() to the plot and setting the colours you prefer.

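The answer builds fake long data with expand.grid(). For the real wide file, a minimal base-R sketch of the same reshape could use stack(), as the question's update does. Here data1 is a made-up three-observer stand-in and `long` is a hypothetical name. The frame column is built with rep(..., times = ...), which relies on stack() placing the columns one under another in column order.

```r
# Made-up wide table shaped like data1: a frame column plus one 0/1 column per observer
data1 <- data.frame(Frames.M01 = 1:4,
                    Observer0  = c(0L, 1L, 1L, 0L),
                    Observer1  = c(0L, 0L, 1L, 1L),
                    Observer2  = c(1L, 1L, 0L, 0L))
Obs <- paste0("Observer", 0:2)

# stack() places the observer columns one under another, in column order,
# and records each value's source column in the factor `ind`
long <- stack(data1[, Obs])

# Frame numbers repeat once per observer, matching that row order
long$Frames <- rep(data1$Frames.M01, times = length(Obs))
names(long) <- c("Assessment", "Obs", "Frames")

head(long)
```

Each row now holds one observer's assessment for one frame, which is the shape ggplot expects for mapping Obs or Assessment to an aesthetic.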

Answer 1 (score: 0)

Worked around the problem

With the help of a colleague, I used geom_tile() instead of geom_line(), and the graph now looks exactly the way I want.

require("ggplot2")

data1<-read.csv("ObserversBehaviour.csv", ",", header=T)

Frames<-data1[["Frames.M01"]]
Obs.lab<-paste0("Observer",0:10)
Obsy <- sort(rep(1:11,4528),decreasing=F)
Obsvalue <- stack(data1[,c(Obs.lab)])

ObsData2 <- expand.grid(Frames=data1[["Frames.M01"]],Obs.lab=paste0("Observer",0:10))  
ObsData2$Observer = Obsy
ObsData2$Assessment = Obsvalue$values

GraphObserve <- ggplot(ObsData2, aes(Frames, Observer, height=.9)) +
  geom_tile(aes(fill = factor(Assessment)))+
  scale_fill_manual(values=c("0"="green4", "1"="red"), labels= c("No", "Yes"))+
  scale_y_discrete(expand=c(0,0), limit=1:11, labels=Obs.lab)+
  scale_x_continuous(expand=c(0,0), breaks = round(seq(min(0), max(Frames), by = 200),5000))
  update_labels(GraphObserve,list(x="Frames (M01)",y ="Observers"))

The colour breaks occur exactly where they should, nothing overlaps, and all observers are shown in the graph.
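geom_tile() works here because it draws one rectangle per frame. By default each rectangle is about one data step wide (one frame here), so neighbouring tiles touch without overlapping. The full data set means roughly 4528 x 11 tiles. One possible alternative, not something tried in this thread, is to merge each run of equal values into a single rectangle and draw it with geom_rect(). The sketch below uses a made-up single observer; `runs`, `starts` and `ends` are hypothetical names.

```r
# Made-up 0/1 assessments for a single observer over 7 frames
frames <- 1:7
value  <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L)

# rle() collapses consecutive equal values into runs
r      <- rle(value)
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# One rectangle per run; the +/- 0.5 mirrors a tile that is one frame wide
runs <- data.frame(xmin = frames[starts] - 0.5,
                   xmax = frames[ends] + 0.5,
                   Assessment = factor(r$values, levels = c(0, 1),
                                       labels = c("No", "Yes")))
runs

# Assumed ggplot2 call (not run here), with ymin/ymax chosen per observer row:
# ggplot(runs) + geom_rect(aes(xmin = xmin, xmax = xmax,
#                              ymin = 0.55, ymax = 1.45, fill = Assessment))
```

Each rectangle ends exactly where the next begins, so the result should look like the tile plot while using one row per run instead of one per frame.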

Although this doesn't actually solve the problem in my earlier scripts, it does give a better result.

Final graph: http://i.imgur.com/pW8Qh0I.png

Thanks to eipi10 for showing me how to condense my script.