In R, melted data works with ggplot. Why does an identical, manually constructed dataset fail?

Time: 2018-02-14 17:36:36

Tags: r ggplot2 melt

I have a data frame, call it t1, whose summary() looks like this:

       D1                D3                D5        
 Min.   :-0.2692   Min.   :-0.4129   Min.   : 2.509  
 1st Qu.: 2.4232   1st Qu.: 2.9288   1st Qu.: 4.731  
 Median : 3.3372   Median : 4.0337   Median : 5.657  
 Mean   : 3.5321   Mean   : 4.1214   Mean   : 5.943  
 3rd Qu.: 4.4551   3rd Qu.: 5.0950   3rd Qu.: 6.935  
 Max.   : 9.2710   Max.   : 9.5757   Max.   :10.604 

I can melt that data frame, and the result looks like this:

   variable    value
1        D1 5.121777
2        D1 7.129591
3        D1 6.568010
4        D1 9.271042
5        D1 6.246738
...      ...   
909      D5 6.323069
910      D5 6.397816
911      D5 6.293596
912      D5 5.167107
913      D5 4.118420
914      D5 5.733515
...      ....
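
For readers following along, here is a minimal sketch of what the melt step does, using made-up numbers rather than the real t1. It uses base R's stack() as a stand-in so that it runs without extra packages; the assumption is that something like reshape2::melt(t1) produced the variable/value layout shown above.

```r
# Made-up wide data; the real t1 has far more rows.
t1 <- data.frame(D1 = c(5.12, 7.13), D3 = c(4.03, 2.93), D5 = c(6.32, 6.40))

# stack() returns columns `values` and `ind`; reorder and rename them
# to match the variable/value layout of the melted data.
long <- stack(t1)[, c("ind", "values")]
names(long) <- c("variable", "value")

str(long)  # variable is a factor (levels D1, D3, D5); value is numeric
```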

I added a third column to the melted data based on a grouping, so the final result looks like this:

   variable    value   groupBy
1        D1 5.121777  group1
2        D1 7.129591  group1
3        D1 6.568010  group1
4        D1 9.271042  group1
5        D1 6.246738  group2
...      ...   
909      D5 6.323069  group4
910      D5 6.397816  group4
911      D5 6.293596  group4
912      D5 5.167107  group5
913      D5 4.118420  group5
914      D5 5.733515  group5
...      ....

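My goal is to plot the "variable" column of this data frame (D1, D5, etc.) on the x-axis and the value on the y-axis, with color split by group. This actually works fine: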

ggplot(final_melt, aes(x = as.numeric(variable), y = value, colour = groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), method = 'glm')

This is what the image looks like using the current method after some styling.

Now I want to do a variation on this, so I built my own version of the melted data instead.

  #This is in a loop and just creates "pseudo-melted" data.
  nameSet  <- colnames(result_dfs[[i]])
  meanSet  <- as.numeric(lapply(result_dfs[[i]], mean))
  groupVar <- rep((paste("group", i, sep="")), length(nameSet))
  cBound   <- cbind(nameSet,as.numeric(meanSet),groupVar)
  mean_dat <- rbind(mean_dat, cBound)

  #After the loop, make everything look just like the standard melted dataset.
  colnames(mean_dat) <- c("variable","value","groupVar")
  mean_dat <- data.frame(mean_dat)

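The manually built pseudo-melted data looks like this. I just want the x-axis to show the "variable" categories, with lines running from condition to condition according to value, and groupVar coloring the individual lines.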

   variable              value groupVar
1  Ebola_D1   2.08831695477086   group1
2  Ebola_D3   2.54949105549377   group1
3  Ebola_D5   4.15035141230915   group1
4  Ebola_D1 -0.390323691887409   group2
5  Ebola_D3  -1.83541896004176   group2
6  Ebola_D5  -1.12565386663147   group2
7  Ebola_D1  -0.83608582623162   group3
8  Ebola_D3  -7.55858863601214   group3
9  Ebola_D5  -2.52864397283096   group3
10 Ebola_D1  0.457247980555584   group4
11 Ebola_D3  0.957424853791735   group4
12 Ebola_D5   1.17865891001209   group4

First, let's try exactly the same thing:

> ggplot(series_dat, aes(x = as.numeric(variable), y = value, colour = groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), method = 'glm')
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Error: stat_smooth requires the following missing aesthetics: y
In addition: There were 24 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: In fun(x, ...) : NAs introduced by coercion
  .. . . . . 
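
A note on the "NAs introduced by coercion" warning: it is consistent with as.numeric() behaving differently on factors and on strings. The base-R sketch below uses made-up vectors, not the data from the question. A factor converts to its integer level codes, whereas labels such as "Ebola_D1" cannot be parsed as numbers and become NA.

```r
# Illustrative vectors only; these are not the data from the question.
f <- factor(c("D1", "D3", "D5", "D1"))
codes_from_f <- as.numeric(f)  # integer level codes: 1 2 3 1

s <- c("Ebola_D1", "Ebola_D3", "Ebola_D5")
# Without suppressWarnings() this emits "NAs introduced by coercion".
codes_from_strings <- suppressWarnings(as.numeric(s))  # NA NA NA

# Converting to a factor first yields usable x positions.
codes_from_factor <- as.numeric(factor(s))  # 1 2 3
```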

OK, that doesn't work, so I tried something simpler, just a line plot.

> ggplot(series_dat, aes(x=variable, y=value, group = groupVar)) + geom_line(color ="blue")
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Error in order(data$PANEL, data$group, data$x) : 
  argument 3 is not a vector

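I have tried many variations, but I can't figure out why this manually created data doesn't behave like the melted data. I suspect a type problem, but I checked the types of both and everything looks the same. I'd appreciate any insight anyone can offer. Thanks!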

@joran suggested checking str(), so here it is.

This is for the melted data:

'data.frame':   918 obs. of  2 variables:
 $ variable: Factor w/ 3 levels "D1","D3","D5": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  5.12 7.13 6.57 9.27 6.25 ...

This is for the non-melted (manually built) data:

'data.frame':   12 obs. of  3 variables:
$ variable:List of 12
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
$ value   :List of 12
 ..$ : chr "2.08831695477086"
 ..$ : chr "2.54949105549377"
 ..$ : chr "4.15035141230915"
 ..$ : chr "-0.390323691887409"
 ..$ : chr "-1.83541896004176"
 ..$ : chr "-1.12565386663147"
 ..$ : chr "-0.83608582623162"
 ..$ : chr "-7.55858863601214"
 ..$ : chr "-2.52864397283096"
 ..$ : chr "0.457247980555584"
 ..$ : chr "0.957424853791735"
 ..$ : chr "1.17865891001209"
$ groupVar:List of 12
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group4"
 ..$ : chr "group4"
 ..$ : chr "group4"

So this is helpful, but I'm still not quite sure how to deal with it.

1 Answer:

Answer 0 (score: 1)

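Be careful with cbind() if you want or expect the result to be a data frame. Except in very specific cases, cbind() tends to produce a matrix, and will therefore coerce everything to a single type.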
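
A small sketch of that coercion, using hypothetical values standing in for one iteration of the loop in the question. Binding a character vector and a numeric vector with cbind() yields a character matrix, so the means end up as strings even though as.numeric() was applied just before binding.

```r
# Hypothetical values standing in for one loop iteration from the question.
nameSet  <- c("Ebola_D1", "Ebola_D3", "Ebola_D5")
meanSet  <- c(2.09, 2.55, 4.15)
groupVar <- rep("group1", length(nameSet))

cBound <- cbind(nameSet, as.numeric(meanSet), groupVar)

is.matrix(cBound)      # TRUE: a matrix, not a data frame
typeof(cBound)         # "character": one type for every cell
is.character(cBound[, 2])  # TRUE: the means are now strings
```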

The safest way to create a data frame from individual vectors is to use data.frame().
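
As a rough sketch of how the loop from the question might be rewritten along these lines: the result_dfs below is a made-up stand-in (the real one is not shown in the question), and building one small data frame per group and then row-binding them is just one reasonable arrangement, not the only one.

```r
# Hypothetical stand-in for result_dfs: a list of numeric data frames,
# one per group. The real data is not shown in the question.
set.seed(1)
result_dfs <- list(
  data.frame(Ebola_D1 = rnorm(5), Ebola_D3 = rnorm(5), Ebola_D5 = rnorm(5)),
  data.frame(Ebola_D1 = rnorm(5), Ebola_D3 = rnorm(5), Ebola_D5 = rnorm(5))
)

# Build one small data frame per group; each column keeps its own type.
pieces <- lapply(seq_along(result_dfs), function(i) {
  data.frame(
    variable = colnames(result_dfs[[i]]),
    value    = unname(vapply(result_dfs[[i]], mean, numeric(1))),
    groupVar = paste0("group", i),
    stringsAsFactors = FALSE
  )
})

# Row-bind the pieces, then make the columns match the melted data:
# a factor for variable and groupVar, a numeric for value.
mean_dat <- do.call(rbind, pieces)
rownames(mean_dat) <- NULL
mean_dat$variable <- factor(mean_dat$variable)
mean_dat$groupVar <- factor(mean_dat$groupVar)

str(mean_dat)
```

With columns typed like this, aes(x = as.numeric(variable), y = value, colour = groupVar) receives a factor for x and a numeric for y, which is what the melted data provided.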