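How can I plot a variable against itself by factor level?

Time: 2018-06-20 00:43:20

Tags: r ggplot2 dplyr

I have a dataset of temperatures measured in different cities at different times of day: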

location    time      temperature  
Guangzhou   evening   21
Tokyo       evening   39
Lima        morning   77
Shenzhen    morning   76
Lahore      noon      24
Shanghai    evening   80
Tianjin     evening   91
Delhi       morning   51
Dhaka       morning   6 
Karachi     noon      84
Sao Paulo   noon      49
Tianjin     noon      89
Beijing     evening   3 
Delhi       evening   93
Dhaka       evening   65
Istanbul    evening   37
Karachi     evening   81
Kinshasa    evening   89
Lahore      evening   2 
Lima        evening   77
Manila      evening   74
Moscow      evening   60
Mumbai      evening   41
Sao Paulo   evening   13
Seoul       evening   65
Shenzhen    evening   3 
Wuhan       evening   30
Beijing     morning   61
Guangzhou   morning   29
Karachi     morning   84
Kinshasa    morning   4 
Lahore      morning   12
Manila      morning   89
Moscow      morning   71
Mumbai      morning   7 
Sao Paulo   morning   87
Seoul       morning   74
Shanghai    morning   63
Tianjin     morning   32
Tokyo       morning   81
Wuhan       morning   21
Beijing     noon      38
Chengdu     noon      51
Delhi       noon      61
Dhaka       noon      55
Istanbul    noon      12
Kinshasa    noon      77
Lima        noon      86
Manila      noon      47
Moscow      noon      2 
Mumbai      noon      41
Seoul       noon      97
Shenzhen    noon      24
Tokyo       noon      94           

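I want to use ggplot and dplyr to make a scatter plot where:

  • the x-axis is the temperature measured in the morning
  • the y-axis is the temperature measured at noon
  • each city is one point
  • cities missing either the noon or the morning measurement are excluded

How can I do this?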

1 Answer:

Answer 0: (score: 0)

You need the spread() function from the tidyr package.

Let's assume your data is stored in an object called data, then...

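library(dplyr)    # provides the %>% pipe
library(ggplot2)

data %>% 
  tidyr::spread(key = time, value = temperature) %>%      # one column per time of day
  na.omit() %>%                                           # drop cities with any missing value
  ggplot(aes(x = morning, y = noon, label = location)) + 
  geom_jitter() +                                         # one point per city, slightly jittered
  geom_text(nudge_y = 4)                                  # city name just above each point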
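One thing to watch: na.omit() drops a row if any column is NA, so a city with morning and noon readings but no evening reading would be removed as well. As far as I can tell that makes no difference for the sample data in the question, but here is a minimal sketch that keeps only the two times of interest before reshaping. The data frame toy and its values are made up purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: city "C" has no evening reading, city "B" has no noon reading
toy <- data.frame(
  location    = c("A", "A", "A", "B", "B", "C", "C"),
  time        = c("morning", "noon", "evening", "morning", "evening", "morning", "noon"),
  temperature = c(10, 20, 15, 11, 16, 12, 22),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  filter(time %in% c("morning", "noon")) %>%   # ignore evening readings entirely
  spread(key = time, value = temperature) %>%  # columns: location, morning, noon
  na.omit()                                    # now only drops cities missing morning or noon

wide
```

With the evening rows filtered out first, city C is kept even though it has no evening reading, and the result can be piped into the same ggplot() call as above.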

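spread(): it converts your data from "long" format to "wide" format. To do that we need to supply the "key", i.e. the variable whose values become the new column names, and the "value", i.e. the variable whose values fill those new columns.

Hope that helps!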
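To make the reshape concrete, here is a small sketch on a made-up two-city data frame (the object long and its numbers are hypothetical, not taken from the question). It also shows pivot_wider(), which tidyr 1.0.0 and later provide as the successor to spread(); treat it as an alternative under the assumption that you have a recent tidyr installed.

```r
library(tidyr)

# Hypothetical example in "long" format: one row per (location, time) pair
long <- data.frame(
  location    = c("A", "A", "B", "B"),
  time        = c("morning", "noon", "morning", "noon"),
  temperature = c(10, 20, 11, 21),
  stringsAsFactors = FALSE
)

# "Wide" format: one row per location, one column per level of `time`
wide_spread <- spread(long, key = time, value = temperature)

# The same reshape with pivot_wider() (tidyr >= 1.0.0)
wide_pivot <- pivot_wider(long, names_from = time, values_from = temperature)

wide_spread
```

Both results have the columns location, morning and noon, which is the shape ggplot needs to put morning on one axis and noon on the other.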
