How to combine several variables into one variable in R

Time: 2018-12-13 16:17:47

Tags: r

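I have a bike rental dataset for Washington, D.C. Some of my variables are factors and some are numeric and continuous. I couldn't find a way to upload the dataset, so I hope the following explanation is enough: I want to explain the bike rental "count" (numeric and continuous) by climate conditions. I want to combine the following variables into a single variable called agg_climate: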

 - season (factor) - 1 = Winter, 2 = Summer, 3 = Spring, 4 = Fall
 - weather (factor) - 1 = Good, 2 = Normal, 3 = Bad
 - temp (continuous) - measured in degrees
 - atemp (continuous) - measured in degrees
 - windspeed (continuous) - measured in mph
 - humidity (continuous) - measured in %

                datetime season     holiday  workingday weather  temp  atemp humidity windspeed count hour       daytime
3201 2011-09-15 17:00:00 Summer Regular day Working day     Bad 19.68 23.485       82   31.0009   261   17    After Noon
377  2011-02-02 05:00:00 Winter Regular day Working day     Bad  9.02 12.120       93    7.0015     3    5 Early Morning
6103 2012-06-01 21:00:00 Spring Regular day Working day     Bad 26.24 29.545       78   16.9979    85   21       Evening

Picture of the data table: https://ibb.co/SnphvBt

What is the correct way to do this? Thanks!

1 Answer:

Answer 0: (score: 0)

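You can combine several of the weather-related measures into a single measure called apparent temperature (AT).

"The AT index ... is based on a mathematical model of an adult walking outdoors in the shade (Steadman 1994). The AT is defined as the temperature, at the reference humidity level, that produces the same amount of discomfort as that experienced under the current ambient temperature and humidity."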

See below for how to implement it for your case:

x <- structure(list(datetime = structure(c(2L, 1L, 3L), .Label = c("05:00:00", 
"17:00:00", "21:00:00"), class = "factor"), season = structure(c(2L, 
3L, 1L), .Label = c("Spring", "Summer", "Winter"), class = "factor"), 
    holiday = c("Regular day", "Regular day", "Regular day"), 
    workingday = c("Working day", "Working day", "Working day"
    ), weather = structure(c(1L, 3L, 2L), .Label = c("Bad", "Good", 
    "Normal"), class = "factor"), temp = c(19.68, 9.02, 26.24
    ), atemp = c(23.485, 12.12, 29.545), humidity = c(82L, 93L, 
    78L), windspeed = c(31.0009, 7.0015, 16.9979), count = c(261L, 
    3L, 85L), hour = c(17L, 5L, 21L), daytime = c("After Noon", 
    "Early Morning", "Evening")), row.names = c("2011-09-15", 
"2011-02-02", "2012-06-01"), class = "data.frame")

x$e <- x$humidity / 100 * 6.105 * exp(17.27 * x$temp / (237.7 + x$temp)) # vapor pressure
x$windspeed_ms <- 0.4470400 * x$windspeed # windspeed in m/s
x$AT <- x$temp + 0.33 * x$e - 0.7 * x$windspeed_ms - 4.00 # apparent temperature
x[, c("temp",  "humidity", "windspeed", "AT")]

Output

            temp humidity windspeed        AT
2011-09-15 19.68       82   31.0009 12.166304
2011-02-02  9.02       93    7.0015  6.351849
2012-06-01 26.24       78   16.9979 25.669603
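
If the same calculation is needed for more than one data frame, the three steps above can be wrapped in a small function. This is only a sketch: the function name apparent_temperature and its argument names are made up for illustration, and it assumes that temp is in degrees Celsius and windspeed is in mph (which is what the 0.44704 conversion factor in the code above implies).

```r
# Hypothetical helper, not part of the original answer.
# Assumptions: temp in degrees Celsius, humidity in %, wind speed in mph.
apparent_temperature <- function(temp, humidity, windspeed_mph) {
  # water vapor pressure (hPa) from relative humidity and temperature
  e <- humidity / 100 * 6.105 * exp(17.27 * temp / (237.7 + temp))
  # convert wind speed from mph to m/s
  ws <- 0.44704 * windspeed_mph
  # same apparent temperature formula as in the answer's code
  temp + 0.33 * e - 0.7 * ws - 4
}

# The three sample rows from the question
at <- apparent_temperature(
  temp          = c(19.68, 9.02, 26.24),
  humidity      = c(82, 93, 78),
  windspeed_mph = c(31.0009, 7.0015, 16.9979)
)
round(at, 2)
```

Applied to the full data, this would be something like x$AT <- apparent_temperature(x$temp, x$humidity, x$windspeed), and the values for the three sample rows should agree with the AT column shown above.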

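The other variables are related to seasonality, so for them it is better to use:

  • time series analysis with exogenous variables;
  • machine learning (e.g. random forest regression, recurrent neural networks, etc.);
  • multiple (non)linear regression.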
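
As a rough illustration of the last option, a multiple linear regression of count on season, weather and an apparent temperature column named AT could look like the sketch below. The data frame sim is simulated here only so that the example runs on its own; it is not the asker's dataset, and the relationship used to generate count is arbitrary.

```r
set.seed(1)  # reproducible simulated data, NOT the real bike rental data
n <- 200
sim <- data.frame(
  season  = factor(sample(c("Winter", "Summer", "Spring", "Fall"), n, replace = TRUE)),
  weather = factor(sample(c("Good", "Normal", "Bad"), n, replace = TRUE)),
  AT      = runif(n, min = 0, max = 35)
)

# Arbitrary made-up relationship, used only so that lm() has something to fit
sim$count <- round(50 + 5 * sim$AT - 30 * (sim$weather == "Bad") + rnorm(n, sd = 20))

# Factors are expanded into dummy variables automatically by lm()
fit <- lm(count ~ season + weather + AT, data = sim)
summary(fit)
```

With the real data, sim would be replaced by the actual data frame. Keeping season and weather as separate factors, rather than folding everything into one agg_climate variable, lets the model estimate the effect of each one.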