Question

I have a data frame with the following variables: region, season, year, altitude and response (here is an example):

region   season   year   altitud   response
IT       wint     2013   800       45
IT       wint     2013   815       47
IT       wint     2013   840       54
IT       wint     2014   800       49
IT       wint     2014   815       59

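and so on. There are three regions, four seasons and two years, and I would like to do some linear modelling and plotting of altitude against response, with the data grouped by every possible combination, i.e.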

subset(region&season&year)   and get  altitud~response
IT&wint&2013
IT&wint&2014
IT&spring&2013
IT&spring&2014

and so on. So there are 24 combinations. Any ideas?

Thank you very much

David

Answer 1

My solution uses broom and its tidy function.

Reading the data:

library(readr)

data <- read_table("region   season   year   altitud   response
IT       wint     2013   800       45
IT       wint     2013   815       47
IT       wint     2013   840       54
IT       wint     2014   800       49
IT       wint     2014   815       59")

The actual solution:

library(dplyr)
library(broom)
data_fit <- data %>%
    group_by(region, season, year) %>%
    do(fit = lm(altitud ~ response, data = .))

dfCoefs <- tidy(data_fit, fit)
dfCoefs

This gives the following regression coefficients for the example data:

# A tibble: 4 x 8
# Groups:   region, season, year [2]
  region season  year term        estimate std.error statistic  p.value
  <chr>  <chr>  <dbl> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 IT     wint    2013 (Intercept)   613.      34.7       17.7    0.0360
2 IT     wint    2013 response        4.22     0.711      5.93   0.106 
3 IT     wint    2014 (Intercept)   726.     NaN        NaN    NaN     
4 IT     wint    2014 response        1.5    NaN        NaN    NaN

However, do you want altitud ~ response (i.e. predicting altitude from response) or response ~ altitud (predicting response for a given altitude)?
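If tidy(data_fit, fit) fails on a recent broom version (as far as I know, the tidiers for the rowwise result of do() were removed in broom 0.7.0), the same coefficient table can be obtained with dplyr's group_modify(). This is only a sketch: the data frame is rebuilt by hand from the sample rows in the question, and the model direction is kept as altitud ~ response to match the code above.

```r
library(dplyr)
library(broom)

# Sample rows from the question, rebuilt as a plain data frame (assumption)
data <- data.frame(
  region   = "IT",
  season   = "wint",
  year     = c(2013, 2013, 2013, 2014, 2014),
  altitud  = c(800, 815, 840, 800, 815),
  response = c(45, 47, 54, 49, 59)
)

# Fit one model per group and return its coefficient table;
# .x is the subset of rows belonging to the current group
dfCoefs <- data %>%
  group_by(region, season, year) %>%
  group_modify(~ tidy(lm(altitud ~ response, data = .x))) %>%
  ungroup()

dfCoefs
```

Replacing tidy() with glance() in the same call should give one row of model-level statistics per group instead.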

Answer 2

I hope I understood you correctly; here is a purrr solution:

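library(purrr)
library(dplyr)
library(tidyr)  # provides nest() and unnest()
nested<-df %>% 
  mutate_if(is.character,as.factor) %>% 
  group_by(year,season,region) %>% 
  nest()
# response ~ altitud matches the predictions and coefficients shown below
my_model<-function(df){
  lm(response~altitud,data=df)
}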

nested %>% 
  mutate(Mod=map(data,my_model))

Result (the data was partly modified to obtain factors):

# A tibble: 3 x 5
   year season region data             Mod     
  <int> <fct>  <fct>  <list>           <list>  
1  2013 wint   IT     <tibble [3 x 2]> <S3: lm>
2  2014 wint   IT     <tibble [1 x 2]> <S3: lm>
3  2014 Summer IF     <tibble [1 x 2]> <S3: lm>

Use modelr to get predictions. As shown in the other answer, you can use broom to get the statistics.

require(modelr)
nested %>% 
  mutate(Mod=map(data,my_model)) %>% 
  mutate(Preds=map2(data,Mod,add_predictions)) %>% 
  unnest(Preds)
# A tibble: 5 x 6
   year season region altitud response  pred
  <int> <fct>  <fct>    <int>    <int> <dbl>
1  2013 wint   IT         800       45  44.4
2  2013 wint   IT         815       47  47.9
3  2013 wint   IT         840       54  53.7
4  2014 wint   IT         800       49  49  
5  2014 Summer IF         815       59  59

Getting summary statistics with broom and purrr:

library(broom)
nested %>% 
  mutate(Mod=map(data,my_model)) %>% 
  mutate(Preds=map2(data,Mod,add_predictions),Tidy=map(Mod,tidy)) %>% 
  unnest(Tidy)

# A tibble: 4 x 8
   year season region term        estimate std.error statistic p.value
  <int> <fct>  <fct>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1  2013 wint   IT     (Intercept) -140.      31.8        -4.40   0.142
2  2013 wint   IT     altitud        0.231    0.0389      5.93   0.106
3  2014 wint   IT     (Intercept)   49      NaN         NaN    NaN    
4  2014 Summer IF     (Intercept)   59      NaN         NaN    NaN

Data:

df<-read.table(text="region   season   year   altitud   response
IT       wint     2013   800       45
               IT       wint     2013   815       47
               IT       wint     2013   840       54
               IT       wint     2014   800       49
               IF       Summer     2014   815       59",header=T)

Answer 3

For the sake of completeness, here are also base R and data.table solutions.

Base R

One base R approach using split() and lapply() was suggested by Jogo:

result <- lapply(split(DT, list(DT$region, DT$season, DT$year)), 
                 lm, formula = response ~ altitud)
print(result)

$IT.wint.2013

Call:
FUN(formula = ..1, data = X[[i]])

Coefficients:
(Intercept)      altitud  
  -140.0510       0.2306  


$IT.wint.2014

Call:
FUN(formula = ..1, data = X[[i]])

Coefficients:
(Intercept)      altitud  
  -484.3333       0.6667

Or, using a pipe to improve readability:

library(magrittr)
result <- split(DT, list(DT$region, DT$season, DT$year)) %>%
  lapply(lm, formula = response ~ altitud)

data.table

With the help of broom:

library(data.table)
library(magrittr)
setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::tidy(), by = .(region, season, year)]

   region season year        term     estimate   std.error statistic   p.value
1:     IT   wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2:     IT   wint 2013     altitud    0.2306122  0.03888277  5.930962 0.1063382
3:     IT   wint 2014 (Intercept) -484.3333333         NaN       NaN       NaN
4:     IT   wint 2014     altitud    0.6666667         NaN       NaN       NaN

setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::glance(), by = .(region, season, year)]

   region season year r.squared adj.r.squared    sigma statistic   p.value df    logLik      AIC    BIC deviance df.residual
1:     IT   wint 2013 0.9723576     0.9447152 1.111168  35.17631 0.1063382  2 -2.925132 11.85026 9.1461 1.234694           1
2:     IT   wint 2014 1.0000000           NaN      NaN       NaN       NaN  2       Inf     -Inf   -Inf 0.000000           0

If computing lm() for the different groups is time-consuming, it may be worthwhile to store the results and reuse them in subsequent processing steps:

mod <- setDT(DT)[, .(models = .(lm(response ~ altitud, .SD))), by = .(region, season, year)]
mod

   region season year models
1:     IT   wint 2013   <lm>
2:     IT   wint 2014   <lm>

mod$models is a list of models, equivalent to result.

Now we can extract the desired information from the stored models, e.g.

mod[, models[[1]] %>% broom::tidy(), by = .(region, season, year)]

   region season year        term     estimate   std.error statistic   p.value
1:     IT   wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2:     IT   wint 2013     altitud    0.2306122  0.03888277  5.930962 0.1063382
3:     IT   wint 2014 (Intercept) -484.3333333         NaN       NaN       NaN
4:     IT   wint 2014     altitud    0.6666667         NaN       NaN       NaN

Data
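The definition of DT is not shown above. The coefficients printed for IT/wint/2013 and IT/wint/2014 are consistent with the five sample rows from the question, so a minimal sketch, assuming DT holds exactly those rows, would be:

```r
library(data.table)

# Assumption: DT contains the five sample rows from the question
DT <- data.table(
  region   = "IT",
  season   = "wint",
  year     = c(2013L, 2013L, 2013L, 2014L, 2014L),
  altitud  = c(800L, 815L, 840L, 800L, 815L),
  response = c(45L, 47L, 54L, 49L, 59L)
)

DT
```

With the full data, any combination of region, season and year that contains no rows would produce an empty group in split(); passing drop = TRUE to split() is one way to avoid fitting lm() on empty subsets.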
