I have a data frame with the following variables: region, season, year, altitude and response (here is a sample):
region season year altitud response
IT wint 2013 800 45
IT wint 2013 815 47
IT wint 2013 840 54
IT wint 2014 800 49
IT wint 2014 815 59
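and so on. There are three regions, four seasons and two years. I would like to fit and plot linear models of altitude against response, with the data grouped by every possible combination, i.e.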
subset(region&season&year) and get altitud~response
IT&wint&2013
IT&wint&2014
IT&spring&2013
IT&spring&2014
and so on, which makes 24 combinations. Any ideas?
Thank you very much,
David
Answer 0 (score: 1)
My solution uses broom together with its tidy function.
Reading the data:
library(readr)
data <- read_table("region season year altitud response
IT wint 2013 800 45
IT wint 2013 815 47
IT wint 2013 840 54
IT wint 2014 800 49
IT wint 2014 815 59")
The actual solution:
library(dplyr)
library(broom)
data_fit <- data %>%
group_by(region, season, year) %>%
do(fit = lm(altitud ~ response, data = .))
dfCoefs <- tidy(data_fit, fit)
dfCoefs
For the sample data this gives the following regression coefficients:
# A tibble: 4 x 8
# Groups: region, season, year [2]
region season year term estimate std.error statistic p.value
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 IT wint 2013 (Intercept) 613. 34.7 17.7 0.0360
2 IT wint 2013 response 4.22 0.711 5.93 0.106
3 IT wint 2014 (Intercept) 726. NaN NaN NaN
4 IT wint 2014 response 1.5 NaN NaN NaN
However, do you want altitud ~ response (i.e. predicting altitude from the response) or response ~ altitud (predicting the response for a given altitude)?
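To make the difference between the two formulas concrete, here is a minimal base R sketch that fits both directions on the three 2013 rows of the sample data. The object names (d, fit_alt, fit_resp) are only illustrative. The two slopes are not reciprocals of each other; their product equals the R-squared of the fit.

```r
# The 2013 rows from the question's sample data
d <- data.frame(altitud = c(800, 815, 840), response = c(45, 47, 54))

fit_alt  <- lm(altitud ~ response, data = d)  # altitude predicted from response
fit_resp <- lm(response ~ altitud, data = d)  # response predicted from altitude

slope_alt  <- coef(fit_alt)[["response"]]
slope_resp <- coef(fit_resp)[["altitud"]]
r2 <- summary(fit_resp)$r.squared

# The product of the two slopes equals R-squared
print(c(slope_alt = slope_alt, slope_resp = slope_resp, r2 = r2))
```

The first slope matches the 4.22 reported above for altitud ~ response; the second is the slope you would get for response ~ altitud.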
Answer 1 (score: 0)
I hope I understood you correctly. Here is a solution using purrr:
library(purrr)
library(dplyr)
nested<-df %>%
mutate_if(is.character,as.factor) %>%
group_by(year,season,region) %>%
nest()
my_model<-function(df){
lm(response~altitud,data=df) # response modelled on altitud, which is what the results shown in this answer correspond to
}
nested %>%
mutate(Mod=map(data,my_model))
Result (the data were partly modified to get several factor levels):
# A tibble: 3 x 5
year season region data Mod
<int> <fct> <fct> <list> <list>
1 2013 wint IT <tibble [3 x 2]> <S3: lm>
2 2014 wint IT <tibble [1 x 2]> <S3: lm>
3 2014 Summer IF <tibble [1 x 2]> <S3: lm>
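Two of the groups above contain a single row. As a small base R sketch of what lm() does in that case (the data frame one_row is made up to mirror such a group): the slope cannot be estimated and comes back as NA, so only the intercept is left.

```r
# A group with a single observation, like the 2014 groups above
one_row <- data.frame(altitud = 800, response = 49)

fit <- lm(response ~ altitud, data = one_row)

# The intercept equals the single response value; the slope is NA
print(coef(fit))
```

This is worth keeping in mind when reading per-group coefficient tables: rows for such groups may be missing the slope term.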
Use modelr to get predictions. As shown in the other answer, you can use broom to get the statistics.
require(modelr)
nested %>%
mutate(Mod=map(data,my_model)) %>%
mutate(Preds=map2(data,Mod,add_predictions)) %>%
unnest(Preds)
# A tibble: 5 x 6
year season region altitud response pred
<int> <fct> <fct> <int> <int> <dbl>
1 2013 wint IT 800 45 44.4
2 2013 wint IT 815 47 47.9
3 2013 wint IT 840 54 53.7
4 2014 wint IT 800 49 49
5 2014 Summer IF 815 59 59
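For a single group, the pred column can be reproduced without modelr. The following is a base R sketch, assuming the model is response ~ altitud and using only the three 2013 rows; the name d is illustrative.

```r
# The 2013 rows from the sample data
d <- data.frame(altitud = c(800, 815, 840), response = c(45, 47, 54))

fit <- lm(response ~ altitud, data = d)

# predict() returns the fitted response for each altitude
d$pred <- predict(fit, newdata = d)
print(d)
```

The predictions agree with the first three rows of the table above (about 44.4, 47.9 and 53.7).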
Use broom and purrr to get summary statistics:
library(broom)
nested %>%
mutate(Mod=map(data,my_model)) %>%
mutate(Preds=map2(data,Mod,add_predictions),Tidy=map(Mod,tidy)) %>%
unnest(Tidy)
# A tibble: 4 x 8
year season region term estimate std.error statistic p.value
<int> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 2013 wint IT (Intercept) -140. 31.8 -4.40 0.142
2 2013 wint IT altitud 0.231 0.0389 5.93 0.106
3 2014 wint IT (Intercept) 49 NaN NaN NaN
4 2014 Summer IF (Intercept) 59 NaN NaN NaN
Data:
df<-read.table(text="region season year altitud response
IT wint 2013 800 45
IT wint 2013 815 47
IT wint 2013 840 54
IT wint 2014 800 49
IF Summer 2014 815 59",header=T)
Answer 2 (score: 0)
For the sake of completeness, here are also base R and data.table solutions.
A base R approach using split() and lapply() was suggested by Jogo:
result <- lapply(split(DT, list(DT$region, DT$season, DT$year)), lm, formula = response ~ altitud)
print(result)
$IT.wint.2013
Call:
FUN(formula = ..1, data = X[[i]])
Coefficients:
(Intercept) altitud
-140.0510 0.2306
$IT.wint.2014
Call:
FUN(formula = ..1, data = X[[i]])
Coefficients:
(Intercept) altitud
-484.3333 0.6667
Or, using a pipe for better readability:
library(magrittr)
result <- split(DT, list(DT$region, DT$season, DT$year)) %>%
lapply(lm, formula = response ~ altitud)
A data.table approach, with the help of broom:
library(data.table)
library(magrittr)
setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::tidy(), by = .(region, season, year)]
region season year term estimate std.error statistic p.value
1: IT wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2: IT wint 2013 altitud 0.2306122 0.03888277 5.930962 0.1063382
3: IT wint 2014 (Intercept) -484.3333333 NaN NaN NaN
4: IT wint 2014 altitud 0.6666667 NaN NaN NaN
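The NaN entries for the 2014 group arise because that group has only two observations: a straight line through two points fits exactly, leaving no residual degrees of freedom from which to estimate a standard error. A minimal base R sketch on those two rows (the name d2014 is illustrative):

```r
# The two 2014 rows from the question's sample data
d2014 <- data.frame(altitud = c(800, 815), response = c(49, 59))

fit <- lm(response ~ altitud, data = d2014)

print(coef(fit))         # same estimates as rows 3 and 4 above
print(df.residual(fit))  # no residual degrees of freedom remain
```

Groups need at least three observations before standard errors and p-values for a two-parameter line become available.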
setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::glance(), by = .(region, season, year)]
region season year r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
1: IT wint 2013 0.9723576 0.9447152 1.111168 35.17631 0.1063382 2 -2.925132 11.85026 9.1461 1.234694 1
2: IT wint 2014 1.0000000 NaN NaN NaN NaN 2 Inf -Inf -Inf 0.000000 0
If computing lm() for the different groups is time-consuming, it may be worth storing the models and reusing them in subsequent processing steps:
mod <- setDT(DT)[, .(models = .(lm(response ~ altitud, .SD))), by = .(region, season, year)]
mod
region season year models
1: IT wint 2013 <lm>
2: IT wint 2014 <lm>
mod$models is a list of models equivalent to result.
Now we can extract the desired information from the stored models, e.g.
mod[, models[[1]] %>% broom::tidy(), by = .(region, season, year)]
region season year term estimate std.error statistic p.value
1: IT wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2: IT wint 2013 altitud 0.2306122 0.03888277 5.930962 0.1063382
3: IT wint 2014 (Intercept) -484.3333333 NaN NaN NaN
4: IT wint 2014 altitud 0.6666667 NaN NaN NaN
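One caveat for the base R split() approach when not every region/season/year combination occurs in the data: by default split() also returns the empty combinations, and lm() fails on an empty data frame. The sketch below passes drop = TRUE to avoid that and collects the coefficients into one matrix. The FR rows are made up purely for illustration and are not part of the original data.

```r
# Sample data from the question plus three hypothetical FR rows
DT <- data.frame(
  region   = c("IT", "IT", "IT", "IT", "IT", "FR", "FR", "FR"),
  season   = c("wint", "wint", "wint", "wint", "wint", "spring", "spring", "spring"),
  year     = c(2013, 2013, 2013, 2014, 2014, 2013, 2013, 2013),
  altitud  = c(800, 815, 840, 800, 815, 500, 520, 560),
  response = c(45, 47, 54, 49, 59, 30, 33, 41)
)

# drop = TRUE keeps only the combinations that actually occur
groups <- split(DT, list(DT$region, DT$season, DT$year), drop = TRUE)

# One model per group
models <- lapply(groups, lm, formula = response ~ altitud)

# One row per group, one column per coefficient
coefs <- t(sapply(models, coef))
print(coefs)
```

Of the eight possible combinations in this toy data only three are observed, so only three models are fitted.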