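I am working with a poorly formatted dataset and trying to reshape it into a tidy format for statistical testing and data visualization. I am hoping someone can offer some insight into whether I have tidied the data correctly, and into the easiest way to run multiple t.tests.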
Below is some sample data, similar to my unconventional format:
library(tidyverse)
# Wide format: one row per subject, one column per measure/condition/examiner combination
data <- data.frame("subject_id" = 1:10, "age" = 21:30, "weight" = 150:159, "height" = 65:74,
                   "x_c1_avg" = c(1:9, NA), "y_c1_avg" = runif(10), "z_c1_avg" = c(9:1, NA),
                   "x_c2e1_avg" = c(1:9, NA), "y_c2e1_avg" = runif(10), "z_c2e1_avg" = runif(10),
                   "x_c2e2_avg" = runif(10), "y_c2e2_avg" = runif(10), "z_c2e2_avg" = runif(10))
glimpse(data)
The data frame contains demographic information followed by three measures collected under different conditions, some of them taken by two examiners. For example, x_c1_avg is the average of measure x collected under condition 1 (a particular leg position), while y_c2e1_avg is the average of y collected by examiner 1 under condition 2.
So my first question is: am I correct that the output of the code below is considered tidy? Measure, condition, and examiner each have their own column, and the values are in another column.
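data2 <- data %>%
  # Stack every measurement column into key/value pairs, keeping the demographics
  gather(key = "condition", value = "value", -c(subject_id:height)) %>%
  # "x_c2e1_avg" -> measure "x", condition "c2e1"; the trailing "avg" is dropped
  separate(condition, into = c("measure", "condition"), sep = "_", extra = "drop") %>%
  # Split after the second character: "c2e1" -> condition "c2", examiner "e1"
  separate(condition, into = c("condition", "examiner"), sep = 2, fill = "right")
My second question is: what is the most efficient way to run paired t.tests on this data, and is there a way to do so without creating a new vector for each variable? In total, I will only be running t.tests for six comparisons. For each subject I will compare, for example, the x values at c1 with the x values at c2, or examiner 1's y values at condition 2 with examiner 2's y values at condition 2, and so on. My current code is:
# x values at condition 1
x_c1 <- data2 %>%
  filter(measure == "x", condition == "c1") %>%
  select(value)
# x values at condition 2, examiner 2
x_c2_e2 <- data2 %>%
  filter(measure == "x", condition == "c2", examiner == "e2") %>%
  select(value)
t.test(x_c1$value, x_c2_e2$value, paired = TRUE)
However, this seems much more complicated than it needs to be, and it feels like I am undoing the work I did to make the data tidy. It would have been much easier to run the tests on the original data from the start.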
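For comparison, here is a minimal sketch of what running the tests directly on the wide columns might look like, without building a separate vector for each variable. It uses only base R. The seed, the reduced stand-in data frame, and the particular pairs listed in `comparisons` are assumptions made for illustration, not the actual six comparisons.

```r
# Illustrative sketch only: the seed and the choice of pairs are assumptions.
set.seed(42)  # added so the random columns are reproducible

# Reduced stand-in for the sample data (same column naming scheme)
data <- data.frame(
  subject_id = 1:10,
  x_c1_avg   = c(1:9, NA),
  x_c2e2_avg = runif(10),
  y_c2e1_avg = runif(10),
  y_c2e2_avg = runif(10)
)

# Each element names the two wide columns to compare (hypothetical selection)
comparisons <- list(
  x_c1_vs_c2e2 = c("x_c1_avg", "x_c2e2_avg"),
  y_e1_vs_e2   = c("y_c2e1_avg", "y_c2e2_avg")
)

# Run one paired t-test per pair; t.test() drops incomplete pairs itself
results <- lapply(comparisons, function(cols) {
  t.test(data[[cols[1]]], data[[cols[2]]], paired = TRUE)
})

# Collect the p-values into a named vector
p_values <- sapply(results, function(res) res$p.value)
p_values
```

Each element of `results` is an ordinary `htest` object, so `results$x_c1_vs_c2e2` prints the same output as a single `t.test()` call. Because subject 10 has a missing `x_c1_avg`, that comparison is based on nine complete pairs.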