How to combine two datasets that have the same column names

Time: 2021-02-27 17:56:00

Tags: r

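I'm new to R, so please keep it simple! I have two datasets in which two different samples (men and women) were asked the same questions, so the column names are identical. I want to run a t-test that compares the mean of a given column across the two datasets, but I can't work out how to combine them into one dataset in a useful way. I have tried things like merge and rbind, but they don't do what I want.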

Here is one column from dataset 1. I want to compare it with...

structure(list(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 
7, 4, 1, 5, 6)), class = "data.frame", row.names = c(NA, -17L
))

... this column from dataset 2 (as you can see, the column name is the same).

structure(list(UVRATE2 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 
4, 4, 5, 1, 4)), class = "data.frame", row.names = c(NA, -17L
))

2 answers:

Answer 0: (score: 4)

You can put both columns into a single data frame with a grouping variable, then run an unpaired two-sample t-test on it with t.test:

dataset1 <- data.frame(UVRATE1 = c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5))
# dataset1$UVRATE1
# [1] 38.9 61.2 73.3 21.8 63.4 64.6 48.4 48.8 48.5

dataset2 <- data.frame(UVRATE1 = c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4))
# dataset2$UVRATE1
# [1] 67.8 60.0 63.4 76.0 89.4 73.3 67.3 61.3 62.4

# Create a merged data frame
my_data <- data.frame( 
  group = rep(c("Woman", "Man"), each = 9),
  weight = c(dataset1$UVRATE1,  dataset2$UVRATE1)
)

# my_data
# group weight
# 1  Woman   38.9
# 2  Woman   61.2
# 3  Woman   73.3
# 4  Woman   21.8
# 5  Woman   63.4
# 6  Woman   64.6
# 7  Woman   48.4
# 8  Woman   48.8
# 9  Woman   48.5
# 10   Man   67.8
# 11   Man   60.0
# 12   Man   63.4
# 13   Man   76.0
# 14   Man   89.4
# 15   Man   73.3
# 16   Man   67.3
# 17   Man   61.3
# 18   Man   62.4

# Compute the unpaired two-sample t-test (equal variances assumed) and print it
res <- t.test(my_data[my_data$group == "Woman", ]$weight,
              my_data[my_data$group == "Man", ]$weight,
              var.equal = TRUE)
res

# Two Sample t-test
# 
# data:  my_data[my_data$group == "Woman", ]$weight and my_data[my_data$group == "Man", ]$weight
# t = -2.7842, df = 16, p-value = 0.01327
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -29.748019  -4.029759
# sample estimates:
#   mean of x mean of y 
# 52.10000  68.98889 

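Don't forget to check the test's assumptions: with var.equal = TRUE, the t-test assumes that both groups are roughly normally distributed and have equal variances.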
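As a rough sketch of what such a check might look like for the example data above: shapiro.test looks at normality within each group, and var.test compares the two variances. These two base R functions are just one common choice, not something the answer itself prescribes, and the snippet rebuilds the example values so it runs on its own.

```r
# Same example values as above, rebuilt so this snippet is self-contained
my_data <- data.frame(
  group  = rep(c("Woman", "Man"), each = 9),
  weight = c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5,
             67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
)

# Shapiro-Wilk normality test, run separately for each group
normality <- lapply(split(my_data$weight, my_data$group), shapiro.test)
normality$Woman$p.value
normality$Man$p.value

# F test comparing the variances of the two groups
variance_check <- var.test(weight ~ group, data = my_data)
variance_check$p.value
```

If the variances look clearly different, one option is to drop var.equal = TRUE, in which case t.test falls back to its default, the Welch test.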

Answer 1: (score: 0)

Code:

# data frame 1
dataset_1 <- data.frame(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 7, 4, 1, 5, 6))
# data frame 2
dataset_2 <- data.frame(UVRATE1 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 4, 4, 5, 1, 4))

# rename the column in dataset_2 so the two columns can sit side by side
colnames(dataset_2)[1] <- "UVRATE2"

# combine into one data frame (both have 17 rows)
df <- cbind(dataset_1, dataset_2)

# t-test (Welch by default)
t.test(df$UVRATE1, df$UVRATE2)

Output:

    Welch Two Sample t-test

data:  df$UVRATE1 and df$UVRATE2
t = 1.0394, df = 31.128, p-value = 0.3066
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.622388  1.916506
sample estimates:
mean of x mean of y 
 5.352941  4.705882 
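
Since the question mentions rbind: another option is to stack the two data frames in long format, with one row per respondent and an extra column recording which dataset the row came from. This is a minimal sketch that assumes the column is called UVRATE1 in both data frames, as in the code above. The `source` column and its labels `dataset1` and `dataset2` are made up for illustration; replace them with the real sample labels (for example, men and women).

```r
dataset_1 <- data.frame(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 7, 4, 1, 5, 6))
dataset_2 <- data.frame(UVRATE1 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 4, 4, 5, 1, 4))

# Tag every row with its origin before stacking (hypothetical column name)
dataset_1$source <- "dataset1"
dataset_2$source <- "dataset2"

# rbind works here because both data frames have identical column names
combined <- rbind(dataset_1, dataset_2)

# Formula interface: response ~ grouping variable (Welch test by default)
res_long <- t.test(UVRATE1 ~ source, data = combined)
res_long
```

This runs the same Welch test as the cbind version, and with the long layout any other shared column can be tested by changing the left-hand side of the formula.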