I am using R.
Here is some sample data:
structure(list(conditions = c("secondCondition", "firstCondition", "firstCondition",
"secondCondition", "secondCondition", "firstCondition", "firstCondition",
"secondCondition", "firstCondition", "firstCondition", "firstCondition",
"secondCondition", "firstCondition", "firstCondition", "firstCondition",
"secondCondition", "firstCondition", "firstCondition", "firstCondition",
"firstCondition", "firstCondition", "firstCondition", "secondCondition",
"firstCondition", "firstCondition", "firstCondition", "secondCondition",
"firstCondition", "firstCondition", "firstCondition", "secondCondition",
"firstCondition", "firstCondition", "firstCondition", "secondCondition",
"firstCondition", "secondCondition", "firstCondition", "secondCondition",
"firstCondition", "firstCondition", "firstCondition", "secondCondition",
"secondCondition", "firstCondition", "firstCondition", "secondCondition",
"firstCondition", "firstCondition", "firstCondition"), WordsProduced = c("parking",
"ball", "mobile", "dad", "agressive", "triple", "face",
"donate", "serve", "happy", "hello", "cry", "distinct",
"tribe", "confuse", "island", "hawai", "color", "smile",
"walk", "good", "beach", "affect", "skin", "place",
"run", "vigilant", "eager", "mountain", "gay", "fear",
"love", "hate", "star", "sun", "doge", "moon",
"bitcoin", "plantair", "tesla", "final", "fresh", "friend",
"solitude", "life", "sadness", "sky", "terror", "shy",
"table"), MeanWordsProduced = c(0.110952380952381, 2.94285714285714,
0.110952380952381, 2.94285714285714, 0.110952380952381, 2.94285714285714,
0.110952380952381, 2.94285714285714, 2.94285714285714, 2.94285714285714,
0.110952380952381, 2.94285714285714, 2.94285714285714, 2.94285714285714,
0.110952380952381, 2.94285714285714, 2.94285714285714, 2.94285714285714,
2.94285714285714, 2.94285714285714, 2.94285714285714, 2.94285714285714,
2.94285714285714, 2.94285714285714, 2.94285714285714, 0.110952380952381,
2.94285714285714, 2.94285714285714, 2.94285714285714, 0.110952380952381,
2.94285714285714, 2.94285714285714, 2.94285714285714, 0.110952380952381,
2.94285714285714, 2.94285714285714, 0.110952380952381, 0.110952380952381,
2.94285714285714, 2.94285714285714, 2.94285714285714, 0.110952380952381,
2.94285714285714, 0.110952380952381, 2.94285714285714, 0.110952380952381,
2.94285714285714, 2.94285714285714, 2.94285714285714, 0.110952380952381
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))
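Obviously, the MeanWordsProduced values for each condition are not accurate here, but that is because the data I am actually using is much larger.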
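So here is my question. I need to compare the two groups (firstCondition and secondCondition) with a t-test. I have already done this for other columns that contain numeric values, but now I need to compare the number of words produced by the two groups.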
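There are 300 prompts in total across both conditions, but the total number of words produced differs by condition. For example, the first condition might yield 882 words in total.
The number of times each condition name appears in the conditions column matches the total number of words produced, not the actual number of prompts.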
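Since each row is one produced word, the per-condition word totals described above are simply row counts. A minimal base-R sketch on a few made-up toy rows in the same long format (the data frame name `words` is hypothetical):

```r
# Toy rows in the same long format as the question: one row per word produced.
# These rows are made up purely for illustration.
words <- data.frame(
  conditions = c("firstCondition", "firstCondition", "secondCondition",
                 "firstCondition", "secondCondition"),
  WordsProduced = c("ball", "mobile", "dad", "triple", "donate"),
  stringsAsFactors = FALSE
)

# Each row is one word, so counting rows per condition gives the
# total number of words produced in that condition.
word_totals <- table(words$conditions)
print(word_totals)
```

This yields only a single total per condition, which on its own is not enough for a t-test: a t-test compares the means of several numeric observations within each group.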
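I don't know whether all this extra information is needed, but my question is how to compute a t-value from a column of words instead of numbers.
The formula I have been using is: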
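library(rstatix)  # provides t_test(), add_significance() and %>%
df %>%  # `df` is a placeholder name for your data frame
  t_test(COLUMofInterest ~ conditions, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE, paired = FALSE) %>%
  add_significance()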
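One common way to get a numeric column to test is to count the words per unit of observation (for example per participant or per prompt) and run the t-test on those counts. The sketch below assumes a hypothetical `participant` column, which is not in the sample data, and uses base R's `aggregate()` and `t.test()` on made-up toy rows; the Welch test (`var.equal = FALSE`) mirrors the options in the formula above.

```r
# Toy data: the `participant` column is an ASSUMPTION (not in the question's
# sample data). As in the question, there is one row per word produced.
words <- data.frame(
  participant   = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6),
  conditions    = c(rep("firstCondition", 9), rep("secondCondition", 6)),
  WordsProduced = c("ball", "mobile", "triple", "face", "serve",
                    "happy", "distinct", "tribe", "confuse",
                    "dad", "cry", "island", "parking", "donate", "affect"),
  stringsAsFactors = FALSE
)

# Turn words into numbers: count the rows (words) per participant and condition.
counts <- aggregate(
  list(n_words = rep(1L, nrow(words))),
  by = list(participant = words$participant, conditions = words$conditions),
  FUN = sum
)
print(counts)

# Welch two-sample t-test on the per-participant word counts.
result <- t.test(n_words ~ conditions, data = counts,
                 alternative = "two.sided", mu = 0,
                 var.equal = FALSE, conf.level = 0.95)
print(result)
```

The resulting `counts` data frame has a numeric `n_words` column, so it could also be passed to the rstatix pipeline above in place of `COLUMofInterest`.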
Any help or suggestions would be great. Thanks!