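I'm new to RStudio. For a class, I pulled the US Census 2016 election dataset and want to run a series of t-tests on it. Some details about the dataset: citizenship status is coded from 1 to 4, and I want to see whether various factors affect the likelihood of voting (1 = yes, 2 = no).
The code is as follows: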
factor <- c("Age", "Fathers_country_of_birth", "Mothers_country_of_birth","Highest_level_of_School_completed", "Country_of_birth")
citizen <- c("NATIVE, BORN IN THE UNITED STATES", "NATIVE, BORN IN PUERTO RICO OR OTHER U.S. ISLAND AREAS", "NATIVE, BORN ABROAD OF AMERICAN PARENT OR PARENTS", "FOREIGN BORN, U.S. CITIZEN BY NATURALIZATION")
for (f in factor) {
  print(f)
  for (i in 1:4) {
    print(paste("Citizenship is", citizen[i]))
    query <- paste("select * from result2 where Citizenship = ", i)
    sample <- sqldf(query)
    print(t.test(f ~ Vote_in_Election, data = sample, var.equal = FALSE))
  }
}
It throws a "variable lengths differ" error:
> [1] "Age"
> [1] "Citizenship is NATIVE, BORN IN THE UNITED STATES"
> Error in model.frame.default(formula = f ~ Vote_in_Election, data = sample) :
>   variable lengths differ (found for 'Vote_in_Election')
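If I take out the outer loop, it runs fine, but then of course I have to put in the values from "factor" one at a time.
I'm running RStudio version 1.1.463, with R 3.5.2 on Windows 10.
Because each value of i gives me a different number of rows, I tried setting paired = FALSE, but it still yells at me.
I've double-checked everything but haven't found a solution. What am I missing?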
Answer 0 (score: 0)
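In f ~ Vote_in_Election, R does not substitute the string stored in f. It looks for a variable literally named f, finds the length-1 loop variable instead of a data column, and so reports that the variable lengths differ. To generate the formula dynamically, build it as a string and convert it with as.formula: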
t.test(as.formula(paste(f, "~ Vote_in_Election")), data=sample, var.equal = FALSE)
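The failure and the fix can be reproduced on made-up data. This is only a minimal sketch: the data frame `toy` and its values are hypothetical stand-ins for one citizenship subset, not the census data.

```r
# Hypothetical toy data standing in for one citizenship subset.
set.seed(1)
toy <- data.frame(
  Age = c(rnorm(10, mean = 50, sd = 10), rnorm(10, mean = 40, sd = 10)),
  Vote_in_Election = rep(c(1, 2), each = 10)
)

f <- "Age"

# Literal `f` in the formula: R looks up a variable named f, not the Age column.
# The error is caught and its message kept so the script continues.
bad <- tryCatch(
  t.test(f ~ Vote_in_Election, data = toy, var.equal = FALSE),
  error = function(e) conditionMessage(e)
)
print(bad)

# Building the formula from the string makes R look up the Age column in toy.
fml <- as.formula(paste(f, "~ Vote_in_Election"))
good <- t.test(fml, data = toy, var.equal = FALSE)
print(good$data.name)
```

The `tryCatch` wrapper is there only so both calls can be compared in one run.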
Or use reformulate:
t.test(reformulate("Vote_in_Election", response=f), data=sample, var.equal = FALSE)
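Put together, the nested loop from the question might look like the sketch below. Everything about the data is an assumption: `result2_demo` is a hypothetical stand-in for `result2`, with only two numeric predictors and two citizenship codes, and base R subsetting is swapped in for the `sqldf` query so the example runs without extra packages.

```r
# Hypothetical stand-in for result2: two numeric predictors, a 1/2 vote
# indicator, and citizenship codes 1 and 2 only (the real data has 1 to 4).
set.seed(42)
n <- 80
result2_demo <- data.frame(
  Age = round(runif(n, 18, 90)),
  Highest_level_of_School_completed = sample(1:10, n, replace = TRUE),
  Vote_in_Election = rep(c(1, 2), times = n / 2),
  Citizenship = rep(c(1, 2), each = n / 2)
)

predictors <- c("Age", "Highest_level_of_School_completed")
results <- list()

for (f in predictors) {
  for (i in sort(unique(result2_demo$Citizenship))) {
    # Base R subsetting used here in place of the sqldf query.
    subset_i <- result2_demo[result2_demo$Citizenship == i, ]
    # Build e.g. Age ~ Vote_in_Election from the string held in f.
    fml <- reformulate("Vote_in_Election", response = f)
    key <- paste(f, i, sep = "_citizenship_")
    results[[key]] <- t.test(fml, data = subset_i, var.equal = FALSE)
  }
}

# One p-value per predictor/citizenship combination.
p_values <- sapply(results, function(r) r$p.value)
print(p_values)
```

Storing each test in a named list, rather than only printing it, keeps the results available for later comparison across predictors and citizenship groups.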