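I am currently importing several datasets into R from CSV files. The combined dataset has about 2,500 observations in more than 16 columns. I am trying to run a regression in R with lm, but as soon as I try to create dummy variables for year effects or industry effects, the regression stops working.
This is how I create the dummy variable: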
CNAME <- factor(Combined.data[6], levels = c(1:20),
                labels = c("AUSTRIA", "BELGIUM", "DENMARK", "FINLAND", "FRANCE",
                           "GERMANY", "IRELAND", "ISLE OF MAN", "ITALY", "LUXEMBOURG",
                           "NETHERLANDS", "NORWAY", "POLAND", "PORTUGAL", "SPAIN",
                           "SWEDEN", "SWITZERLAND", "TURKEY", "UNITED KINGDOM",
                           "UNITED STATES"))
This is what the regression call looks like:
results <- lm(Tax_Avoidance ~ ENVSCORE + CGVSCORE + SOCSCORE + ECNSCORE + Size +
                Leverage + ROA + MTB + ROA + RND + AUD + PPE + Intang + CDP +
                CHS + NET + CNAME,
              data = finalresults)
summary(results)
I can't see what I'm doing wrong. Thanks for your help.
Answer 0 (score: 0)
Will this not work for you? Without knowing the error, it's difficult to know what's going wrong.
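library(dummies)  # provides dummy.data.frame()

# Country names, one per row of the example data
CNAME <- c("AUSTRIA", "BELGIUM", "DENMARK",
"FINLAND", "FRANCE", "GERMANY", "IRELAND", "ISLE OF MAN", "ITALY", "LUXEMBOURG",
"NETHERLANDS", "NORWAY", "POLAND", "PORTUGAL", "SPAIN", "SWEDEN", "SWITZERLAND",
"TURKEY", "UNITED KINGDOM", "UNITED STATES")

# Example data: 20 rows, 10 columns of random integers between 0 and 50
df <- data.frame(replicate(10, sample(0:50, 20, replace = TRUE)))
df <- cbind(df, CNAME)

# Replace CNAME with one 0/1 dummy column per country
df <- dummy.data.frame(df)

# Regress X1 on all remaining columns
results <- lm(X1 ~ ., data = df)
summary(results)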
This gives a data.frame like the following (first three rows shown):
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 CNAMEAUSTRIA CNAMEBELGIUM
1 41 27  3 28  6  3 35 19  3  34            1            0
2 41 41 30 22 15 42 44 42  6  41            0            1
3 13  1 26 35 44 22 13 11 46  47            0            0
  CNAMEDENMARK CNAMEFINLAND CNAMEFRANCE CNAMEGERMANY CNAMEIRELAND
1            0            0           0            0            0
2            0            0           0            0            0
3            1            0           0            0            0
  CNAMEISLE OF MAN CNAMEITALY CNAMELUXEMBOURG CNAMENETHERLANDS
1                0          0               0                0
2                0          0               0                0
3                0          0               0                0
  CNAMENORWAY CNAMEPOLAND CNAMEPORTUGAL CNAMESPAIN CNAMESWEDEN
1           0           0             0          0           0
2           0           0             0          0           0
3           0           0             0          0           0
  CNAMESWITZERLAND CNAMETURKEY CNAMEUNITED KINGDOM CNAMEUNITED STATES
1                0           0                   0                  0
2                0           0                   0                  0
3                0           0                   0                  0
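
As an aside, here is a base R sketch that may be worth trying before reaching for an extra package. It rests on two guesses about the question, neither confirmed by the asker. First, Combined.data[6] returns a one-column data frame rather than a vector, so factor() may not produce one value per row; [[6]] extracts the column itself. Second, CNAME is built from Combined.data while lm() is given data = finalresults, so the lengths may not match. The data frame demo, its columns and the four country labels below are made up purely for illustration.

```r
# Hypothetical stand-in for the asker's data; names and values are made up.
set.seed(1)
n <- 200
country_labels <- c("AUSTRIA", "BELGIUM", "DENMARK", "FINLAND")
demo <- data.frame(
  Tax_Avoidance = rnorm(n),
  Size          = rnorm(n),
  country_code  = sample(1:4, n, replace = TRUE)
)

# `[` keeps the data.frame wrapper; `[[` extracts the column as a plain vector.
class(demo["country_code"])    # "data.frame"
class(demo[["country_code"]])  # "integer"

# Store the factor inside the same data frame that is passed to lm().
demo$CNAME <- factor(demo[["country_code"]], levels = 1:4, labels = country_labels)

# lm() expands a factor into dummy variables on its own,
# using the first level (here AUSTRIA) as the reference category.
fit <- lm(Tax_Avoidance ~ Size + CNAME, data = demo)
names(coef(fit))

# The dummy columns that lm() builds can be inspected directly.
head(model.matrix(~ CNAME, data = demo), 3)
```

If this matches the actual problem, the same pattern should carry over to year or industry effects: convert the column to a factor inside the data frame passed to lm() and add it to the formula.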