I want to use the R function matchit for propensity score matching. If I read the data from a CSV file, everything looks fine and the result is what I want:
> csv <- read.csv("C:/Users/Lenovo/Desktop/ddd.csv", header=TRUE)
> df <- as.data.frame(csv)
> df
PERSON_ID OUTCOME tnb gxy AGE1
1 166920 1 2 0 61
2 167350 1 2 0 65
3 167757 1 1 0 58
4 167812 1 1 0 63
5 168271 1 2 0 55
6 168426 0 2 0 47
7 168652 0 2 1 57
8 168983 0 1 0 51
9 169083 0 2 0 50
10 169172 0 2 1 53
> fm <- matchit(OUTCOME ~ tnb + AGE1, data = df, method = "nearest")
> result <- summary(fm)
> result
Call:
matchit(formula = OUTCOME ~ tnb + AGE1, data = df, method = "nearest")
Summary of balance for all data:
Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance 0.8334 0.1666 0.2575 0.6667 0.867 0.6667 0.8964
tnb 1.6000 1.8000 0.4472 -0.2000 0.000 0.2000 1.0000
AGE1 60.4000 51.6000 3.7148 8.8000 8.000 8.8000 10.0000
Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance 0.8334 0.1666 0.2575 0.6667 0.867 0.6667 0.8964
tnb 1.6000 1.8000 0.4472 -0.2000 0.000 0.2000 1.0000
AGE1 60.4000 51.6000 3.7148 8.8000 8.000 8.8000 10.0000
Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance 0 0 0 0
tnb 0 0 0 0
AGE1 0 0 0 0
Sample sizes:
Control Treated
All 5 5
Matched 5 5
Unmatched 0 0
Discarded 0 0
But if I store the input data in vectors and then convert them to a data.frame, the result matrix has many more rows, with row names that I did not define:
> OUTCOME<-c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0");
> PERSON_ID<-c("166920", "167350", "167757", "167812", "168271", "168426", "168652", "168983", "169083", "169172");
> tnb<-c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0");
> gxy<-c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0");
> AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53");
> matrix <- cbind(PERSON_ID,OUTCOME,tnb,gxy,AGE1)
> data <- as.data.frame(matrix, stringsAsFactors= TRUE)
> data
PERSON_ID OUTCOME tnb gxy AGE1
1 166920 1 0 0 61
2 167350 1 0 0 65
3 167757 1 1 1 58
4 167812 1 0 0 63
5 168271 1 1 0 55
6 168426 0 0 1 47
7 168652 0 0 0 57
8 168983 0 1 0 51
9 169083 0 1 1 50
10 169172 0 0 0 53
> fm <- matchit(OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", replace = TRUE, ratio = 1)
> summary(fm)
Call:
matchit(formula = OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest",
replace = TRUE, ratio = 1)
Summary of balance for all data:
Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance 1.0 0.0 0.0000 1.0 1 1.0 1
tnb0 0.6 0.6 0.5477 0.0 0 0.0 0
tnb1 0.4 0.4 0.5477 0.0 0 0.0 0
gxy1 0.2 0.4 0.5477 -0.2 0 0.2 1
AGE150 0.0 0.2 0.4472 -0.2 0 0.2 1
AGE151 0.0 0.2 0.4472 -0.2 0 0.2 1
AGE153 0.0 0.2 0.4472 -0.2 0 0.2 1
AGE155 0.2 0.0 0.0000 0.2 0 0.2 1
AGE157 0.0 0.2 0.4472 -0.2 0 0.2 1
AGE158 0.2 0.0 0.0000 0.2 0 0.2 1
AGE161 0.2 0.0 0.0000 0.2 0 0.2 1
AGE163 0.2 0.0 0.0000 0.2 0 0.2 1
AGE165 0.2 0.0 0.0000 0.2 0 0.2 1
Summary of balance for matched data:
Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance 1.0 0.0 0.0000 1.0 1.0 1.0 1
tnb0 0.6 0.8 0.5657 -0.2 0.0 0.0 0
tnb1 0.4 0.2 0.5657 0.2 0.0 0.0 0
gxy1 0.2 0.8 0.5657 -0.6 0.0 0.0 0
AGE150 0.0 0.0 0.0000 0.0 0.0 0.0 0
AGE151 0.0 0.2 0.5657 -0.2 0.5 0.5 1
AGE153 0.0 0.0 0.0000 0.0 0.0 0.0 0
AGE155 0.2 0.0 0.0000 0.2 0.5 0.5 1
AGE157 0.0 0.0 0.0000 0.0 0.0 0.0 0
AGE158 0.2 0.0 0.0000 0.2 0.5 0.5 1
AGE161 0.2 0.0 0.0000 0.2 0.5 0.5 1
AGE163 0.2 0.0 0.0000 0.2 0.5 0.5 1
AGE165 0.2 0.0 0.0000 0.2 0.5 0.5 1
Percent Balance Improvement:
Mean Diff. eQQ Med eQQ Mean eQQ Max
distance 0 0 0 0
tnb0 -Inf 0 0 0
tnb1 -Inf 0 0 0
gxy1 -200 0 100 100
AGE150 100 0 100 100
AGE151 0 -Inf -150 0
AGE153 100 0 100 100
AGE155 0 -Inf -150 0
AGE157 100 0 100 100
AGE158 0 -Inf -150 0
AGE161 0 -Inf -150 0
AGE163 0 -Inf -150 0
AGE165 0 -Inf -150 0
Sample sizes:
Control Treated
All 5 5
Matched 2 5
Unmatched 3 0
Discarded 0 0
My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does R's matchit produce different output?
Answer 0 (score: 0)
"My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does R's matchit produce different output?"
When you use read.csv, your numeric data is most likely read in as numeric, and matchit treats it as numeric. But when you declare a variable as character:
AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")
instead of numeric:
AGE1<-c(61, 65, 58, 63, 55, 47, 57, 51, 50, 53)
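matchit will treat it as categorical. Each distinct value then becomes its own level, which is why the summary shows rows such as AGE150 and AGE151 (the variable name followed by the level) instead of a single AGE1 row.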
Running str(data) and str(df) will show this difference.
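As a minimal sketch of the diagnosis and one possible fix, the code below rebuilds the second data set from the question, checks the column classes, and converts the covariates back to numeric. The names `num_cols` and `data_num` are introduced here for illustration and are not in the original post. The conversion goes through `as.character()` on the assumption that the columns are factors, as they are when `stringsAsFactors = TRUE` is used.

```r
# Rebuild the question's second data set: character vectors bound into a matrix.
OUTCOME   <- c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0")
PERSON_ID <- c("166920", "167350", "167757", "167812", "168271",
               "168426", "168652", "168983", "169083", "169172")
tnb  <- c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0")
gxy  <- c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0")
AGE1 <- c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")

data <- as.data.frame(cbind(PERSON_ID, OUTCOME, tnb, gxy, AGE1),
                      stringsAsFactors = TRUE)

# Inspect the column classes: every column is a factor, not a number.
print(sapply(data, class))

# Convert the modelling columns to numeric (num_cols / data_num are
# hypothetical names). Go through as.character() first, because
# as.numeric() applied directly to a factor returns the internal level
# codes rather than the original values.
num_cols <- c("OUTCOME", "tnb", "gxy", "AGE1")
data_num <- data
data_num[num_cols] <- lapply(data[num_cols],
                             function(x) as.numeric(as.character(x)))

# The covariates are now numeric, like the columns read.csv produced.
print(sapply(data_num, class))
```

Passing `data_num` instead of `data` to the matchit call from the question should then give one row per covariate in the summary. Building the data frame directly from numeric vectors with `data.frame()` would avoid the conversion step altogether.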