Difference between as.data.frame and read.csv in R

Time: 2017-09-25 08:55:14

Tags: r read.csv

I want to do propensity score matching with the R function matchit. If I read the data from a csv file, everything looks fine and the result is what I want:

> csv <- read.csv("C:/Users/Lenovo/Desktop/ddd.csv", header=TRUE)
> df <- as.data.frame(csv)
> df
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   2   0   61
2     167350       1   2   0   65
3     167757       1   1   0   58
4     167812       1   1   0   63
5     168271       1   2   0   55
6     168426       0   2   0   47
7     168652       0   2   1   57
8     168983       0   1   0   51
9     169083       0   2   0   50
10    169172       0   2   1   53
> fm <- matchit(OUTCOME ~ tnb + AGE1, data = df, method = "nearest")
> result <- summary(fm)
> result

Call:
matchit(formula = OUTCOME ~ tnb + AGE1, data = df, method = "nearest")

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb               0       0        0       0
AGE1              0       0        0       0

Sample sizes:
          Control Treated
All             5       5
Matched         5       5
Unmatched       0       0
Discarded       0       0

But if I store the input data in vectors and then convert them to a data.frame, the summary has many more rows, and the row names are not ones I defined:

> OUTCOME<-c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0");
> PERSON_ID<-c("166920", "167350", "167757", "167812", "168271", "168426", "168652", "168983", "169083", "169172");
> tnb<-c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0");
> gxy<-c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0");
> AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53");
> matrix <- cbind(PERSON_ID,OUTCOME,tnb,gxy,AGE1)
> data <- as.data.frame(matrix, stringsAsFactors= TRUE)
> data
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   0   0   61
2     167350       1   0   0   65
3     167757       1   1   1   58
4     167812       1   0   0   63
5     168271       1   1   0   55
6     168426       0   0   1   47
7     168652       0   0   0   57
8     168983       0   1   0   51
9     169083       0   1   1   50
10    169172       0   0   0   53
> fm <- matchit(OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", replace = TRUE, ratio = 1)
> summary(fm)

Call:
matchit(formula = OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", 
    replace = TRUE, ratio = 1)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0       1      1.0       1
tnb0               0.6           0.6     0.5477       0.0       0      0.0       0
tnb1               0.4           0.4     0.5477       0.0       0      0.0       0
gxy1               0.2           0.4     0.5477      -0.2       0      0.2       1
AGE150             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE151             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE153             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE155             0.2           0.0     0.0000       0.2       0      0.2       1
AGE157             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE158             0.2           0.0     0.0000       0.2       0      0.2       1
AGE161             0.2           0.0     0.0000       0.2       0      0.2       1
AGE163             0.2           0.0     0.0000       0.2       0      0.2       1
AGE165             0.2           0.0     0.0000       0.2       0      0.2       1


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0     1.0      1.0       1
tnb0               0.6           0.8     0.5657      -0.2     0.0      0.0       0
tnb1               0.4           0.2     0.5657       0.2     0.0      0.0       0
gxy1               0.2           0.8     0.5657      -0.6     0.0      0.0       0
AGE150             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE151             0.0           0.2     0.5657      -0.2     0.5      0.5       1
AGE153             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE155             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE157             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE158             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE161             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE163             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE165             0.2           0.0     0.0000       0.2     0.5      0.5       1

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb0           -Inf       0        0       0
tnb1           -Inf       0        0       0
gxy1           -200       0      100     100
AGE150          100       0      100     100
AGE151            0    -Inf     -150       0
AGE153          100       0      100     100
AGE155            0    -Inf     -150       0
AGE157          100       0      100     100
AGE158            0    -Inf     -150       0
AGE161            0    -Inf     -150       0
AGE163            0    -Inf     -150       0
AGE165            0    -Inf     -150       0

Sample sizes:
          Control Treated
All             5       5
Matched         2       5
Unmatched       3       0
Discarded       0       0

My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does R's matchit produce different output?

1 Answer:

Answer 0: (score: 0)

"My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does R's matchit produce different output?"

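Both objects are data frames, but their columns have different types. When you use read.csv, your numeric data is most likely read in as numeric, and matchit treats it as numbers. But when you declare a variable as character: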

AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")

instead of numeric:

AGE1<-c(61, 65, 58, 63, 55, 47, 57, 51, 50, 53)

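matchit will treat it as categorical. In your second example, cbind on character vectors produces a character matrix, and as.data.frame(..., stringsAsFactors = TRUE) turns every column into a factor.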
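This is probably also where row names such as AGE150 come from. A minimal sketch using base R's model.matrix, which is an assumption chosen for illustration (the source does not show what matchit does internally): a factor in a formula is expanded into one indicator column per level, named by pasting the variable name and the level together.

```r
# Rebuild AGE1 the same way as in the question: character values turned into a factor
AGE1 <- c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")
data <- as.data.frame(cbind(AGE1), stringsAsFactors = TRUE)

# Expand the formula into a design matrix; each non-reference level of the
# factor becomes its own 0/1 column named <variable><level>
mm <- model.matrix(~ AGE1, data = data)
print(colnames(mm))
```

The first level in sorted order ("47") is the reference level and gets no column of its own, which matches the absence of an AGE147 row in the summary above.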

Running str(data) and str(df) will show this difference.
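A minimal sketch of that check and one possible fix, assuming the columns are meant to be numeric as they are in the csv version. The name data_num is hypothetical and used only for illustration; only a subset of the question's columns is rebuilt here.

```r
# Same kind of input as in the question: character vectors bound with cbind()
OUTCOME <- c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0")
tnb     <- c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0")
AGE1    <- c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")

m <- cbind(OUTCOME, tnb, AGE1)                     # a character matrix
data <- as.data.frame(m, stringsAsFactors = TRUE)
str(data)                                          # every column is a factor

# Convert each column back to numeric. Go through as.character() first:
# as.numeric() applied directly to a factor returns the internal level codes,
# not the original values.
data_num <- as.data.frame(
  lapply(data, function(col) as.numeric(as.character(col)))
)
str(data_num)                                      # every column is numeric
```

Passing data_num to matchit with the same formula as in the question should then give one row per covariate, as in the read.csv case. Declaring the vectors as numbers in the first place, for example AGE1 <- c(61, 65, ...), and building the frame with data.frame() avoids the conversion altogether.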