Contingency table with conditional probabilities in R

Time: 2016-10-26 14:38:54

Tags: r

I have this data: http://www.unige.ch/ses/spo/static/simonhug/madi/Mitchell_et_al_1984.csv

> str(dataset)
'data.frame':   135 obs. of  13 variables:
 $ CCode             : int  2 20 40 41 42 51 52 70 90 91 ...
 $ StateAbb          : Factor w/ 130 levels "AFG","ALB","ALG",..: 124 19 28 52 33 62 117 75 49 53 ...
 $ StateNme          : Factor w/ 130 levels "Afghanistan",..: 122 20 27 51 33 62 116 76 47 52 ...
 $ prison_score      : Factor w/ 5 levels "never","often",..: 1 1 2 4 5 1 NA 4 5 4 ...
 $ torture_score     : Factor w/ 5 levels "never","often",..: 1 3 1 4 2 1 NA 2 5 2 ...
 $ ht_colonial       : Factor w/ 10 levels "0. Never colonized by a Western overseas colonial power",..: 1 1 4 8 4 7 7 4 4 4 ...
 $ british           : int  NA NA 0 0 0 1 1 0 0 0 ...
 $ british_colony    : Factor w/ 2 levels "no","yes": NA NA 1 1 1 2 2 1 1 1 ...
 $ continent         : Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ region_wb         : Factor w/ 19 levels "Australia and New Zealand",..: 10 10 2 2 2 2 2 3 3 3 ...
 $ gdppc_l1          : num  25839 23550 10095 1846 4758 ...
 $ colonialExperience: chr  NA NA "Other Colonial Background" "Other Colonial Background" ...

and I have to produce a result similar to this one:

[image: target table]

using this code:

# Copy torture_score into a new column
dataset$torture_score_new = dataset$torture_score

# Add a level to the factor torture_score_new so we can assign it below
# (assigning a value that is not an existing level would produce NA)
levels(dataset$torture_score_new) = c(levels(dataset$torture_score_new), "rarely or never")

### Recode variables
# Torture score
dataset$torture_score_new[dataset$torture_score == "rarely"] = "rarely or never"
dataset$torture_score_new[dataset$torture_score == "never"] = "rarely or never"

dataset$torture_score_new = droplevels(dataset$torture_score_new)
dataset$torture_score_new = ordered(dataset$torture_score_new, levels =c("rarely or never", "somtimes", "often", "very often"))



### Text
dataset$colonialExperience = ifelse(dataset$british_colony == "yes",
                                    "Former British Colony",
                                    "Other Colonial Background")

useOfTortureByColonialExperience = table(dataset$torture_score_new, dataset$colonialExperience)

addmargins(round(prop.table(useOfTortureByColonialExperience)*100,2),1)

and I get this result:

                  Former British Colony Other Colonial Background
  rarely or never                  9.76                     20.73
  somtimes                        10.98                     15.85
  often                            6.10                     18.29
  very often                      10.98                      7.32
  Sum                             37.82                     62.19

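But I don't understand how to get the conditional probabilities and how to compute the chi-square test.

(I am a programmer, but new to R.)

1 answer:

Answer 0 (score: 0)

Well, here is what I ended up doing.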

useOfTortureByColonialExperience = table(dataset$torture_score_new, dataset$colonialExperience)

# Get the number of observations per column
addmargins(useOfTortureByColonialExperience, 1)

# Contingency table with conditional probability

useOfTortureByColonialExperienceProp = prop.table(useOfTortureByColonialExperience,2)

print(addmargins(useOfTortureByColonialExperienceProp*100,1),3)
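
The difference from the code in the question is the second argument of `prop.table()`. Without a margin, every cell is divided by the grand total, which gives joint percentages. With margin 2, every cell is divided by its column total, which gives the distribution of torture scores conditional on colonial background. The sketch below illustrates this on a small table. The counts and the name `toy` are assumptions: they were back-calculated from the percentages posted in the question, assuming 82 non-missing observations, and should be treated as illustrative only.

```r
# Illustrative counts, back-calculated from the percentages in the question
# (assumes 82 complete observations; not read from the CSV file)
toy <- as.table(matrix(
  c(8, 9, 5, 9,      # Former British Colony
    17, 13, 15, 6),  # Other Colonial Background
  nrow = 4,
  dimnames = list(
    torture  = c("rarely or never", "somtimes", "often", "very often"),
    colonial = c("Former British Colony", "Other Colonial Background")
  )
))

# Joint percentages: each cell divided by the grand total
joint <- prop.table(toy) * 100

# Conditional percentages: each cell divided by its column total
conditional <- prop.table(toy, margin = 2) * 100

# Joint columns sum to each group's share of the sample;
# conditional columns each sum to 100
print(colSums(joint))
print(colSums(conditional))

# Conditional table with a "Sum" row appended
print(addmargins(round(conditional, 1), margin = 1))
```

Using `margin = 1` instead would condition on the rows, so that each torture score category sums to 100 across the two colonial backgrounds.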

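## Chi-square test of independence (run on the counts, not on the percentages)
chisq.test(useOfTortureByColonialExperience)

# Cramér's V (effect size). cramersV() is not part of base R;
# it is assumed here to come from the lsr package.
library(lsr)
cramersV(useOfTortureByColonialExperience)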
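
To show what the two calls above compute, here is a minimal base R sketch that derives the chi-square statistic and Cramér's V by hand, without relying on a package. The table `toy` is an assumption: its counts were back-calculated from the percentages posted in the question (assuming 82 non-missing observations), so the numbers are illustrative rather than results from the actual dataset. Cramér's V uses the usual formula sqrt(chi2 / (n * (min(rows, cols) - 1))).

```r
# Illustrative counts, back-calculated from the percentages in the question
# (assumes 82 complete observations; not read from the CSV file)
toy <- as.table(matrix(
  c(8, 9, 5, 9,      # Former British Colony
    17, 13, 15, 6),  # Other Colonial Background
  nrow = 4,
  dimnames = list(
    torture  = c("rarely or never", "somtimes", "often", "very often"),
    colonial = c("Former British Colony", "Other Colonial Background")
  )
))

# Pearson chi-square test of independence on the raw counts
test <- chisq.test(toy)

# Expected counts under independence: row total * column total / n
n <- sum(toy)
expected <- outer(rowSums(toy), colSums(toy)) / n

# Chi-square statistic by hand; it should agree with test$statistic
chi2 <- sum((toy - expected)^2 / expected)

# Cramér's V: chi-square rescaled to the range 0..1
v <- sqrt(chi2 / (n * (min(dim(toy)) - 1)))

print(test)
print(c(chi2 = chi2, cramers_v = v))
```

The test must be run on the table of counts. Passing the percentage table to `chisq.test()` would change the implied sample size and therefore the statistic and the p-value.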