Question

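I want to analyze two variables in my dataset to test whether they are associated. One of the variables is a "string" and the other is a "date" (really a period of time). As far as I know, the appropriate test for my purpose is Fisher's exact test.

Because some categories contain many zeros, a chi-squared test is not possible. I am considering running Fisher's exact test, but I don't know how to do it because I am new to R.

Data sample: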

  **Parking locations**           `Time sequence`
        Other locations             9:30-13:00
        Bicycle shed (Ground floor) 17:00-20:00
        Bicycle parking (East side) 6:00-9:30
        Bicycle shed (Ground floor) 13:00-17:00
        Bicycle shed (First floor)  9:30-13:00
        Bicycle shed (First floor)  13:00-17:00
        Bicycle shed (Ground floor) 13:00-17:00
        Bicycle shed (Ground floor) 13:00-17:00
        Supervised bicycle parking  6:00-9:30
        Bicycle shed (Ground floor) 6:00-9:30

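My question is whether I can run this analysis in SPSS or whether I should use R. Also, what data type should the Time sequence column have, given that each value is a time period (e.g. 9:30 to 13:00)?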

Answer 1

If I were you, I would make sure your data is in comma-separated (csv) format. That way you can simply read it into R with read.csv.
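As a minimal sketch of that step, the snippet below reads the question's sample rows with read.csv. The CSV text is typed in by hand here (an assumption about how the file would look) and passed through the `text` argument so the example is self-contained; with a real file you would pass its path instead. The name `bikes` is only illustrative.

```r
# Hypothetical CSV content mirroring the data sample in the question
csv_text <- "Parking locations,Time sequence
Other locations,9:30-13:00
Bicycle shed (Ground floor),17:00-20:00
Bicycle parking (East side),6:00-9:30
Bicycle shed (Ground floor),13:00-17:00
Bicycle shed (First floor),9:30-13:00
Bicycle shed (First floor),13:00-17:00
Bicycle shed (Ground floor),13:00-17:00
Bicycle shed (Ground floor),13:00-17:00
Supervised bicycle parking,6:00-9:30
Bicycle shed (Ground floor),6:00-9:30"

# stringsAsFactors = TRUE stores both columns as categorical factors;
# read.csv turns the spaces in the header into dots by default
bikes <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

# Inspect column names, types and factor levels
str(bikes)
```

Checking the result with str() before testing is a cheap way to confirm that both columns were read as categories rather than as free text.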

If you want to use them as categorical variables, you can simply run this in R:

fisher.test(parking_location, time_sequence)

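I will update this answer as more specifics come in. It works as long as the strings (e.g. Bicycle shed (First floor) and Bicycle shed (Ground floor)) are meant to be distinct categories and the time intervals are treated as fixed categories as well.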
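To make the categorical approach concrete, here is a rough sketch that builds both variables as factors from the question's ten sample rows and passes them to fisher.test. Giving the time periods explicit `levels` is my own suggestion, not something the test requires: it only makes tables print in chronological rather than alphabetical order. The names `parking_location`, `time_sequence`, `tab` and `result` are illustrative.

```r
# The ten sample rows from the question, typed in as two parallel vectors
parking_location <- factor(c(
  "Other locations", "Bicycle shed (Ground floor)",
  "Bicycle parking (East side)", "Bicycle shed (Ground floor)",
  "Bicycle shed (First floor)", "Bicycle shed (First floor)",
  "Bicycle shed (Ground floor)", "Bicycle shed (Ground floor)",
  "Supervised bicycle parking", "Bicycle shed (Ground floor)"
))

# Time periods as a factor with fixed, chronologically ordered levels
time_sequence <- factor(
  c("9:30-13:00", "17:00-20:00", "6:00-9:30", "13:00-17:00", "9:30-13:00",
    "13:00-17:00", "13:00-17:00", "13:00-17:00", "6:00-9:30", "6:00-9:30"),
  levels = c("6:00-9:30", "9:30-13:00", "13:00-17:00", "17:00-20:00")
)

# Contingency table of locations by time period
tab <- table(parking_location, time_sequence)
print(tab)

# Fisher's exact test on the two factors
result <- fisher.test(parking_location, time_sequence)
print(result$p.value)
```

Since the test only looks at the cell counts, the order of the levels changes how the table is displayed but not the p-value.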

Answer 2

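I entered your data into a csv file. (Remark: because the second column is aligned, your data looks detached from the header, but this would work either way.)

Then you can do this in R: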

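# Read the two-column csv file (parking location, time period)
data <- read.csv("~/bikes.csv", header = TRUE)
# Cross-tabulate the two columns into a contingency table
t <- table(data)
# Run Fisher's exact test on that table
fisher.test(t)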

The contents of t and the result of the Fisher test can be seen in this screenshot.

Here is the output, copied as text:

> t
                         Time.sequence
Parking.locations             13:00-17:00 17:00-20:00 6:00-9:30 9:30-13:00
  Bicycle parking (East side)           0           0         1          0
  Bicycle shed (First floor)            1           0         0          1
  Bicycle shed (Ground floor)           3           1         1          0
  Other locations                       0           0         0          1
  Supervised bicycle parking            0           0         1          0
> fisher.test(t)

    Fisher's Exact Test for Count Data

data:  t 
p-value = 0.419
alternative hypothesis: two.sided

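This is a very basic example of the command. Running

?fisher.test

shows that there are some settings for tables larger than 2 x 2. If any of my assumptions are wrong (e.g. how Parking.locations is split into categories), I will update my answer.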
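As an illustration of those settings, the sketch below rebuilds the 5 x 4 table from the output above as a matrix and compares the exact p-value with a Monte Carlo one. `simulate.p.value` and `B` are documented arguments of fisher.test that apply to tables larger than 2 x 2; the seed and the choice of B = 10000 are arbitrary assumptions on my part. For much bigger tables the `workspace` argument may also need to be increased.

```r
# Counts copied from the printed table t, one row per parking location
counts <- matrix(
  c(0, 0, 1, 0,
    1, 0, 0, 1,
    3, 1, 1, 0,
    0, 0, 0, 1,
    0, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Parking.locations = c("Bicycle parking (East side)",
                          "Bicycle shed (First floor)",
                          "Bicycle shed (Ground floor)",
                          "Other locations",
                          "Supervised bicycle parking"),
    Time.sequence = c("13:00-17:00", "17:00-20:00", "6:00-9:30", "9:30-13:00")
  )
)

# Exact p-value, computed by the network algorithm for r x c tables
exact <- fisher.test(counts)

# Monte Carlo p-value, useful when the exact computation is too expensive
set.seed(1)  # arbitrary seed so the simulation is reproducible
simulated <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)

print(c(exact = exact$p.value, simulated = simulated$p.value))
```

On a table this small the exact computation is cheap, so the simulated version is mainly worth knowing about for larger tables where fisher.test runs out of workspace.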

Fisher's exact test and analyzing time periods
