SPSS 22: Sorting data

Time: 2014-05-14 15:18:13

Tags: sorting statistics spss

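After asking 75 people 16 yes/no questions, I have a table of their answers, such as 00110011110101010 ('0' = no and '1' = yes). Now I want to find the groups of people who answered in the most similar way (ideally, all groups of similar respondents, from 16/16 down to 12/16).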

How can I do this in SPSS?

Thanks in advance.

1 answer:

Answer 0: (score: 1)

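I'll answer how to replicate the specific plot you mention in SPSS, though of course there are other types of analysis you could pursue. Here is some fairly lengthy code that creates a set of binary responses for 75 people, drawn from 3 different groups.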

*************************************.
*Making fake data with 3 groups.
SET SEED 10.
INPUT PROGRAM.
LOOP Case = 1 TO 75.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS Case (F2.0).

*Making a mixture of 3 groups.
COMPUTE Group = TRUNC(RV.UNIFORM(1,4)).
SORT CASES BY Group.
*Making a set of random data.
VECTOR Bin(16,F1.0).
VECTOR QP(16).
DO REPEAT QP = QP1 TO QP16.
DO IF $casenum = 1 OR (Group <> LAG(Group)).
  COMPUTE QP = RV.UNIFORM(0,1).
ELSE.
  COMPUTE QP = LAG(QP).
END IF.
END REPEAT.
DO REPEAT Bin = Bin1 TO Bin16 /QP = QP1 TO QP16.
  COMPUTE Bin = RV.BERNOULLI(QP).
END REPEAT.
MATCH FILES FILE = * /DROP Group QP1 TO QP16.
SORT CASES BY Case.
EXECUTE.
*************************************.

Now, to make our plot, we reshape the data from wide to long format using VARSTOCASES.

*Reshaping and then plotting in a matrix.
VARSTOCASES
  /MAKE Bin FROM Bin1 TO Bin16
  /INDEX Question.
VALUE LABELS Bin
  0 'No'
  1 'Yes'.

Now that our data are in long format, we can make a binary heat map:

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Question Case Bin
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Question=col(source(s), name("Question"), unit.category())
  DATA: Case=col(source(s), name("Case"), unit.category())
  DATA: Bin=col(source(s), name("Bin"), unit.category())
  GUIDE: axis(dim(1), label("Question"))
  GUIDE: axis(dim(2), label("Case"))
  GUIDE: legend(aesthetic(aesthetic.color.interior))
  GUIDE: text.title(label("Default Ordering"))
  SCALE: cat(aesthetic(aesthetic.color.interior), map(("0",color.white),("1",color.darkgrey)))
  ELEMENT: polygon(position(Question*Case), color.interior(Bin))
END GPL.

This produces the following plot:

Default Binary Heat Map

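With some simple sorting, the plot often becomes more informative (as discussed in the CV article). What I show below simply sorts the cases and the questions by their proportion of "yes" answers. This produces a more ordered chart, but my three groups are still not clearly visible.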

*Sorting by number answered yes per person and then by yes per question.
AGGREGATE OUTFILE = * MODE = ADDVARIABLES
  /BREAK = Case
  /MeanC = MEAN(Bin).
SORT CASES BY MeanC.
AGGREGATE OUTFILE = * MODE = ADDVARIABLES
  /BREAK = Question
  /MeanQ = MEAN(Bin).
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Question Case Bin MeanQ
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Question=col(source(s), name("Question"), unit.category())
  DATA: Case=col(source(s), name("Case"), unit.category())
  DATA: Bin=col(source(s), name("Bin"), unit.category())
  DATA: MeanQ=col(source(s), name("MeanQ"))
  GUIDE: axis(dim(1), label("Question"))
  GUIDE: axis(dim(2), label("Case"))
  GUIDE: legend(aesthetic(aesthetic.color.interior))
  GUIDE: text.title(label("Order By Mean Number per Case and Question"))
  SCALE: cat(dim(1), sort.statistic(summary.mean(MeanQ)), reverse())
  SCALE: cat(dim(2), sort.data())
  SCALE: cat(aesthetic(aesthetic.color.interior), map(("0",color.white),("1",color.darkgrey)))
  ELEMENT: polygon(position(Question*Case), color.interior(Bin))
END GPL.

Ordered Map
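
Since the sorted heat map still does not separate the groups, it can help to look directly at what the question asks for: how many of the 16 answers each pair of respondents has in common. The following is only a minimal sketch of that arithmetic, written in Python rather than SPSS syntax; the function names `agreement` and `similar_pairs` and the four toy answer strings are made up for illustration and are not part of the original answer.

```python
from itertools import combinations


def agreement(a: str, b: str) -> int:
    """Count the questions on which two respondents gave the same answer."""
    if len(a) != len(b):
        raise ValueError("answer strings must have equal length")
    return sum(x == y for x, y in zip(a, b))


def similar_pairs(answers: dict, min_agree: int) -> list:
    """Return (score, case_a, case_b) for every pair of cases that agrees
    on at least min_agree questions, most similar pairs first."""
    scored = [
        (agreement(answers[p], answers[q]), p, q)
        for p, q in combinations(sorted(answers), 2)
    ]
    kept = [t for t in scored if t[0] >= min_agree]
    return sorted(kept, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy data: 4 respondents, 16 yes/no answers each.
answers = {
    1: "0011001111010101",
    2: "0011001111010101",
    3: "0011001111010010",
    4: "1100110000101010",
}

# List every pair that agrees on at least 12 of the 16 questions.
for score, p, q in similar_pairs(answers, 12):
    print(f"cases {p} and {q} agree on {score}/16")
```

Raising `min_agree` from 12 to 16 narrows the list to ever stricter matches, which mirrors the 12/16 to 16/16 range in the question. Note that this lists similar pairs only; turning pairs into groups would need an additional clustering step.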