I have the following situation: the procedure in T-SQL is rather slow, and I'm looking for a possible speed-up in R.
I want to do a cross tabulation over the following example table:
surveyID QuestionID AnswerID
1000 1 1
1000 2 3
1000 3 2
1000 4 1
1001 1 3
1001 2 2
1001 3 1
1001 4 3
and get a result like this:
QuestionIDx QuestionIDy AnswerIDx AnswerIDy Frequency
1 1 1 2 x
1 1 1 3 x
1 1 2 3 x
.....
Basically, I left-join the table to itself on surveyID and then use R's table or xtabs function to get the frequencies:
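For reference, the self-join and count described above can be sketched in plain R on the example table. This is a sketch under assumptions: it runs outside SQL Server, uses the example column names rather than the ProtoQuestionID/AnswerPosID names from the real query, and the object names `answers`, `pairs` and `freq` are hypothetical. `merge()` performs an inner join, which for a self-join on surveyID returns the same rows as a left join, because every row matches at least itself. Counting with `aggregate()` returns only the combinations that actually occur.

```r
# Example table from the question (hypothetical object name: answers)
answers <- data.frame(
  surveyID   = rep(c(1000, 1001), each = 4),
  QuestionID = rep(1:4, times = 2),
  AnswerID   = c(1, 3, 2, 1, 3, 2, 1, 3)
)

# Self-join on surveyID: every pair of answers given within the same survey.
# suffixes = c("x", "y") yields QuestionIDx, QuestionIDy, AnswerIDx, AnswerIDy.
pairs <- merge(answers, answers, by = "surveyID", suffixes = c("x", "y"))

# Count each observed (QuestionIDx, QuestionIDy, AnswerIDx, AnswerIDy) combination.
# Unlike table(), aggregate() emits no rows for combinations that never occur.
freq <- aggregate(
  list(Frequency = rep(1L, nrow(pairs))),
  by  = pairs[c("QuestionIDx", "QuestionIDy", "AnswerIDx", "AnswerIDy")],
  FUN = sum
)
```

With two surveys of four answers each, the self-join yields 2 × 4 × 4 = 32 pairs. In this small example every combination occurs exactly once, so `freq` also has 32 rows.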
INSERT INTO #CrossTabResults([ProtoQuestionIDx], [ProtoQuestionIDy], [AnswerPosIDx], [AnswerPosIDy], [Frequency])
EXECUTE sp_execute_external_script
@language =N'R',
@script=N'
OutputDataSet <- data.frame(table(InputDataSet$ProtoQuestionIDx, InputDataSet$ProtoQuestionIDy, InputDataSet$AnswerPosIDx, InputDataSet$AnswerPosIDy))
', @input_data_1 = N'SELECT [SurveyInstanceID], [ProtoQuestionIDx], [ProtoQuestionIDy], [AnswerPosIDx], [AnswerPosIDy] FROM #JoinedSurveys'
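One property of the snippet above is worth checking. `data.frame(table(...))` returns a row for every combination of the distinct values in the four columns, including combinations with a frequency of 0. The sketch below uses the example table and hypothetical object names (`answers`, `pairs`, `dense`, `sparse`) and runs as plain R outside SQL Server. With real survey data, the dense result, and with it the number of rows SQL Server has to receive and insert, may be much larger than the set of observed combinations. That is an assumption about where the time goes, not a measurement.

```r
# Example table from the question, self-joined on surveyID (hypothetical names)
answers <- data.frame(
  surveyID   = rep(c(1000, 1001), each = 4),
  QuestionID = rep(1:4, times = 2),
  AnswerID   = c(1, 3, 2, 1, 3, 2, 1, 3)
)
pairs <- merge(answers, answers, by = "surveyID", suffixes = c("x", "y"))

# Same call shape as the first snippet: table() over the four columns
dense <- data.frame(table(pairs$QuestionIDx, pairs$QuestionIDy,
                          pairs$AnswerIDx, pairs$AnswerIDy))

# table() enumerates every combination of observed values: 4 * 4 * 3 * 3 cells
nrow(dense)          # 144
sum(dense$Freq > 0)  # 32 combinations actually occur

# Keep only the combinations that occur before handing the result back
sparse <- dense[dense$Freq > 0, ]
```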
OR
INSERT INTO #CrossTabResults([ProtoQuestionIDx], [ProtoQuestionIDy], [AnswerPosIDx], [AnswerPosIDy], [Frequency])
EXECUTE sp_execute_external_script
@language =N'R',
@script=N'
OutputDataSet <- data.frame(xtabs(~ ProtoQuestionIDx + ProtoQuestionIDy + AnswerPosIDx + AnswerPosIDy, data=InputDataSet))
', @input_data_1 = N'SELECT [SurveyInstanceID], [ProtoQuestionIDx], [ProtoQuestionIDy], [AnswerPosIDx], [AnswerPosIDy] FROM #JoinedSurveys'
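The two snippets should produce the same counts. `xtabs()` with a one-sided formula tabulates the same cells as `table()`; the only difference is that the dimension names come from the formula. The sketch below checks this on the example table, using hypothetical object names (`answers`, `pairs`, `xt`, `tb`) and plain R outside SQL Server. Because INSERT ... EXECUTE maps result columns by position, the generic `Var1`..`Var4` names from `table()` do not matter for #CrossTabResults. Switching between the two snippets alone is therefore not expected to change the result.

```r
# Example table from the question, self-joined on surveyID (hypothetical names)
answers <- data.frame(
  surveyID   = rep(c(1000, 1001), each = 4),
  QuestionID = rep(1:4, times = 2),
  AnswerID   = c(1, 3, 2, 1, 3, 2, 1, 3)
)
pairs <- merge(answers, answers, by = "surveyID", suffixes = c("x", "y"))

# Formula interface, as in the second snippet: one dimension per term
xt <- as.data.frame(xtabs(~ QuestionIDx + QuestionIDy + AnswerIDx + AnswerIDy,
                          data = pairs))

# Positional table() call, as in the first snippet
tb <- as.data.frame(table(pairs$QuestionIDx, pairs$QuestionIDy,
                          pairs$AnswerIDx, pairs$AnswerIDy))

names(xt)                # "QuestionIDx" "QuestionIDy" "AnswerIDx" "AnswerIDy" "Freq"
names(tb)                # "Var1" "Var2" "Var3" "Var4" "Freq"
all(xt$Freq == tb$Freq)  # TRUE: same cells in the same order
```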
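I'm not sure this is the right approach, and it doesn't improve the execution speed. I'm looking for the fastest solution. Any help is appreciated.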