How do I reshape data with summary statistics?

Time: 2016-08-14 19:22:04

Tags: r

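I'm working with a survey and want to reshape my data so that it gives specific summary statistics for how particular groups answered particular questions, without creating the fields by hand. (The number of questions I'll be dealing with makes doing that manually prohibitive.)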

Currently, this is what I have:

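library(reshape2)

# read.csv() turns the "answer-code" header into "answer.code"
test.0 <- read.csv("test.csv", stringsAsFactors = FALSE)

# One row per school/year, one column per questionnaire/question/answer/code
# combination; each cell counts the matching responses
test.1 <- dcast(
    test.0,
    school + year ~ questionnaire + question + answer + answer.code,
    value.var = "answer.code",
    fun.aggregate = length
    )

# id.vars must be character names; melt() takes a single variable.name
test.2 <- melt(
    test.1,
    id.vars = c("school", "year"),
    variable.name = "combination",
    value.name = "n"
    )

# Split the combined label back into its parts (assumes no "_" in the questions)
test.2 <- cbind(test.2, colsplit(as.character(test.2$combination), "_",
    c("questionnaire", "question", "answer", "answer.code")))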

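dcast reshapes the data into columns, but I can't figure out what to add to make it run the calculations and add the resulting columns automatically. (I thought I could melt the data frame, run the calculations, and recast it, but I ran into problems with the melt.)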
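The melt, compute, recast plan can work if the counting is done while the data is still in long form. Below is a minimal sketch in base R, where aggregate() and reshape() stand in for reshape2; the five-row data frame is a hypothetical subset of the sample data further down, and the column-label format is an assumption chosen to match the desired output:

```r
# Hypothetical subset of the long-format sample data (one question only)
long <- data.frame(
  school   = c("High School", "High School", "High School",
               "Middle School", "Middle School"),
  year     = c(2018, 2018, 2018, 2024, 2024),
  question = "Do you like ice cream?",
  answer   = c("Yes", "NULL", "No", "No", "NULL"),
  stringsAsFactors = FALSE
)

# Compute while long: one count per school/year/question/answer combination
long$n <- 1
counts <- aggregate(n ~ school + year + question + answer, data = long, FUN = sum)

# Build the target column label, then spread the counts into columns
counts$column <- paste0(counts$question, "_#", counts$answer)
wide <- reshape(counts[c("school", "year", "column", "n")],
                idvar = c("school", "year"), timevar = "column",
                direction = "wide")
names(wide) <- sub("^n\\.", "", names(wide))  # drop reshape()'s "n." prefix
wide[is.na(wide)] <- 0                         # combinations with no responses -> 0
print(wide)
```

Zero counts are filled in only for school/year rows that exist in the data; a year with no rows at all (such as 2020 in the desired output) would have to be added separately.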

For each question, I'd like to calculate the following (in shorthand, not code):

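  1. *Number of responses for each answer option (count where school = a, year = b, question = c, answer = d)
  2. Total number of responses (count of answer != "DNA")
  3. *Percentage of responses for each answer option (count of answer == d / count of answer != "DNA")
  4. Response rate (count of answer not in c("NULL", "DNA") / count of answer != "DNA")
  5. Average response as a score (sum of answer.code / count of answer not in c("NULL", "DNA"))

* May require multiple columns, one for each answer option.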
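The five definitions above can be sketched as one helper applied to a single school/year/question group. Everything here (the name summarise_answers, the choices argument, and treating answer.code as text such as "5" or "NULL") is a hypothetical illustration, not part of any package:

```r
# Hypothetical helper: summary statistics for ONE question within ONE
# school/year group. "NULL" = student skipped, "DNA" = question not asked.
summarise_answers <- function(answer, answer.code, choices) {
  answer   <- as.character(answer)
  asked    <- answer != "DNA"                  # rows that count toward the total
  answered <- !(answer %in% c("NULL", "DNA"))  # rows with a real answer
  total    <- sum(asked)                       # 2. total responses

  # 1. count of each answer option, plus skipped ("NULL") rows
  labels <- c(choices, "NULL")
  counts <- sapply(labels, function(o) sum(answer == o))
  names(counts) <- paste0("#", labels)

  # 3. share of each option among students who were asked (0 if none were)
  pct <- counts[paste0("#", choices)] / max(total, 1)
  names(pct) <- paste0("%", choices)

  # 4. response rate and 5. mean code over real answers only
  rate  <- sum(answered) / max(total, 1)
  codes <- as.numeric(as.character(answer.code[answered]))
  score <- if (length(codes) > 0) mean(codes) else 0

  c(counts, "#Total" = total, pct, "Response Rate" = rate, "Score" = score)
}

res <- summarise_answers(answer = c("Yes", "NULL", "No"),
                         answer.code = c("5", "NULL", "1"),
                         choices = c("Yes", "No"))
print(res)
```

For the High School 2018 answers to "Do you like ice cream?" (Yes, NULL, No) this gives a total of 3, a response rate of 2/3, and a score of 3, the mean of the codes 5 and 1.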

Here is a sample data set (NULL = the student didn't answer, DNA = I didn't ask the question):

    studentid | year | school | questionnaire | question | answer | answer-code
    517202 | 2018 | High School | 1 | Do you like ice cream? | Yes | 5
    553908 | 2017 | High School | 1 | Do you like ice cream? | No | 1
    424835 | 2019 | High School | 1 | Do you like ice cream? | Yes | 5
    471321 | 2024 | Middle School | 1 | Do you like ice cream? | No | 1
    458237 | 2021 | Middle School | 1 | Do you like ice cream? | No | 1
    300763 | 2024 | Middle School | 1 | Do you like ice cream? | NULL | NULL
    173314 | 2018 | High School | 1 | Do you like ice cream? | NULL | NULL
    924930 | 2023 | Middle School | 1 | Do you like ice cream? | Yes | 5
    902908 | 2019 | High School | 1 | Do you like ice cream? | No | 1
    227533 | 2018 | High School | 1 | Do you like ice cream? | No | 1
    517202 | 2018 | High School | 1 | Do you like ice cream sundaes? | Yes | 5
    553908 | 2017 | High School | 1 | Do you like ice cream sundaes? | No | 1
    424835 | 2019 | High School | 1 | Do you like ice cream sundaes? | Yes | 5
    471321 | 2024 | Middle School | 1 | Do you like ice cream sundaes? | No | 1
    458237 | 2021 | Middle School | 1 | Do you like ice cream sundaes? | No | 1
    300763 | 2024 | Middle School | 1 | Do you like ice cream sundaes? | DNA | DNA
    173314 | 2018 | High School | 1 | Do you like ice cream sundaes? | DNA | DNA
    924930 | 2023 | Middle School | 1 | Do you like ice cream sundaes? | Yes | 5
    902908 | 2019 | High School | 1 | Do you like ice cream sundaes? | NULL | NULL
    227533 | 2018 | High School | 1 | Do you like ice cream sundaes? | No | 1
    517202 | 2018 | High School | 2 | The ice cream party made me like ice cream more. | Neither Agree nor Disagree | 3
    553908 | 2017 | High School | 2 | The ice cream party made me like ice cream more. | Agree | 4
    424835 | 2019 | High School | 2 | The ice cream party made me like ice cream more. | Disagree | 2
    471321 | 2024 | Middle School | 2 | The ice cream party made me like ice cream more. | Strongly Agree | 5
    458237 | 2021 | Middle School | 2 | The ice cream party made me like ice cream more. | Disagree | 2
    300763 | 2024 | Middle School | 2 | The ice cream party made me like ice cream more. | Agree | 3
    173314 | 2018 | High School | 2 | The ice cream party made me like ice cream more. | Strongly Disagree | 1
    924930 | 2023 | Middle School | 2 | The ice cream party made me like ice cream more. | Neither Agree nor Disagree | 3
    902908 | 2019 | High School | 2 | The ice cream party made me like ice cream more. | Agree | 4
    227533 | 2018 | High School | 2 | The ice cream party made me like ice cream more. | NULL | NULL
    

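This is the data frame I'd like to end up with (only the first question is shown; the per-question columns after school and year would repeat for every other question):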

    school | year | 1_Do you like ice cream?_#Yes | 1_Do you like ice cream?_#No | 1_Do you like ice cream?_#NULL | 1_Do you like ice cream?_#Total | 1_Do you like ice cream?_%Yes | 1_Do you like ice cream?_%No | 1_Do you like ice cream?_Response Rate | 1_Do you like ice cream?_Ice Cream Score**
    High School | 2017 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1
    High School | 2018 | 1 | 1 | 1 | 3 | 0.333333333 | 0.333333333 | 0.666666667 | 2
    High School | 2019 | 1 | 1 | 0 | 2 | 0.5 | 0.5 | 1 | 3
    High School | 2020 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    Middle School | 2021 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 
    Middle School | 2022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 
    Middle School | 2023 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 5 
    Middle School | 2024 | 0 | 1 | 1 | 2 | 0 | 0.5 | 0.5 | 1 
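Putting the pieces together, here is a hedged base-R sketch that builds this wide layout: one row per school/year and eight columns per question, named questionnaire_question_statistic. The summarise_answers() helper is hypothetical, the block assumes every question shares the same Yes/No options, and it runs on a ten-row subset of the sample data:

```r
# Hypothetical helper: summary statistics for one question within one group
summarise_answers <- function(answer, answer.code, choices) {
  answer   <- as.character(answer)
  asked    <- answer != "DNA"                  # counted toward the total
  answered <- !(answer %in% c("NULL", "DNA"))  # real answers only
  total    <- sum(asked)
  labels   <- c(choices, "NULL")
  counts   <- sapply(labels, function(o) sum(answer == o))
  names(counts) <- paste0("#", labels)
  pct <- counts[paste0("#", choices)] / max(total, 1)
  names(pct) <- paste0("%", choices)
  codes <- as.numeric(as.character(answer.code[answered]))
  c(counts, "#Total" = total, pct,
    "Response Rate" = sum(answered) / max(total, 1),
    "Score" = if (length(codes) > 0) mean(codes) else 0)
}

# Subset of the sample data: High School 2018 and Middle School 2024
survey <- data.frame(
  school   = rep(c("High School", "High School", "High School",
                   "Middle School", "Middle School"), 2),
  year     = rep(c(2018, 2018, 2018, 2024, 2024), 2),
  questionnaire = 1,
  question = rep(c("Do you like ice cream?", "Do you like ice cream sundaes?"), each = 5),
  answer      = c("Yes", "NULL", "No", "No", "NULL", "Yes", "DNA", "No", "No", "DNA"),
  answer.code = c("5", "NULL", "1", "1", "NULL", "5", "DNA", "1", "1", "DNA"),
  stringsAsFactors = FALSE
)

answer_choices <- c("Yes", "No")  # assumed to be shared by every question here
groups    <- unique(survey[c("school", "year")])
questions <- unique(survey[c("questionnaire", "question")])

# For each school/year: compute every question's stats, then bind side by side
per_group <- lapply(seq_len(nrow(groups)), function(i) {
  g <- survey[survey$school == groups$school[i] & survey$year == groups$year[i], ]
  stats <- lapply(seq_len(nrow(questions)), function(j) {
    keep <- g$questionnaire == questions$questionnaire[j] &
            g$question == questions$question[j]
    s <- summarise_answers(g$answer[keep], g$answer.code[keep], answer_choices)
    names(s) <- paste(questions$questionnaire[j], questions$question[j],
                      names(s), sep = "_")
    s
  })
  data.frame(groups[i, ], as.list(unlist(stats)), check.names = FALSE)
})
result <- do.call(rbind, per_group)
rownames(result) <- NULL
print(result)
```

Only school/year pairs that appear in the data get a row, so empty years such as 2020 would need to be listed in groups explicitly. Questions on a different answer scale (such as questionnaire 2) would need their own choices vector.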
    

Is there a simpler solution? Am I at least on the right track? Any guidance would be greatly appreciated!

0 answers:

No answers