Answer 0 (score: 2)
To read this data in, you can use the base R function read.fwf.
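As mentioned in the comments, you can get the column specification from the SPSS syntax: https://www.cdc.gov/healthyyouth/data/yrbs/sadc_2017/2017_sadc_spss_input_program.sps
I used a text editor to quickly extract the column widths: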
vec <- c(5, 50, 50, 8, 8, 3, 10, 8, 8, 8, 3, 3, 3, 3, 3, 8, 8, 8, 8,
3, 3, 1, 1, 8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3)
The corresponding name for each column/variable:
names <- c("sitecode", "sitename", "sitetype", "sitetypenum", "year",
"survyear", "weight", "stratum", "PSU", "record", "age", "sex",
"grade", "race4", "race7", "stheight", "stweight", "bmi", "bmipct",
"qnobese", "qnowt", "q67", "q66", "sexid", "sexid2", "sexpart",
"sexpart2", "Q8", "Q9", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15",
"Q16", "Q17", "Q18", "Q19", "Q20", "Q21", "Q22", "Q23", "Q24",
"Q25", "Q26", "Q27", "Q28", "Q29", "Q30", "Q31", "Q32", "Q33",
"Q34", "Q35", "Q36", "Q37", "Q38", "Q39", "Q40", "Q41", "Q42",
"Q43", "Q44", "Q45", "Q46", "Q47", "Q48", "Q49", "Q50", "Q51",
"Q52", "Q53", "Q54", "Q55", "Q56", "Q57", "Q58", "Q59", "Q60",
"Q61", "Q62", "Q63", "Q64", "Q65", "Q68", "Q69", "Q70", "Q71",
"Q72", "Q73", "Q74", "Q75", "Q76", "Q77", "Q78", "Q79", "Q80",
"Q81", "Q82", "Q83", "Q84", "Q85", "Q86", "Q87", "Q88", "Q89",
"QN8", "QN9", "QN10", "QN11", "QN12", "QN13", "QN14", "QN15",
"QN16", "QN17", "QN18", "QN19", "QN20", "QN21", "QN22", "QN23",
"QN24", "QN25", "QN26", "QN27", "QN28", "QN29", "QN30", "QN31",
"QN32", "QN33", "QN34", "QN35", "QN36", "QN37", "QN38", "QN39",
"QN40", "QN41", "QN42", "QN43", "QN44", "QN45", "QN46", "QN47",
"QN48", "QN49", "QN50", "QN51", "QN52", "QN53", "QN54", "QN55",
"QN56", "QN57", "QN58", "QN59", "QN60", "QN61", "QN62", "QN63",
"QN64", "QN65", "QN68", "QN69", "QN70", "QN71", "QN72", "QN73",
"QN74", "QN75", "QN76", "QN77", "QN78", "QN79", "QN80", "QN81",
"QN82", "QN83", "QN84", "QN85", "QN86", "QN87", "QN88", "QN89",
"qnfrcig", "qndaycig", "qnfrevp", "qndayevp", "qnfrskl", "qndayskl",
"qnfrcgr", "qndaycgr", "qntb2", "qntb3", "qntb4", "qniudimp",
"qnshparg", "qnothhpl", "qndualbc", "qnbcnone", "qnfr0", "qnfr1",
"qnfr2", "qnfr3", "qnveg0", "qnveg1", "qnveg2", "qnveg3", "qnsoda1",
"qnsoda2", "qnsoda3", "qnmilk1", "qnmilk2", "qnmilk3", "qnbk7day",
"qnpa0day", "qnpa7day", "qndlype", "qnnodnt", "qbikehelmet",
"qdrivemarijuana", "qcelldriving", "qpropertydamage", "qbullyweight",
"qbullygender", "qbullygay", "qchokeself", "qcigschool", "qchewtobschool",
"qalcoholschool", "qtypealcohol", "qhowmarijuana", "qmarijuanaschool",
"qcurrentcocaine", "qcurrentheroin", "qcurrentmeth", "qhallucdrug",
"qprescription30d", "qgenderexp", "qtaughtHIV", "qtaughtsexed",
"qtaughtstd", "qtaughtcondom", "qtaughtbc", "qdietpop", "qcoffeetea",
"qsportsdrink", "qenergydrink", "qsugardrink", "qwater", "qfastfood",
"qfoodallergy", "qwenthungry", "qmusclestrength", "qsunscreenuse",
"qindoortanning", "qsunburn", "qconcentrating", "qcurrentasthma",
"qwheresleep", "qspeakenglish", "qtransgender", "qnbikehelmet",
"qndrivemarijuana", "qncelldriving", "qnpropertydamage", "qnbullyweight",
"qnbullygender", "qnbullygay", "qnchokeself", "qncigschool",
"qnchewtobschool", "qnalcoholschool", "qntypealcohol", "qnhowmarijuana",
"qnmarijuanaschool", "qncurrentcocaine", "qncurrentheroin", "qncurrentmeth",
"qnhallucdrug", "qnprescription30d", "qngenderexp", "qntaughtHIV",
"qntaughtsexed", "qntaughtstd", "qntaughtcondom", "qntaughtbc",
"qndietpop", "qncoffeetea", "qnsportsdrink", "qnspdrk1", "qnspdrk2",
"qnspdrk3", "qnenergydrink", "qnsugardrink", "qnwater", "qnwater1",
"qnwater2", "qnwater3", "qnfastfood", "qnfoodallergy", "qnwenthungry",
"qnmusclestrength", "qnsunscreenuse", "qnindoortanning", "qnsunburn",
"qnconcentrating", "qncurrentasthma", "qnwheresleep", "qnspeakenglish",
"qntransgender")
As noted in the earlier comments, we can use read.fwf to read the fixed-width *.dat file (I only saved a subset of it; I expect reading the whole file would take some time):
df <- read.fwf(file = "c:/temp/file", widths = vec)
# Rename columns
names(df) <- names
# Inspect the head.
head(df, n=2)
# sitecode sitename sitetype sitetypenum year survyear weight stratum PSU record age sex grade race4 race7
# 1 XX United States (XX) National 3 1991 1 0.2645 12210 5 29890 . . 1 3 4
# 2 XX United States (XX) National 3 1991 1 0.5060 12310 29 29891 . . . . .
# stheight stweight bmi bmipct qnobese qnowt q67 q66 sexid sexid2 sexpart sexpart2 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32 Q33
# 1 . . . . . . NA NA . . . . 2 4 NA NA 4 NA NA NA NA 3 NA NA NA NA NA NA NA NA 2 2 1 1 1 NA 2 4
# 2 . . . . . . NA NA . . . . NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA 1 1 1 1 1 NA 1 1
# Q34 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48 Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60 Q61 Q62 Q63 Q64 Q65 Q68 Q69 Q70 Q71 Q72 Q73 Q74 Q75 Q76 Q77 Q78 Q79 Q80 Q81 Q82 Q83 Q84
# 1 NA NA NA NA NA NA 4 4 3 NA NA NA 5 5 5 1 NA NA NA NA NA 1 NA NA 1 1 5 4 4 3 3 8 1 NA NA NA NA NA NA NA NA NA NA NA NA NA 6 NA NA
# 2 NA NA NA NA NA NA 6 2 2 NA NA NA 1 1 1 1 NA NA NA NA NA 1 NA NA 1 1 2 2 2 3 3 2 3 NA NA NA NA NA NA NA NA NA NA NA NA NA 6 NA NA
# Q85 Q86 Q87 Q88 Q89 QN8 QN9 QN10 QN11 QN12 QN13 QN14 QN15 QN16 QN17 QN18 QN19 QN20 QN21 QN22 QN23 QN24 QN25 QN26 QN27 QN28 QN29 QN30 QN31 QN32 QN33 QN34 QN35 QN36 QN37 QN38 QN39 QN40 QN41 QN42
# 1 NA NA NA NA NA 1 1 . . 1 . . . . 1 . . . . . . . . 2 2 2 2 1 . 1 2 . . . . . . 1 1 1
# 2 NA NA NA NA NA . . . . . . . . . 2 . . . . . . . . 1 1 2 2 1 . 2 . . . . . . . 1 1 1
# QN43 QN44 QN45 QN46 QN47 QN48 QN49 QN50 QN51 QN52 QN53 QN54 QN55 QN56 QN57 QN58 QN59 QN60 QN61 QN62 QN63 QN64 QN65 QN68 QN69 QN70 QN71 QN72 QN73 QN74 QN75 QN76 QN77 QN78 QN79 QN80 QN81 QN82 QN83
# 1 . . . 1 2 1 2 . . . . . 2 . . 1 1 2 2 1 2 2 2 2 . . . . . . . . . . . . . 1 .
# 2 . . . 2 2 2 2 . . . . . 2 . . 1 1 1 2 2 . . . 2 . . . . . . . . . . . . . 1 .
# QN84 QN85 QN86 QN87 QN88 QN89 qnfrcig qndaycig qnfrevp qndayevp qnfrskl qndayskl qnfrcgr qndaycgr qntb2 qntb3 qntb4 qniudimp qnshparg qnothhpl qndualbc qnbcnone qnfr0 qnfr1 qnfr2 qnfr3 qnveg0
# 1 . . . . . . 2 2 . . . . . . . . . . . . . 2 . . . . .
# 2 . . . . . . 2 2 . . . . . . . . . . . . . . . . . . .
# qnveg1 qnveg2 qnveg3 qnsoda1 qnsoda2 qnsoda3 qnmilk1 qnmilk2 qnmilk3 qnbk7day qnpa0day qnpa7day qndlype qnnodnt qbikehelmet qdrivemarijuana qcelldriving qpropertydamage qbullyweight qbullygender
# 1 . . . . . . . . . . . . 1 . 2 NA NA NA NA NA
# 2 . . . . . . . . . . . . 1 . NA NA NA NA NA NA
# qbullygay qchokeself qcigschool qchewtobschool qalcoholschool qtypealcohol qhowmarijuana qmarijuanaschool qcurrentcocaine qcurrentheroin qcurrentmeth qhallucdrug qprescription30d qgenderexp
# 1 NA NA NA NA NA NA NA NA 1 NA NA NA NA NA
# 2 NA NA NA NA NA NA NA NA 1 NA NA NA NA NA
# qtaughtHIV qtaughtsexed qtaughtstd qtaughtcondom qtaughtbc qdietpop qcoffeetea qsportsdrink qenergydrink qsugardrink qwater qfastfood qfoodallergy qwenthungry qmusclestrength qsunscreenuse
# 1 2 NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA
# 2 1 NA NA NA NA NA NA NA NA NA NA NA NA NA 2 NA
# qindoortanning qsunburn qconcentrating qcurrentasthma qwheresleep qspeakenglish qtransgender qnbikehelmet qndrivemarijuana qncelldriving qnpropertydamage qnbullyweight qnbullygender qnbullygay
# 1 NA NA NA NA NA NA NA 1 . . . . . .
# 2 NA NA NA NA NA NA NA . . . . . . .
# qnchokeself qncigschool qnchewtobschool qnalcoholschool qntypealcohol qnhowmarijuana qnmarijuanaschool qncurrentcocaine qncurrentheroin qncurrentmeth qnhallucdrug qnprescription30d qngenderexp
# 1 . . . . . . . 2 . . . . .
# 2 . . . . . . . 2 . . . . .
# qntaughtHIV qntaughtsexed qntaughtstd qntaughtcondom qntaughtbc qndietpop qncoffeetea qnsportsdrink qnspdrk1 qnspdrk2 qnspdrk3 qnenergydrink qnsugardrink qnwater qnwater1 qnwater2 qnwater3
# 1 2 . . . . . . . . . . . . . . . .
# 2 1 . . . . . . . . . . . . . . . .
# qnfastfood qnfoodallergy qnwenthungry qnmusclestrength qnsunscreenuse qnindoortanning qnsunburn qnconcentrating qncurrentasthma qnwheresleep qnspeakenglish qntransgender
# 1 . . . 2 . . . . . . . .
# 2 . . . 2 . . . . . . . .
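Since reading the whole file can be slow, it may be worth testing the call on a handful of records first. The following is a minimal sketch on a made-up three-line file (toy_path, its contents, and the three widths are illustrative assumptions, not the real YRBS layout). It relies on two documented read.fwf arguments: col.names, which sets the names at read time, and n, which caps the number of records read.

```r
# Toy fixed-width file standing in for the real .dat file (hypothetical data).
toy_path <- tempfile(fileext = ".dat")
writeLines(c("XX 1991 1",
             "XX 1991 2",
             "AZB2017 3"), toy_path)

# Read only the first two records and name the columns in the same call.
toy <- read.fwf(
  file = toy_path,
  widths = c(3, 4, 2),                          # sitecode, year, record
  col.names = c("sitecode", "year", "record"),
  n = 2,                                        # stop after two records
  stringsAsFactors = FALSE
)

print(toy)
```

Once the widths and names look right on a small sample, the same call without n should read the complete file.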
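Note that any character columns may need trimming. Missing values are also coded as ".", so you may want to deal with those too (for example, by converting them to NA).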
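One possible way to do that clean-up is sketched below on a toy file. The helper clean_column and the sample data are hypothetical and only illustrate the idea: trim whitespace, turn "." into NA, then let R re-guess the column type.

```r
# Toy fixed-width data with padded text and "." for missing (hypothetical values).
toy_path <- tempfile(fileext = ".dat")
writeLines(c("XX  1991  .  2",
             "AZB 2017 16  ."), toy_path)

df <- read.fwf(
  file = toy_path,
  widths = c(3, 5, 3, 3),
  col.names = c("sitecode", "year", "age", "q8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim, recode "." as NA, then convert to the best type.
clean_column <- function(x) {
  if (!is.character(x)) return(x)   # leave numeric columns untouched
  x <- trimws(x)
  x[x == "."] <- NA
  type.convert(x, as.is = TRUE)
}

df[] <- lapply(df, clean_column)
str(df)
```

read.fwf also forwards extra arguments to read.table, so na.strings and strip.white may achieve much the same thing at read time; it is worth checking the result on a small sample either way.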
Answer 1 (score: 1)
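Although I can't fully answer your question, I can help you get started. The reason you're unsure what to do is that the data isn't formatted the way you're used to. The data is in ASCII format. The website says the following:
"Note: SAS and SPSS programs need to be used to convert ASCII into SAS and SPSS datasets. How to use the ASCII data varies from one software package to another. Column positions for each variable usually have to be specified. Column position information for each variable can be found in the documentation for each year's data. Consult your software documentation for more information."
ASCII here is just a different way of storing data, like .csv or other formats, but it is less readable because the values are not separated into delimited columns. You can start by searching for how to import ASCII data into R and go from there. Sorry I couldn't be of more help.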
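To make "column positions" concrete: a codebook typically gives a start and end position per variable, and read.fwf wants widths, which are end - start + 1. Here is a minimal sketch with an invented layout (the positions, names, and sample record are assumptions for illustration, not the actual YRBS documentation):

```r
# Hypothetical codebook excerpt: variable name plus start/end column positions.
layout <- data.frame(
  name  = c("sitecode", "year", "age"),
  start = c(1, 6, 10),
  end   = c(5, 9, 11)
)

# Width of each field is the number of characters it spans.
layout$width <- layout$end - layout$start + 1

# One made-up record laid out according to those positions.
toy_path <- tempfile(fileext = ".dat")
writeLines("XX   199116", toy_path)

rec <- read.fwf(
  file = toy_path,
  widths = layout$width,
  col.names = layout$name,
  stringsAsFactors = FALSE
)

print(layout)
print(rec)
```

This assumes the fields are contiguous. If the documentation shows gaps between variables, read.fwf accepts negative widths to skip that many characters.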