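I am currently trying to read in a .txt file.
I searched here and found "Error in reading in data set in R"; however, it did not solve my problem.
The data are political contributions listed by the U.S. Federal Election Commission at ftp://ftp.fec.gov/FEC/2014/webk14.zip
After inspecting the .txt, I realized the data is oddly structured. In particular, the end of any line is not separated at all from the first cell of the next line (neither by a "|" nor by a space).
Strangely, importing via Excel and Access seems to work fine. However, the R import does not work.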
To avoid the error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements
I use the following command:
webk14 <- read.table(
  header = FALSE, fill = TRUE, colClasses = "character", sep = "|",
  file = "webk14.txt", stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop",
    "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)
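This does not produce an error; however, the result a) has a different number of rows than the Excel import, and b) does not separate the columns correctly (which is probably the cause of a)).
I would prefer to import directly into R rather than detour through Excel. Any ideas what I am doing wrong?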
Answer 0: (score: 0)
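It may be related to symbols in the data, such as #. By default, read.table treats # as the start of a comment and ignores the rest of the line, so use comment.char="" to have those symbols read as ordinary text, which gives you: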
webk14 <- read.table(
  header = FALSE, fill = TRUE, colClasses = "character",
  comment.char = "", sep = "|", file = "webk14.txt",
  stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop",
    "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)
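To check whether comment handling is really what shortens some lines, it may help to count the fields per line before importing. The following is only a minimal sketch on a made-up three-line sample (the values and the four column names are hypothetical and not taken from webk14.txt). The quote = "" argument is an extra assumption that is not part of the answer above: it stops apostrophes or double quotes inside names from being treated as string delimiters.

```r
# Hypothetical sample in the same pipe-delimited layout; the values are
# made up for illustration and are not taken from webk14.txt.
sample_lines <- c(
  "C001|FRIENDS OF SMITH #2|Q|100",
  "C002|PLAIN COMMITTEE|Q|200",
  "C003|LOCAL 12 PAC|Q|300"
)
sample_path <- tempfile(fileext = ".txt")
writeLines(sample_lines, sample_path)

# Fields per line under the defaults (comment.char = "#"):
# the first line is cut off at the "#".
default_counts <- count.fields(sample_path, sep = "|")

# Fields per line with comment and quote handling switched off.
raw_counts <- count.fields(sample_path, sep = "|",
                           quote = "", comment.char = "")

# Read the sample with both switched off (quote = "" is an assumption,
# see the note above); each line keeps all of its fields.
sample_df <- read.table(sample_path, header = FALSE, sep = "|",
                        quote = "", comment.char = "",
                        colClasses = "character",
                        col.names = c("id", "name", "type", "amount"))

print(default_counts)
print(raw_counts)
print(sample_df)
```

On the real file, running table(count.fields("webk14.txt", sep = "|", quote = "", comment.char = "")) should show whether every line has the 27 fields that the col.names vector expects. If it does, fill = TRUE is probably no longer needed, and dropping it would let read.table report any remaining malformed line instead of silently padding it.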