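Problem reading a table with unclear end-of-line symbols

Time: 2017-07-26 09:27:21

Tags: r

I am currently trying to read in a .txt file.

I searched here and found "Error in reading in data set in R"; however, it did not solve my problem.

The data are political contributions listed by the US Federal Election Commission at ftp://ftp.fec.gov/FEC/2014/webk14.zip

On inspecting the .txt, I realized that the data are oddly structured. In particular, the end of each line is not separated from the first cell of the next line (neither by a "|" nor by a space).

Strangely, importing via Excel and Access seems to work fine. However, the R import does not work.

To avoid the error Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements, I use the following command: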

webk14 <- read.table(
  file = "webk14.txt", header = FALSE, sep = "|", fill = TRUE,
  colClasses = "character", stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay",
    "coh_bop", "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)

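This does not produce an error; however, the result a) has a different number of rows than the Excel import and b) does not separate the columns correctly (which is probably the cause of a)).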
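One way to narrow down symptoms like these is to count the fields R sees on each line before importing. The following is a minimal sketch, not taken from the original post: the three sample records are made up, and a '#' inside a name is only an assumed cause used for illustration. It compares count.fields under its defaults with a run in which quoting and comment handling are switched off.

```r
# Hypothetical sample in a pipe-delimited layout; the second record
# contains a '#' inside a name (illustrative data, not from the FEC file).
sample_lines <- c(
  "C001|ALPHA PAC|Q|100",
  "C002|LOCAL #12 FUND|Q|200",
  "C003|BETA PAC|M|300"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

expected_fields <- 4  # number of columns in this toy sample

# Fields per line with the defaults (quote = "\"'", comment.char = "#")
default_counts <- count.fields(tmp, sep = "|")

# Fields per line when quotes and '#' are treated as ordinary characters
literal_counts <- count.fields(tmp, sep = "|", quote = "", comment.char = "")

# Lines whose field count deviates under the defaults
suspect_lines <- which(default_counts != expected_fields)
print(suspect_lines)
print(readLines(tmp)[suspect_lines])
print(literal_counts)
```

For the real file, tmp would be replaced by "webk14.txt" and expected_fields by 27, the column count reported in the error message.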

I would like to import directly into R rather than take a detour through Excel. Any idea what I am doing wrong?

1 answer:

Answer 0: (score: 0)

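It may be related to symbols inside the fields, such as # in the names: by default read.table treats # as the start of a comment and ignores the rest of the line. Setting comment.char = "" makes R read those symbols literally, which gives you: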

webk14 <- read.table(
  file = "webk14.txt", header = FALSE, sep = "|", fill = TRUE,
  colClasses = "character", comment.char = "",
  stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay",
    "coh_bop", "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)
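The effect of comment.char = "" can be checked on a small in-memory example. This is a sketch on made-up records, not on the FEC file. The extra quote = "" argument is an assumption that goes beyond the answer above: read.table also treats ' and " as quote characters by default, so names containing apostrophes could cause similar trouble.

```r
# Hypothetical records; the column names are a subset of those in the question.
sample_lines <- c(
  "C001|ALPHA PAC|Q|100",
  "C002|LOCAL #12 FUND|Q|200",
  "C003|WORKERS' VOICE|Q|300"
)
cols <- c("cmte_id", "cmte_nm", "cmte_filing_freq", "ttl_receipts")

fixed <- read.table(
  text = sample_lines, header = FALSE, sep = "|",
  colClasses = "character", col.names = cols,
  comment.char = "",  # do not treat '#' as the start of a comment
  quote = ""          # assumption: also read ' and " as ordinary characters
)
print(fixed)
```

If the row count of the real import still differs from Excel after this change, comparing nrow(webk14) with length(readLines("webk14.txt")) is a quick way to see whether lines are still being merged or dropped.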