Question

I am currently trying to read in a .txt file.

I searched here and found "Error in reading in data set in R"; however, it did not solve my problem.

The data are political contributions listed by the US Federal Election Commission at ftp://ftp.fec.gov/FEC/2014/webk14.zip

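On inspecting the .txt, I realized that the data is oddly structured. In particular, the end of any line is not separated from the first cell of the next line (neither by a "|" nor by a space).

Strangely, importing via Excel and Access seems to work fine. The R import, however, does not work.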

To avoid the error "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements", I use the following command:

webk14 <- read.table(
  file = "webk14.txt", header = FALSE, fill = TRUE, sep = "|",
  colClasses = "character", stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop",
    "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)

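This does not produce an error; however, the result a) has a different number of rows than the Excel import and b) does not separate the columns correctly (which is probably the cause of a)).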
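One way to see which lines are being split unexpectedly is count.fields(), which reports the number of fields per line under the same sep, quote and comment.char rules that read.table() uses. The following is only a minimal sketch: the three sample lines are made up for illustration and are not real FEC data, and the expected width of 4 applies to this toy sample only (the real file has 27 columns).

```r
# Hypothetical sample standing in for webk14.txt (not real FEC data).
# The second line contains a '#' inside a name.
sample_lines <- c(
  "C001|FIRST COMMITTEE|Q|100",
  "C002|LOCAL #12 PAC|Q|200",
  "C003|SECOND COMMITTEE|M|300"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Defaults (comment.char = "#"): the rest of a line after '#' is ignored.
n_default <- count.fields(path, sep = "|")

# Comment handling and quoting switched off: lines are split on '|' only.
n_plain <- count.fields(path, sep = "|", quote = "", comment.char = "")

# Lines whose field count differs from the expected width of this toy sample.
expected <- 4
bad_lines <- which(n_default != expected)

print(n_default)
print(n_plain)
print(bad_lines)
```

If the counts under the default settings differ from the counts with quote = "" and comment.char = "", the affected lines probably contain characters that R treats specially rather than a genuinely different number of "|" separators.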

I would rather import directly into R than take a detour through Excel. Any ideas what I am doing wrong?

Answer 1

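It probably has to do with symbols in the variable names (by default, read.table treats "#" as a comment character and ignores the rest of the line), so use comment.char="" to have those symbols read as ordinary data, which gives you: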

webk14 <- read.table(
  file = "webk14.txt", header = FALSE, fill = TRUE, sep = "|",
  colClasses = "character", comment.char = "",
  stringsAsFactors = FALSE, dec = ".",
  col.names = c(
    "cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq",
    "ttl_receipts", "trans_from_aff", "indv_contrib",
    "other_pol_cmte_contrib", "cand_contrib", "cand_loans",
    "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds",
    "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop",
    "coh_cop", "debts_owed_by", "nonfed_trans_received",
    "contrib_to_other_cmte", "ind_exp", "pty_coord_exp",
    "nonfed_share_exp", "cvg_end_dt"
  )
)
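The effect of comment.char can be checked on a small example. This is a sketch under stated assumptions: the three-column sample and its values are invented for illustration, and quote = "" is an extra precaution that the answer does not mention (it would only matter if the names also contain apostrophes or double quotes, which is an assumption about the data).

```r
# Hypothetical three-column sample (not real FEC data).
# The second committee name contains a '#'.
sample_lines <- c(
  "C001|FIRST COMMITTEE|100",
  "C002|LOCAL #12 PAC|200",
  "C003|SECOND COMMITTEE|300"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

cols <- c("cmte_id", "cmte_nm", "ttl_receipts")

# Default comment.char = "#": the second row is cut off at the '#'.
with_comments <- read.table(
  path, header = FALSE, sep = "|", fill = TRUE,
  colClasses = "character", col.names = cols
)

# comment.char = "" keeps '#' as data.
# quote = "" is an added assumption, not part of the original answer.
without_comments <- read.table(
  path, header = FALSE, sep = "|", fill = TRUE,
  colClasses = "character", col.names = cols,
  comment.char = "", quote = ""
)

print(with_comments)
print(without_comments)
```

In this toy case only the second call keeps the full name and the trailing value of the second row; whether the same change fully resolves the row-count difference in the real file would still need to be checked against the Excel import.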

Problem reading a table with unclear end-of-line symbols
