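I'm trying to read a tab-delimited table, and it keeps producing parsing failures. I think the cause is the quote characters used in the text. See the following example: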
concept_id concept_name domain_id vocabulary_id concept_class_id standard_concept concept_code valid_start_date valid_end_date invalid_reason
2618087 Services delivered under an outpatient speech language pathology plan of care Observation HCPCS HCPCS Modifier S GN 19990101 20991231
2618083 "opt out" physician or practitioner emergency or urgent service Observation HCPCS HCPCS Modifier S GJ 19981001 20991231
2618082 Diagnostic mammogram converted from screening mammogram on same day Observation HCPCS HCPCS Modifier S GH 19981001 20991231
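Note the "opt out" in the second column of the second data row, which seems to be where the problem originates. The following code produces the parsing failures: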
df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()
  ),
  delim = "\t"
)
Warning: 4 parsing failures.
row col          expected                     actual    file
  1 NA           10 columns                   9 columns '~/_data/test.csv'
  2 concept_name delimiter or quote                     '~/_data/test.csv'
  2 concept_name closing quote at end of file           '~/_data/test.csv'
  2 NA           10 columns                   2 columns '~/_data/test.csv'
I can't seem to find a solution.
Answer 0 (score: 0)
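This fixed the problem. I needed to set the quote argument to quote = "". By default, read_delim treats the double quote as the quoting character, so the opening quote in "opt out" starts a quoted field and the parser then expects a delimiter right after the closing quote. With quote = "", double quotes are read as ordinary characters.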
df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()
  ),
  quote = "",
  delim = "\t"
)
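To check the fix without the original file, here is a minimal sketch that writes a reduced sample to a temporary file and reads it back with quote = "". The three-column layout, the sample rows, and the temporary path are assumptions made for illustration; they are not the full table from the question.

```r
library(readr)

# Hypothetical reduced sample: three of the original columns,
# with tabs written explicitly as \t so the delimiters are unambiguous.
lines <- c(
  "concept_id\tconcept_name\tconcept_code",
  "2618083\t\"opt out\" physician or practitioner emergency or urgent service\tGJ",
  "2618082\tDiagnostic mammogram converted from screening mammogram on same day\tGH"
)

# Write the sample to a temporary file (illustrative path, not the original one).
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

df <- read_delim(
  file = path,
  delim = "\t",
  quote = "",  # disable quote handling so double quotes stay in the field
  col_types = cols(
    concept_id = col_integer(),
    concept_name = col_character(),
    concept_code = col_character()
  )
)

# problems() lists any parsing failures recorded for the data frame.
print(problems(df))
print(df$concept_name[1])
```

If the file does use quotes to wrap whole fields elsewhere, disabling them this way will leave those quotes in the data, so this setting fits files where quotes only ever appear as literal text.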