Question

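Apologies in advance for such a simple question. I'm having trouble reading a tab-delimited file. R thinks line 164 is missing elements, but I can't see why. When I copy and paste the data into Excel, it splits into columns just fine.

Data: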

  temp <- tempfile()
  download.file("https://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM527389.zip",temp)

I tried

df <- read.table(unz(temp, "Products.txt"), sep="\t", header=TRUE)

and

df <- read.table(unz(temp, "Products.txt"), sep="\t", fill=TRUE, header=TRUE)

Both fail on the same line.

Answer 1

Consider read.delim which, like read.csv, is a wrapper around the more general read.table function in the built-in utils package.

It appears that the longer fields, DrugName and ActiveIngredient, run into problems with quotes and blank lines, so the fill, quote, and comment.char arguments need adjusting. read.delim already defaults to quote = "\"", fill = TRUE, and comment.char = "", whereas read.table by default treats both ' and " as quote characters and # as a comment character.

df <- read.delim(unz(temp, "Products.txt"), sep="\t", header= TRUE)

With structure output:

str(df)
# 'data.frame': 37850 obs. of 8 variables:
#  $ ApplNo           : int 4 159 552 552 552 552 552 552 552 552 ...
#  $ ProductNo        : num 4 1 1 2 3 4 5 7 8 9 ...
#  $ Form             : Factor w/ 348 levels "AEROSOL, FOAM;RECTAL",..: 203 331 121 121 121 121 121 121 121 121 ...
#  $ Strength         : Factor w/ 4065 levels ""," EQ 5MG BASE/ML",..: 525 2491 1453 2240 2447 538 654 670 538 2447 ...
#  $ ReferenceDrug    : int 0 0 0 0 0 0 0 0 0 0 ...
#  $ DrugName         : Factor w/ 7161 levels "8-HOUR BAYER",..: 4773 6039 3547 3547 3547 3547 3547 3546 2796 2796 ...
#  $ ActiveIngredient : Factor w/ 2735 levels "ABACAVIR SULFATE",..: 1372 2446 1305 1305 1305 1305 1305 1305 1305 1305 ...
#  $ ReferenceStandard: int 0 0 0 0 0 0 0 0 0 0 ...

For comparison, the equivalent read.table call, with the default argument values adjusted:

df <- read.table(unz(temp, "Products.txt"), sep="\t", quote = "\"", fill = TRUE,
                 comment.char = "", header= TRUE)
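
To see why those arguments matter, here is a minimal, self-contained sketch. It does not use Products.txt: the sample rows below are made up, and the assumption is that apostrophes and # characters inside fields are what trip up read.table's defaults. count.fields (also in utils) reports how many fields R sees on each line, which helps locate the offending rows.

```r
# Hypothetical sample data (NOT taken from Products.txt): tab-delimited rows
# whose middle field contains a '#' or an apostrophe.
sample_lines <- c(
  "ApplNo\tDrugName\tStrength",
  "1\tPLAIN DRUG\t5MG",
  "2\tDRUG #2\t20MG",
  "3\tBAYER'S ASPIRIN\t10MG",
  "4\tCHILDREN'S TYLENOL\t1MG",
  "5\tOTHER DRUG\t2MG"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Fields per line under the same defaults read.table uses:
# quote = "\"'" and comment.char = "#".
n_default <- count.fields(path, sep = "\t")

# Fields per line when only double quotes are quote characters
# and comment handling is switched off.
n_strict <- count.fields(path, sep = "\t", quote = "\"", comment.char = "")

# Lines whose field count looks wrong under the defaults (NA or not 3).
suspect <- which(is.na(n_default) | n_default != 3L)
print(suspect)

# read.delim already uses the stricter settings ...
df_delim <- read.delim(path)

# ... and read.table matches it once the arguments are set explicitly.
df_table <- read.table(path, sep = "\t", quote = "\"", fill = TRUE,
                       comment.char = "", header = TRUE)

print(dim(df_delim))
print(dim(df_table))
```

On the real file, the same count.fields comparison should point to the lines that confuse read.table, such as line 164 from the question.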

R: problem reading a tab-delimited file
