I'm using fread() to read data files. For some files I have the following situation:
library(data.table)
dt1 <- fread('colA colB colC
A01 NA NA
A02 NA NA
A03 NA NA
A04 NA NA
A05 NA NA
A06 NA NA
A07 bbb NA
A08 NA ccc
A09 NA NA
A10 NA NA
A11 NA NA
A12 NA NA
A13 NA NA
A14 NA NA
A15 NA NA
A16 NA NA
A17 NA NA
A18 NA NA
')
Bumped column 2 to type character on data row 7, field contains 'bbb'.
Coercing previously read values in this column from logical, integer
or numeric back to character which may not be lossless; e.g., if '00'
and '000' occurred before they will now be just '0', and there may be
inconsistencies with treatment of ',,' and ',NA,' too (if they
occurred in this column before the bump). If this matters please rerun
and set 'colClasses' to 'character' for this column. Please note that
column type detection uses the first 5 rows, the middle 5 rows and the
last 5 rows, so hopefully this message should be very rare. If
reporting to datatable-help, please rerun and include the output from
verbose=TRUE.
dt1
# colA colB colC
# 1: A01 NA
# 2: A02 NA
# 3: A03 NA
# 4: A04 NA
# 5: A05 NA
# 6: A06 NA
# 7: A07 bbb NA
# 8: A08 NA ccc
# 9: A09 NA NA
# 10: A10 NA NA
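In the resulting data.table, the colB values before the first character value appears are empty strings instead of NA. I don't know the column names or numbers in advance, so I can't use the colClasses argument. Is there a way around this (other than using read.table() instead of fread())?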
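If the table has already been read, one possible workaround for the symptom above is to turn the empty strings in character columns back into NA afterwards. The sketch below assumes that an empty string in a character column always means "missing" in these files; blank_to_na is a hypothetical helper name, not a data.table function. It uses a plain base-R loop so it runs without extra packages; on a data.table, data.table::set() could do the same replacement by reference.

```r
# Hypothetical helper (not part of data.table): replace "" with NA in every
# character column. Assumes an empty string always means "missing".
blank_to_na <- function(df) {
  for (j in seq_along(df)) {
    col <- df[[j]]
    if (is.character(col)) {
      # only touch non-NA empty strings; existing NAs are left alone
      col[!is.na(col) & col == ""] <- NA_character_
      df[[j]] <- col
    }
  }
  df
}

# Small data frame mimicking rows 1, 7 and 8 of the affected output above
dt_demo <- data.frame(colA = c("A01", "A07", "A08"),
                      colB = c("", "bbb", NA),
                      colC = c(NA, NA, "ccc"),
                      stringsAsFactors = FALSE)
blank_to_na(dt_demo)
```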
Answer 0 (score: 4)
Following up on the comments on my first answer:
fread(DT, colClasses="character")
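reads all columns as character; standard R recycling of a single value applies. In this case it isn't known in advance which column(s), by name or number, have the problem, so reading every column as character is a way around it.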
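A minimal sketch of a follow-up step: once every column has come in as character, each one can be converted to its natural type with base R's type.convert(), treating both "NA" and "" as missing. The helper name convert_types and the small demo table (including its extra numeric column n) are illustrative assumptions. The caveat from the warning in the question still applies: values such as '00' will become 0 if a column turns out to be integer, so columns where that matters are better left as character.

```r
# Hypothetical helper: after reading with colClasses = "character",
# convert each character column to logical/integer/numeric/character.
convert_types <- function(df, na = c("NA", "")) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      # as.is = TRUE keeps text columns as character instead of factor;
      # strings listed in `na` become NA
      type.convert(col, na.strings = na, as.is = TRUE)
    } else {
      col
    }
  })
  df
}

# Data frame shaped like an all-character read; column n is illustrative
raw <- data.frame(colA = c("A01", "A07", "A08"),
                  colB = c(NA, "bbb", ""),
                  colC = c(NA, NA, "ccc"),
                  n    = c("1", "2", "3"),
                  stringsAsFactors = FALSE)
typed <- convert_types(raw)
str(typed)
```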
Answer 1 (score: 1)
You can pass column numbers to colClasses.
See the many examples documented at the bottom of ?fread:
# colClasses
data = "A,B,C,D\n1,3,5,7\n2,4,6,8\n"
fread(data, colClasses=c(B="character",C="character",D="character")) # as read.csv
fread(data, colClasses=list(character=c("B","C","D"))) # saves typing
fread(data, colClasses=list(character=2:4)) # same using column numbers
# drop
fread(data, colClasses=c("B"="NULL","C"="NULL")) # as read.csv
fread(data, colClasses=list(NULL=c("B","C"))) # same, using the list form
fread(data, drop=c("B","C")) # same but less typing, easier to read
fread(data, drop=2:3) # same using column numbers
# select
# (in read.csv you need to work out which to drop)
fread(data, select=c("A","D")) # less typing, easier to read
fread(data, select=c(1,4)) # same using column numbers
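When the column names are only known at run time, the colClasses list form shown above can also be built programmatically from the file's header line. The sketch below assumes a single header line with names separated by one sep character; char_colclasses and its keep_auto argument are hypothetical names used for illustration.

```r
# Hypothetical helper: build colClasses = list(character = <positions>)
# from a header line, forcing every column to character except those
# named in keep_auto. Assumes names are separated by a single `sep`.
char_colclasses <- function(header_line, sep = " ", keep_auto = character(0)) {
  cols <- strsplit(trimws(header_line), sep, fixed = TRUE)[[1]]
  list(character = which(!cols %in% keep_auto))
}

cc <- char_colclasses("colA colB colC", keep_auto = "colA")
print(cc)  # expected: list(character = c(2L, 3L))
# In practice the header could come from readLines(path, n = 1) and the
# result be passed on as fread(path, colClasses = cc).
```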