I have a .csv in which NA observations are recorded as " - ".
returns <- read.csv("~/R/ADFG_Obtained_AKReturns.csv", stringsAsFactors = FALSE)
str(returns)
'data.frame': 4222 obs. of 15 variables:
$ Year : int 1975 1976 1977 1977 1977 1977 1977 1978 1978 1978 ...
$ Region : chr "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Prince William Sound" ...
$ Hatchery : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "AF Koernig" ...
$ Project : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "Port San Juan" ...
$ Species : chr "Pink" "Pink" "Pink" "Pink" ...
$ Seine : chr " - " " - " " - " " 2,000 " ...
$ Gillnet : chr " - " " - " " - " " - " ...
$ Troll : chr " - " " - " " - " " - " ...
$ Other.Commercial: chr " - " " - " " - " " - " ...
$ Sport : chr " - " " - " " - " " - " ...
$ PersUse : chr " - " " - " " - " " - " ...
$ Subsis : chr " - " " - " " - " " - " ...
$ Brood : chr " 5,800 " " 8,000 " " - " " 21,300 " ...
$ CR.catch : chr " - " " - " " - " " 15,545 " ...
$ Other : chr " - " " - " " 18,500 " " - " ...
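When I read in the .csv and then try to trim it with trimws, the whitespace is not removed (the df does not change at all; trimming was the first step my colleague suggested for the reformatting process). Note that I used "both" in the code below, but I got the same result with "right".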
returns <- read.csv("~/R/ADFG_Obtained_AKReturns.csv", stringsAsFactors = FALSE)
returns <- trimws(returns, which = c("both"))
str(returns)
'data.frame': 4222 obs. of 15 variables:
$ Year : int 1975 1976 1977 1977 1977 1977 1977 1978 1978 1978 ...
$ Region : chr "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Prince William Sound" ...
$ Hatchery : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "AF Koernig" ...
$ Project : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "Port San Juan" ...
$ Species : chr "Pink" "Pink" "Pink" "Pink" ...
$ Seine : chr " - " " - " " - " " 2,000 " ...
$ Gillnet : chr " - " " - " " - " " - " ...
$ Troll : chr " - " " - " " - " " - " ...
$ Other.Commercial: chr " - " " - " " - " " - " ...
$ Sport : chr " - " " - " " - " " - " ...
$ PersUse : chr " - " " - " " - " " - " ...
$ Subsis : chr " - " " - " " - " " - " ...
$ Brood : chr " 5,800 " " 8,000 " " - " " 21,300 " ...
$ CR.catch : chr " - " " - " " - " " 15,545 " ...
$ Other : chr " - " " - " " 18,500 " " - " ...
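For reference, trimws is documented as operating on a character vector, so one thing to try is applying it column by column rather than to the whole data frame. The sketch below is only an illustration under that assumption: `toy` is a made-up stand-in for `returns`, and `is_chr` is a helper name introduced here.

```r
# Hypothetical toy data standing in for `returns` (not the real file)
toy <- data.frame(
  Year  = c(1975L, 1977L),
  Seine = c(" - ", " 2,000 "),
  Brood = c(" 5,800 ", " - "),
  stringsAsFactors = FALSE
)

# Find the character columns, then trim each one individually
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], trimws, which = "both")

str(toy)
```

Assigning back into `toy[is_chr]` keeps the object a data frame and leaves the integer `Year` column untouched.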
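If I use strip.white in the read.csv call, the whitespace is removed. I assume this has something to do with how read.csv works, but even after reading the corresponding R documentation I still don't understand why this happens.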
returns <- read.csv("~/R/ADFG_Obtained_AKReturns.csv", stringsAsFactors = FALSE, strip.white = TRUE)
str(returns)
'data.frame': 4222 obs. of 15 variables:
$ Year : int 1975 1976 1977 1977 1977 1977 1977 1978 1978 1978 ...
$ Region : chr "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Kodiak & AK Peninsula" "Prince William Sound" ...
$ Hatchery : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "AF Koernig" ...
$ Project : chr "Kitoi Bay" "Kitoi Bay" "Kitoi Bay" "Port San Juan" ...
$ Species : chr "Pink" "Pink" "Pink" "Pink" ...
$ Seine : chr "-" "-" "-" " 2,000 " ...
$ Gillnet : chr "-" "-" "-" "-" ...
$ Troll : chr "-" "-" "-" "-" ...
$ Other.Commercial: chr "-" "-" "-" "-" ...
$ Sport : chr "-" "-" "-" "-" ...
$ PersUse : chr "-" "-" "-" "-" ...
$ Subsis : chr "-" "-" "-" "-" ...
$ Brood : chr " 5,800 " " 8,000 " "-" " 21,300 " ...
$ CR.catch : chr "-" "-" "-" " 15,545 " ...
$ Other : chr "-" "-" " 18,500 " "-" ...
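So adding strip.white does exactly what I want for the blank (" - ") cells; I just don't understand why. Why doesn't trimws work in this case?
Also, why do some observations still contain whitespace?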
$ Seine : chr "-" "-" "-" " 2,000 " ...
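One possible explanation, offered as a guess: the documentation for strip.white says it strips leading and trailing whitespace from unquoted character fields, and values such as " 2,000 " contain a comma, so they are presumably quoted in the file. The sketch below reproduces that situation with made-up inline data (`csv_text` and `toy` are hypothetical, not the real file).

```r
# Hypothetical inline CSV: the dash fields are unquoted,
# the values containing commas are quoted
csv_text <- paste(
  'Year,Seine,Brood',
  '1975, - ," 5,800 "',
  '1977," 2,000 ", - ',
  sep = "\n"
)

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

# The unquoted " - " fields should come back as "-"; per the documentation,
# the quoted fields are not the target of strip.white
str(toy)
```

If the real file matches this assumption, the padded numbers would still need a separate trimming or substitution step after reading.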
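For my next step, I will use the code below to change the column type from character to numeric, but I would greatly appreciate any suggestions on how to do this for all of the columns Seine:Other (cols 6:15 in my df).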
returns$Seine <- as.numeric(gsub(",","",returns$Seine))
$ Seine : num NA NA NA 2000 NA NA NA NA NA NA ...
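One way the same conversion might be applied to several columns at once is with lapply over the chosen columns. This is a minimal sketch, not a definitive approach: `to_numeric` is a helper name invented here, `toy` is made-up data, and in the real df the column selection would presumably be `6:15`.

```r
# Hypothetical helper: drop thousands separators, trim padding,
# treat a lone "-" as missing, then convert to numeric
to_numeric <- function(x) {
  x <- trimws(gsub(",", "", x, fixed = TRUE))
  x[x == "-"] <- NA
  as.numeric(x)
}

# Hypothetical toy data standing in for `returns`
toy <- data.frame(
  Year  = c(1975L, 1977L),
  Seine = c(" - ", " 2,000 "),
  Other = c(" 18,500 ", " - "),
  stringsAsFactors = FALSE
)

cols <- c("Seine", "Other")  # assumption: 6:15 in the real data frame
toy[cols] <- lapply(toy[cols], to_numeric)

str(toy)
```

Setting the dash to NA explicitly before as.numeric avoids the "NAs introduced by coercion" warning that the single-column version would produce.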
(FYI, this is the first question I have asked.)