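I am cleaning data in R and want to keep the number formatting when I switch a column from numeric to character, specifically the significant zeros in the hundredths place (see the example below). My input columns mostly start out as factor data; below is an example of what I am trying to do.
I am sure there is a better way, and I am hoping someone who knows more than I do can shed some light. Most questions online deal with leading zeros or with formatting purely numeric columns, but the "<" symbol in my data has thrown me for a loop when it comes to working out the right way to do this.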
df = as.factor(c("0.01","5.231","<0.02","0.30","0.801","2.302"))
ind = which(df %in% "<0.02") # Locate the below detection value.
df[ind] <- NA # Substitute NA temporarily
df = as.numeric(as.character(df)) # Changes to numeric column
df = round(df, digits = 2) # Rounds to hundredths place
ind1 = which(df < 0.02) # Check for below reporting limit values
df = as.character(df) # Change back to character column...
df[c(ind,ind1)] = "<0.02" # so I can place the reporting limit back
> # RESULTS::
> df
[1] "<0.02" "5.23" "<0.02" "0.3" "0.8" "2.3"
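However, the 4th, 5th and 6th values in the data no longer show the zero in the hundredths place. What is the proper order of operations? Perhaps changing the column back to character is the wrong approach? Any advice would be appreciated.
Thanks.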
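EDIT: ---- Following the suggestions from hrbrmstr and Mike: thank you for the suggestions. I tried the following, and both lead to the same problem. Perhaps there is another way to index/replace the values?
format, same problem: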
#... code from above...
ind1 = which(df < 0.02)
df = as.character(df)
df[!c(ind,ind1)] = format(df[!c(ind,ind1)],digits=2,nsmall=2)
> df
[1] "<0.02" "5.23" "<0.02" "0.3 " "0.8 " "2.3 "
sprintf, same problem:
# ... above code from example ...
ind1 = which(df < 0.02) # Check for below reporting limit values.
sprintf("%.2f",df) # sprintf attempt.
[1] "0.01" "5.23" "NA" "0.30" "0.80" "2.30"
df[c(ind,ind1)] = "<0.02" # Feed the symbols back into the column.
> df
[1] "<0.02" "5.23" "<0.02" "0.3" "0.8" "2.3" #Same Problem.
I also tried another way of replacing the values, with the same problem.
# ... above code from example ...
> ind1 = which(df < 0.02)
> df[c(ind,ind1)] = 9999999
> sprintf("%.2f",df)
[1] "9999999.00" "5.23" "9999999.00" "0.30" "0.80" "2.30"
> gsub("9999999.00","<0.02",df)
[1] "<0.02" "5.23" "<0.02" "0.3" "0.8" "2.3" #Same Problem.
Answer 0 (score: 1)
You can use gsub and a bit of regular expression matching to pad the values.
df <- c("<0.02", "5.23", "<0.02", "0.3", "4", "0.8", "2.3")
gsub("^([^\\.]+)$", "\\1\\.00", gsub("\\.(\\d)$", "\\.\\10", df))
[1] "<0.02" "5.23" "<0.02" "0.30" "4.00" "0.80" "2.30"
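The inner gsub, which runs first, looks for a dot followed by exactly one digit and the end of the string, and replaces that digit with itself (capture group \\1) followed by a zero. The outer gsub then looks for values that contain no dot at all and appends .00 to them.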
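To make the two substitutions easier to follow, here is a minimal sketch that runs them as separate steps instead of nesting them. The variable names `x`, `step1` and `step2` are my own and not part of the answer; the patterns are the same ones, written with a literal dot in the replacement. It assumes every value has at most two decimal places, as in the sample data.

```r
# Sample values copied from the answer; "<0.02" marks a below-limit result.
x <- c("<0.02", "5.23", "<0.02", "0.3", "4", "0.8", "2.3")

# Step 1: a dot followed by exactly one digit at the end of the string
# gets a trailing zero ("0.3" becomes "0.30"). "\\10" is backreference 1
# followed by a literal "0".
step1 <- gsub("\\.(\\d)$", ".\\10", x)

# Step 2: a value with no dot at all gets ".00" appended ("4" becomes "4.00").
step2 <- gsub("^([^.]+)$", "\\1.00", step1)

print(step2)
```

Values that already have two decimals, including the "<0.02" entries, match neither pattern and pass through unchanged.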