Question

Suppose I have the following text:

"    7.7597     4.7389     3.0058     0.0013"

I know its format:

" %9.4f  %9.4f  %9.4f  %9.4f"

I want to extract the variables from it. I'd like a function like sprintf / gettextf, but one that works in reverse:

??????(" %9.4f  %9.4f  %9.4f  %9.4f", v1, v2, v3, v4)

How can I do this? (Without loading any packages, if possible.)
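Base R does not seem to provide such a reverse sprintf, but for simple formats one can be approximated. The sketch below is a hypothetical helper, `unsprintf`, not an existing function. It assumes the format contains only spaces and `%f` / `%d` conversions, turns each conversion into a regex capture group, and ignores field widths and precisions, treating each run of spaces as flexible whitespace.

```r
# Hypothetical helper, not part of base R: a minimal "reverse sprintf".
# Assumes `fmt` contains only spaces and %f / %d conversions (e.g. "%9.4f");
# field widths and precisions are ignored.
unsprintf <- function(fmt, x) {
  # Replace each conversion spec with a capture group for a signed decimal number
  num <- "([-+]?[0-9]*[.]?[0-9]+)"
  pattern <- gsub("%[0-9]*([.][0-9]+)?[fd]", num, fmt)
  # Padding changes how many spaces appear, so any run of spaces matches any whitespace
  pattern <- gsub(" +", "[[:space:]]*", pattern)
  m <- regmatches(x, regexec(paste0("^", pattern, "$"), x))[[1]]
  if (length(m) == 0) stop("input does not match format: ", fmt)
  as.numeric(m[-1])  # drop the full match, keep only the captured fields
}

vals <- unsprintf(" %9.4f  %9.4f  %9.4f  %9.4f",
                  "    7.7597     4.7389     3.0058     0.0013")
vals
# [1] 7.7597 4.7389 3.0058 0.0013
```

Input that does not fit the format raises an error instead of silently returning fewer values.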

The unreliable method I'm using now is:

temp <- as.numeric(unlist(strsplit("    7.7597     4.7389     3.0058     0.0013"," ")))
temp[!is.na(temp)]
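One way to see why this is fragile: every token that is not a number, including the empty strings produced between consecutive spaces and any genuinely malformed field, becomes NA and is then discarded, so a bad field just shortens the result without any error. A small demonstration with a made-up malformed input (not from the question):

```r
# Illustrative input: the second field is malformed
x <- "    7.7597     abc     3.0058     0.0013"
temp <- suppressWarnings(as.numeric(unlist(strsplit(x, " "))))
temp <- temp[!is.na(temp)]
temp
# [1] 7.7597 3.0058 0.0013
```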

Answer 1

I would use:

scan(text="  7.7597     4.7389     3.0058     0.0013")
#Read 4 items
#[1] 7.7597 4.7389 3.0058 0.0013
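Since the question wants the values in separate variables (v1 to v4), note that `scan` can also read a record of named fields when `what` is a list: each element's type (here `0`, a double) sets the type of that field, and `quiet = TRUE` suppresses the "Read ... items" message. The names v1 to v4 are simply the ones used in the question.

```r
# Each list element names one field and gives its type (0 means double)
vals <- scan(text = "    7.7597     4.7389     3.0058     0.0013",
             what = list(v1 = 0, v2 = 0, v3 = 0, v4 = 0), quiet = TRUE)
vals$v1
# [1] 7.7597

# Optionally copy the fields into the current environment as variables
invisible(list2env(vals, envir = environment()))
v4
# [1] 0.0013
```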

It reports NAs correctly:

scan(text="   7.7597  NA   4.7389     3.0058     0.0013")
#Read 5 items
#[1] 7.7597     NA 4.7389 3.0058 0.0013
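Which tokens count as missing is controlled by the `na.strings` argument, whose default is `"NA"`. If your data marked missing values with a dash instead (an assumed convention, not something from the question), a sketch would be:

```r
# Treat "-" as a missing value instead of the default "NA"
vals <- scan(text = "7.7597  -  4.7389", na.strings = "-", quiet = TRUE)
vals
# [1] 7.7597     NA 4.7389
```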

It fails on malformed (non-numeric) input, so you can handle that with tryCatch:

tryCatch(scan(text=" abc  7.7597  4.7389"), error= function(e) cat("Malformed input\n")) 
#Malformed input
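Printing a message works interactively; in a script it is usually more useful for the handler to return a value the caller can test. A minimal sketch with a hypothetical helper, `read_fields`, which returns `NULL` both for non-numeric input and for the wrong number of fields (the expected count `n` is an added assumption, not part of the original answer):

```r
# Hypothetical helper: parse whitespace-separated numbers, or return NULL
read_fields <- function(s, n) {
  v <- tryCatch(scan(text = s, quiet = TRUE), error = function(e) NULL)
  # Reject input that failed to parse or has the wrong number of fields
  if (is.null(v) || length(v) != n) return(NULL)
  v
}

read_fields("    7.7597     4.7389     3.0058     0.0013", 4)
# [1] 7.7597 4.7389 3.0058 0.0013
read_fields(" abc  7.7597  4.7389", 4)
# NULL
read_fields("  7.7597  4.7389", 4)
# NULL
```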

Under the hood

How does scan get the floats right? The function has an argument, what, that sets the type of data to be scanned. Its default is

scan(...,  what=double())

so it parses the floating-point numbers the question asks for. If your requirements change and you need a different data type, try, for example:

scan(text="  7  4  3  0 ", what=integer())
#Read 4 items
#[1] 7 4 3 0
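If one line mixes types, `what` can also be a list; `scan` then reads records whose fields cycle through the list's element types. A sketch with an illustrative input of alternating labels and numbers (not from the question):

```r
# Fields alternate: a character label ("" means character), then a double (0)
rec <- scan(text = "a 1.5 b 2.5", what = list(name = "", value = 0), quiet = TRUE)
rec$name
# [1] "a" "b"
rec$value
# [1] 1.5 2.5
```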

As usual, you can check the data for consistency:

tryCatch(scan(text=" 1 2.3", what=integer()), error= function(e) cat("Non-integer value(s) passed!\n")) 
#Non-integer value(s) passed!

Answer 2

Why not make your approach more reliable, rather than searching for something that may not even exist?

> x <- "    7.7597     4.7389     3.0058     0.0013"

> unlist(read.table(text = x, strip.white = TRUE), use.names = FALSE)
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(sapply(strsplit(x, "\\s+"), "[", -1))
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(strsplit(x, "\\s+")[[1]])[-1]
# [1] 7.7597 4.7389 3.0058 0.0013

> library(stringr)
> as.numeric(strsplit(str_trim(x), "\\s+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(str_extract_all(x, "[0-9][.][0-9]+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013
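Two base-R variants, in keeping with the question's wish to avoid packages. The `str_trim` version can use base `trimws` instead (available since R 3.2.0). Also, the pattern `[0-9][.][0-9]+` allows only one digit before the decimal point and no sign, so for example `12.5` would be extracted as `2.5`; a broader pattern avoids that. The second input string below is illustrative.

```r
x <- "    7.7597     4.7389     3.0058     0.0013"

# Base-R counterpart of the str_trim() version
as.numeric(strsplit(trimws(x), "[[:space:]]+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013

# A regex that also allows several integer digits and an optional minus sign
y <- "   12.5   -3.25"
as.numeric(regmatches(y, gregexpr("-?[0-9]+([.][0-9]+)?", y))[[1]])
# [1] 12.50 -3.25
```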

Reading text with a function like sprintf in R
