Recreate a vector from print() console output

Time: 2018-07-24 08:29:41

Tags: r

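Regrettably, you often see questions on SO that show data in a format that is not reproducible; often it is just the copied result of print()...

...like this:

x
 [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"

...or this:

y
 [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575

Ideally, I would like to copy the text of one of the blocks above to the clipboard and call some function foo() such that, for example, all.equal(foo(), x) holds for discrete data types and all(near(foo(), y)) holds for floating-point numbers (up to the printed precision).

Is there a convenient way to (approximately) recreate a simple vector from the copied result of print()?


Edit: Ironically, I realize that my own example is not fully reproducible. Here is the code that created the copied printed output:

set.seed(1)

x <- sample(LETTERS, 40, replace = TRUE)
y <- rnorm(20)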

3 answers:

Answer 0 (score: 2)

I would solve this with scan.

You could wrap the following code in a function.

y <-
  '[1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575'

y <- scan(what = character(), text = y)
y <- sub("^\\s*\\[\\d+\\]", "", y)
y <- as.numeric(y[y != ""])

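Following a suggestion from @Moody_Mudskipper in the comments:

  The pattern could be updated to "^\\s*\\[\\d+\\]" to support the OP's example (which starts with a space).

A function could be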

recreateVector <- function(X, numeric = TRUE, quiet = FALSE){
  X <- scan(what = character(), text = X, quiet = quiet)
  X <- sub("^\\s*\\[\\d+\\]", "", X)
  X <- X[X != ""]
  if(numeric) X <- as.numeric(X)
  X
}


recreateVector(y)   # Use the original y
#Read 24 items
# [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
# [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
#[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
#[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575

For character vectors, set the argument numeric = FALSE; its default is TRUE.

x <-
'[1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"'

recreateVector(x, numeric = FALSE)
#Read 43 items
# [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U"
#[16] "M" "S" "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I"
#[31] "M" "P" "M" "E" "V" "R" "U" "C" "S" "K"

Note the argument quiet. I set its default to FALSE, as in the definition of scan, because I prefer to see whether anything was actually read.
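As a rough check of the round trip the question asks for, the sketch below feeds the function the output of capture.output(), which stands in for text pasted from the console. The function body is repeated only so the block runs on its own, and the tolerance of 1e-6 is an assumption about how many decimals print() shows with default options, not part of the answer.

```r
# Copy of recreateVector() so that this block is self-contained.
recreateVector <- function(X, numeric = TRUE, quiet = FALSE) {
  X <- scan(what = character(), text = X, quiet = quiet)
  X <- sub("^\\s*\\[\\d+\\]", "", X)   # turn index tokens such as "[18]" into ""
  X <- X[X != ""]                      # and drop them
  if (numeric) X <- as.numeric(X)
  X
}

set.seed(1)
x <- sample(LETTERS, 40, replace = TRUE)
y <- rnorm(20)

# capture.output() returns the printed lines as a character vector.
x2 <- recreateVector(capture.output(print(x)), numeric = FALSE, quiet = TRUE)
y2 <- recreateVector(capture.output(print(y)), quiet = TRUE)

# Characters come back exactly; numbers only up to the printed precision.
stopifnot(identical(x2, x))
stopifnot(max(abs(y2 - y)) < 1e-6)   # assumed tolerance, see above
```

One limitation worth keeping in mind: because the index markers are removed token by token, a character element that itself prints as "[1]" would be dropped as well.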

Answer 1 (score: 2)

We can mimic the data type guessing that is done when reading a CSV file:

library(tidyverse)
unprint <- function(s) {
  s %>% str_replace_all(" *\\[\\d+\\] *","") %>% str_replace_all(" +","\n") %>% 
  textConnection %>% read.table
}
unprint(' [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575') %>% head

#           V1
#1  0.91897737
#2  0.78213630
#3  0.07456498
#4 -1.98935170
#5  0.61982575
#6 -0.05612874


unprint(' [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"') %>% head

#  V1
#1  G
#2  J
#3  O
#4  X
#5  F
#6  X

A more elaborate version that handles brackets inside strings and also gives the proper output, a vector rather than a data frame:

unprint <- function(s) {
  t <- s %>% textConnection %>% readLines %>% 
    str_replace(" *\\[\\d+\\] *","") %>%
    paste(collapse=' ') %>% str_replace_all(" ","\n") %>% 
    textConnection %>% read.table(stringsAsFactors=FALSE) 
  t$V1 %>% str_replace_all("\n"," ")
}

x <- unprint(' [1] "x + y  [1]" "x + z  [2]"')
x
#[1] "x + y  [1]" "x + z  [2]"
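The type guessing that read.table performs is also available directly in base R through type.convert(). The following is a minimal sketch of the same idea without the tidyverse; the helper name unprint_base is hypothetical, and the sketch assumes that no element of the vector itself prints as a bare index such as [1].

```r
# Hypothetical helper: base R variant of the idea above.
unprint_base <- function(s) {
  # scan() splits on whitespace and keeps quoted strings together.
  tokens <- scan(text = s, what = character(), quiet = TRUE)
  # Drop the index markers such as "[1]" or "[18]".
  tokens <- tokens[!grepl("^\\[\\d+\\]$", tokens)]
  # Guess the type the way read.table does; keep strings as character.
  type.convert(tokens, as.is = TRUE)
}

num <- unprint_base("[1] 0.5 1.25\n[3] -2")
chr <- unprint_base('[1] "a" "b c"')

stopifnot(is.numeric(num), length(num) == 3L)
stopifnot(identical(chr, c("a", "b c")))
```

Returning a plain vector avoids the extra step of extracting a column from a data frame.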

Answer 2 (score: 0)

For my own use, I ended up modifying @RuiBarradas's answer to include some features I wanted: reading from the clipboard and type guessing (with the help of readr).

rescue_vector <- function(x = readClipboard()) {
  x <- gsub("(^|\n)\\s*\\[\\d+\\]", "", x)
  x <- scan(text = x, what = character(),
            allowEscapes = TRUE, quiet = TRUE)
  readr::parse_guess(x, na = character())
}

It works for the example data given:

set.seed(1)

x <- sample(LETTERS, 40, replace = TRUE)
all.equal(x, rescue_vector(capture.output(x)))
#> [1] TRUE

y <- rnorm(20)
all.equal(y, rescue_vector(capture.output(y)))
#> [1] TRUE

And when reading from the clipboard:

writeClipboard(capture.output(y))
all.equal(y, rescue_vector())
#> [1] TRUE

And also for some odd cases:

z <- c("[1] first \n second", "[2] + 1")
all.equal(z, rescue_vector(capture.output(z)))
#> [1] TRUE

But missing values are still a problem:

na <- c("", "NA", NA)
rescue_vector(capture.output(na))
#> [1] "" NA NA
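One possible way around this, for character vectors only, is to tokenize the printed text with a regular expression instead of scan(), so that a quoted "NA" and a bare NA stay distinguishable. This is only a sketch: the helper name rescue_chr is hypothetical, and it assumes the input is the print() output of a character vector with quote = TRUE (the default).

```r
# Hypothetical helper: parse the printed form of a character vector while
# keeping the string "NA" distinct from a missing value.
rescue_chr <- function(lines) {
  # Drop the leading index marker such as " [1]" from each printed line.
  lines <- sub("^\\s*\\[\\d+\\]", "", lines)
  # A token is either a double-quoted string (with backslash escapes)
  # or a bare NA.
  pattern <- "\"(?:[^\"\\\\]|\\\\.)*\"|\\bNA\\b"
  tokens <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  vapply(tokens, function(tok) {
    if (identical(tok, "NA")) {
      NA_character_
    } else {
      # The token is a string literal, so let R's parser unescape it.
      eval(parse(text = tok))
    }
  }, character(1), USE.NAMES = FALSE)
}

na <- c("", "NA", NA)
stopifnot(identical(rescue_chr(capture.output(print(na))), na))

z <- c("[1] first \n second", "[2] + 1")
stopifnot(identical(rescue_chr(capture.output(print(z))), z))
```

Because only tokens that match the quoted-string pattern are passed to parse(), nothing other than string literals is evaluated. Numeric vectors would still need a separate path.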

As @Moody_Mudskipper mentioned in the comments, this could be developed further to also attempt to rescue pasted tables.