Question

I have a bunch of files whose names contain non-ASCII characters. For example, here is one file name:

readLines('bbb/ović, Melika_ Omeragić, Ismir_ Bata.txt')

## Error in file(con, "r") : cannot open the connection
## In addition: Warning message:
## In file(con, "r") :
##   cannot open file 'bbb/ovi?, Melika_ Omeragi?, Ismir_ Bata.txt': Invalid argument

So I tried:

dir('bbb')
## [1] "ovic, Melika_ Omeragic, Ismir_ Bata.txt"

So then I tried:

readLines(list.files('bbb', full.names = TRUE))

## Error in file(con, "r") : cannot open the connection
## In addition: Warning message:
## In file(con, "r") :
##   cannot open file 'bbb/ovic, Melika_ Omeragic, Ismir_ Bata.txt': No such file or directory

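How can I read these files programmatically? The contents of the files are irrelevant to this question; it is only about the special characters in the file names and reading the files in.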

If need be, I'm also open to a way of renaming the files so that they can be read in.

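I realize I don't have an MWE, but I couldn't create one for this problem. Simply creating a text file named ović, Melika_ Omeragić, Ismir_ Bata.txt and reading it with my code above will demonstrate the problem.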
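The setup described above could be scripted roughly as follows. This is only a sketch: the scratch folder under tempdir() and the file contents are assumptions, and the name is built from \u escapes so the script itself stays ASCII. Whether the final calls fail depends on the Windows locale and R version.

```r
# File name from the question, written with Unicode escapes
fname <- "ovi\u0107, Melika_ Omeragi\u0107, Ismir_ Bata.txt"

# Scratch folder (an assumption; the question uses a folder named 'bbb')
folder <- file.path(tempdir(), "bbb")
dir.create(folder, showWarnings = FALSE)

# Create the file; on an affected Windows setup this step may already fail
path <- file.path(folder, fname)
writeLines("some text", path)

# The two calls from the question
listed <- dir(folder)
content <- readLines(path)
```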

Answer 1

I was able to read a file named ović, Melika_ Omeragić, Ismir_ Bata.txt using readr's read_lines_raw. The byte sequence even seems to match the text inside, which is a good thing.

Hope this helps.
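For illustration, a call along these lines might look like the sketch below. The stand-in file under tempdir() and the UTF-8 assumption about its contents are mine, not part of the answer; readr must be installed.

```r
library(readr)  # assumes the readr package is available

# Hypothetical stand-in for the file from the question
path <- file.path(tempdir(), "ovi\u0107, Melika_ Omeragi\u0107, Ismir_ Bata.txt")
writeLines("some text", path)

# read_lines_raw() returns a list with one raw vector per line
raw_lines <- read_lines_raw(path)

# Turn the bytes into strings; marking them UTF-8 is an assumption about the contents
lines <- vapply(raw_lines, rawToChar, character(1))
Encoding(lines) <- "UTF-8"
```

Working with raw bytes first makes it possible to inspect the exact byte sequence before deciding how to decode it.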

Answer 2

Things are quite tricky on Windows, but I was able to find a workaround using these posts:

equivalent of (dir/b > files.txt) in PowerShell

R: can't read unicode text files even when specifying the encoding

The idea I used to read the file is to write its name into a file and read it back from there with the appropriate encoding.

My solution is as follows (I use the here library for reproducibility reasons):

library(here)

obtain.files <- function(folder){
  # Obtain all files in folder and write output into file
  system(paste0("cmd /K ",'cd /d "',folder,'/" &  cmd /u /c "dir /b > filestmp.txt"'))
  tmpfilepath <- paste0(folder,"/filestmp.txt")
  # Read the temporary file
  # Not sure it will work in all Windows versions
  con <- file(tmpfilepath, encoding = "UCS-2LE")
  RL <- readLines(con)
  close(con)

  # Remove file
  file.remove(tmpfilepath)
  # Keep only valid files
  RL <- RL[RL!="filestmp.txt"]
  return(RL)
}

folder <- here::here("bbb")
# There is only one file in the folder
files <- obtain.files(folder)

readLines(here::here("bbb",files))

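I used the cmd command found in the first post, and its output is in UCS-2LE. It is probably not platform independent. With PowerShell, filestmp.txt comes out in UTF-16, which may be a more general example.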
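The decoding step can be tried in isolation, without cmd, with a sketch like the one below. It fakes the listing file by writing one name as UTF-16LE bytes (an assumption about what the redirected dir /b output looks like) and then reads it back through a connection that declares that encoding. The path under tempdir() is hypothetical.

```r
# Hypothetical listing file standing in for filestmp.txt
listing <- file.path(tempdir(), "filestmp.txt")

# Encode one file name (plus newline) as UTF-16LE bytes and write them as-is
name <- "ovi\u0107, Melika_ Omeragi\u0107, Ismir_ Bata.txt\n"
bytes <- iconv(name, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(bytes, listing)

# Read the listing back, telling R how the bytes are encoded
con <- file(listing, encoding = "UTF-16LE")
files <- readLines(con)
close(con)
```

The connection re-encodes the text into the session's native encoding, so a character that the native code page cannot represent may still be lost at this point.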

Reading non-ASCII file paths on Windows
