Question

I have a text file that contains several languages. How can I read it in R with the read.delim function?

Encoding("file.tsv")

[1] "unknown"

source_data = read.delim(file, header = F, fileEncoding = "windows-1252", sep = "\t", quote = "")
source_D[360]

[1] "ð¿ð¾ð¸ñðºð½ð°ññ，ð¾ð¼ñð°ð¹ñ，ðμ"

But in Notepad, source_D[360] is displayed correctly as 'поиск на этом сайте'.
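A side note on the diagnosis: `Encoding()` reports the encoding mark of a character string, so `Encoding("file.tsv")` describes only the string "file.tsv" itself and says nothing about the file's contents. "unknown" is the normal answer for a plain ASCII string. As a rough check, the sketch below tests whether a file's bytes form valid UTF-8 with base R's `validUTF8()`. `file_is_utf8` is a hypothetical helper name, and TRUE only means the bytes are consistent with UTF-8 (a pure ASCII file also passes).

```r
# Hypothetical helper: TRUE if the whole file is valid UTF-8.
# Encoding("file.tsv") cannot tell you this; it only inspects the string "file.tsv".
file_is_utf8 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# Usage sketch: one UTF-8 file, and one file holding a windows-1252 byte (0xE9, "é")
utf8_path <- tempfile(fileext = ".tsv")
writeBin(charToRaw(enc2utf8("\u043d\u0430\tok\n")), utf8_path)
cp1252_path <- tempfile(fileext = ".tsv")
writeBin(as.raw(c(0xE9, 0x0A)), cp1252_path)

file_is_utf8(utf8_path)    # TRUE
file_is_utf8(cp1252_path)  # FALSE
```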

Answer 1

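source_data = read.delim(file, header = F, sep = "\t", quote = "", stringsAsFactors = FALSE)
# Encoding<- needs a character vector, not a data frame, so mark each character column as UTF-8
source_data[] = lapply(source_data, function(x) { if (is.character(x)) Encoding(x) <- "UTF-8"; x })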

I tried it: the code above works for me when R runs on Windows. If you run R on Unix, you can use the following code instead:

source_data = read.delim(file, header = F, fileEncoding = "UTF-8", sep = "\t", quote = "", stringsAsFactors = FALSE)
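A small self-contained sketch of both approaches, assuming R runs in a UTF-8 locale (typical on Linux and macOS). The temporary file and the sample phrase "поиск на этом сайте" are stand-ins for the asker's file.tsv, which we don't have. The "Unix-style" read decodes the file from UTF-8 as it is read; the "Windows-style" read keeps the bytes unchanged and only relabels them as UTF-8 afterwards.

```r
# Sample text: "поиск на этом сайте" ("search on this site"), written with \u escapes
text <- "\u043f\u043e\u0438\u0441\u043a \u043d\u0430 \u044d\u0442\u043e\u043c \u0441\u0430\u0439\u0442\u0435"

# Write a one-line, tab-separated file whose bytes are UTF-8
path <- tempfile(fileext = ".tsv")
writeBin(charToRaw(enc2utf8(paste0("1\t", text, "\n"))), path)

# Unix-style: let the connection decode the file from UTF-8 while reading
a <- read.delim(path, header = FALSE, fileEncoding = "UTF-8", sep = "\t",
                quote = "", stringsAsFactors = FALSE)

# Windows-style: read the bytes unchanged, then declare the text column as UTF-8
b <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                stringsAsFactors = FALSE)
Encoding(b$V2) <- "UTF-8"

identical(a$V2, text)  # TRUE
identical(b$V2, text)  # TRUE
```

The relabeling trick works because the bytes in the file were never wrong; only R's assumption about how to interpret them was.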

Answer 2

A tidyverse approach:

Use the locale option of readr's read_delim. (The readr functions use _ instead of . in their names and are usually faster and smarter.) More details here: https://r4ds.had.co.nz/data-import.html#parsing-a-vector

library(readr)
source_data = read_delim(file, delim = "\t", col_names = FALSE, quote = "",
                         # the garbled output suggests the file is UTF-8, not windows-1252
                         locale = locale(encoding = "UTF-8"))

How do I read non-English characters with read.delim in R?
