Question

I am trying to read data from a text file that contains data in the following format:

583550348352212992|Thu Apr 02 08:43:39 +0000 2015|Ambulance progress 'not fast enough' http://bbc.in/1P1AJyX
583406140337164288|Wed Apr 01 23:10:37 +0000 2015|Children’s hospital builds sleep app http://bbc.in/1BO9jlZ

I am using the read.table function as follows:

bbchealth <- read.table(file=".../bbchealth.txt", 
                    sep="|", 
                    header = F, 
                    quote="", 
                    fill=F, 
                    stringsAsFactors = F,
                    numerals ="no.loss",
                    col.names = c("TweetId", "Date and Time", "Tweet"))

When I read the file, I see the following:

583550348352212992 Thu Apr 02 08:43:39 +0000 2015 Ambulance progress 'not fast enough' http://bbc.in/1P1AJyX
583406140337164288 Wed Apr 01 23:10:37 +0000 2015 Childrenâ€™s hospital builds sleep app http://bbc.in/1BO

As you can see, the apostrophe in "Children’s" has been changed to â€™.

This happens wherever an apostrophe appears, including the inverted (opening) form. For example,

574407194961039360|Sun Mar 08 03:12:01 +0000 2015|Frankie the dog ‘sniffs out cancer’ http://bbc.in/1COjVHM

is read as

574407194961039360 Sun Mar 08 03:12:01 +0000 2015 Frankie the dog â€˜sniffs out cancerâ€™ http://bbc.in/1COjVHM

Here, ‘ is converted to â€˜ and ’ is converted to â€™.

How can I make sure these symbols are read as they are?

Answer 1

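Try the encoding="UTF-8" argument in read.table(). The file is most likely UTF-8 encoded but is being interpreted in a single-byte encoding such as Windows-1252, so the three bytes of ’ (E2 80 99) show up as the three characters â€™.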
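
As a rough sketch of what that looks like, the snippet below writes two of the sample lines from the question to a temporary file as UTF-8 and reads them back with the same read.table() call plus encoding="UTF-8". The temporary file is only a stand-in for bbchealth.txt, and the assumption that the real file is UTF-8 encoded is mine, not something the question confirms.

```r
# Stand-in for bbchealth.txt: two sample lines from the question, written as UTF-8.
# (The temp file is hypothetical; point `file=` at the real path instead.)
tmp <- tempfile(fileext = ".txt")
sample_lines <- c(
  "583406140337164288|Wed Apr 01 23:10:37 +0000 2015|Children\u2019s hospital builds sleep app http://bbc.in/1BO9jlZ",
  "574407194961039360|Sun Mar 08 03:12:01 +0000 2015|Frankie the dog \u2018sniffs out cancer\u2019 http://bbc.in/1COjVHM"
)
writeLines(enc2utf8(sample_lines), tmp, useBytes = TRUE)

# Same call as in the question, with encoding = "UTF-8" added so that
# non-ASCII strings are marked as UTF-8 instead of the native encoding.
bbchealth <- read.table(file = tmp,
                        sep = "|",
                        header = FALSE,
                        quote = "",
                        fill = FALSE,
                        stringsAsFactors = FALSE,
                        numerals = "no.loss",
                        encoding = "UTF-8",
                        col.names = c("TweetId", "Date and Time", "Tweet"))

# Inspect the declared encoding of the tweet text.
Encoding(bbchealth$Tweet)
bbchealth$Tweet
```

If the text still looks wrong after this, fileEncoding="UTF-8" is another read.table() argument that may be worth trying; which of the two helps can depend on the platform and R version.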

Getting unwanted characters for apostrophes when reading data from a "|" (pipe) separated text file in R
