Reading a file into R while preserving line endings

Asked: 2013-08-01 04:19:41

Tags: r

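This is probably a simple question. I have looked through many of the options in scan, but have not yet found one that gives me what I want.

A simple example is: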

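library(httr)
# Fetch the page body as a single character string
example <- content(GET("http://www.r-project.org"), as = "text")
# Write it to disk, then read it back line by line
write(example, "text.txt")
input <- readLines("text.txt")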

> example
[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<html>\n<head>\n<title>The R Project for Statistical Computing</title>\n<link rel=\"icon\" href=\"favicon.ico\" type=\"image/x-icon\">\n<link rel=\"shortcut icon\" href=\"favicon.ico\" type=\"image/x-icon\">\n<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">\n</head>\n\n<FRAMESET cols=\"1*, 4*\" border=0>\n<FRAMESET rows=\"120, 1*\">\n<FRAME src=\"logo.html\" name=\"logo\" frameborder=0>\n<FRAME src=\"navbar.html\" name=\"contents\" frameborder=0>\n</FRAMESET>\n<FRAME src=\"main.shtml\" name=\"banner\" frameborder=0>\n<noframes>\n<h1>The R Project for Statistical Computing</h1>\n\nYour browser seems not to support frames,\nhere is the <A href=\"navbar.html\">contents page</A> of the R Project's\nwebsite.\n</noframes>\n</FRAMESET>\n\n\n\n"

> input
 [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"       
 [2] "<html>"                                                                  
 [3] "<head>"                                                                  
 [4] "<title>The R Project for Statistical Computing</title>"                  
 [5] "<link rel=\"icon\" href=\"favicon.ico\" type=\"image/x-icon\">"          
 [6] "<link rel=\"shortcut icon\" href=\"favicon.ico\" type=\"image/x-icon\">" 
 [7] "<link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\">"              
 [8] "</head>"                                                                 
 [9] ""                                                                        
[10] "<FRAMESET cols=\"1*, 4*\" border=0>"                                     
[11] "<FRAMESET rows=\"120, 1*\">"                                             
[12] "<FRAME src=\"logo.html\" name=\"logo\" frameborder=0>"                   
[13] "<FRAME src=\"navbar.html\" name=\"contents\" frameborder=0>"             
[14] "</FRAMESET>"                                                             
[15] "<FRAME src=\"main.shtml\" name=\"banner\" frameborder=0>"                
[16] "<noframes>"                                                              
[17] "<h1>The R Project for Statistical Computing</h1>"                        
[18] ""                                                                        
[19] "Your browser seems not to support frames,"                               
[20] "here is the <A href=\"navbar.html\">contents page</A> of the R Project's"
[21] "website."                                                                
[22] "</noframes>"                                                             
[23] "</FRAMESET>"                                                             
[24] ""                                                                        
[25] ""                                                                        
[26] ""                                                                        
[27] ""     

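The motivation is that I want to store various files in PostgreSQL, and I need to pass them in the format of example (a single string with embedded newlines) rather than that of input (one element per line). Apologies if I have not explained this well.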

@Hong Ooi gave a good answer using readChar. I had encoding problems, so I had to wrap the call as

iconv(readChar(file, nchars = file.info(file)$size, useBytes = TRUE), from = "latin1", to = "UTF-8")

to stop the database from complaining.
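
For illustration, here is a minimal sketch of that wrapping step on a small temporary file. The file contents and the variable names (path, latin1_bytes, raw_text, utf8_text) are assumptions made up for this example, and it assumes the source file really is Latin-1 encoded; a file in another encoding would need a different from value.

```r
# Hypothetical Latin-1 file: the bytes below spell "caf\u00e9" plus a newline.
path <- tempfile(fileext = ".txt")
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a))
writeBin(latin1_bytes, path)

# Read the whole file as one string, byte for byte.
raw_text <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

# Re-encode from Latin-1 to UTF-8 before handing the string to the database.
utf8_text <- iconv(raw_text, from = "latin1", to = "UTF-8")

# The converted string should now be valid UTF-8, newline included.
validUTF8(utf8_text)
```

The newline survives the conversion because iconv only re-encodes the characters; it does not split or trim the string.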

1 answer:

Answer 0 (score: 4)

If you want to join all of those strings back into a single string:

paste(input, collapse="\n")
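
As a rough sketch of the round trip, the snippet below writes a small multi-line string to a temporary file, reads it with readLines, and joins it again. The sample text and variable names are illustrative assumptions. One thing to keep in mind is that readLines strips every line terminator, including the last one, so a final newline has to be added back if the stored value must match the file exactly.

```r
# Illustrative text; any multi-line string would do.
original <- "line one\nline two\n\nline four\n"
path <- tempfile(fileext = ".txt")
cat(original, file = path)

# One element per line, with the newlines stripped.
input <- readLines(path)

# Join the lines back together with newlines between them.
rejoined <- paste(input, collapse = "\n")

# The file ended with a newline, which readLines dropped; restore it.
rejoined_with_final_newline <- paste0(rejoined, "\n")
identical(rejoined_with_final_newline, original)
```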

Alternatively, if you are reading from a file and want to avoid splitting the input into pieces and then pasting them back together:

f <- readChar(file, nchars = file.info(file)$size, useBytes = TRUE)
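
A self-contained sketch of this approach follows. The temporary file and the names path, original, size and whole_file are assumptions for the example only. The text is written with writeBin so that the bytes on disk are exactly the bytes of the string, which makes the comparison at the end meaningful on any platform.

```r
# Write a small test file byte for byte (no newline translation).
path <- tempfile(fileext = ".txt")
original <- "first line\nsecond line\n"
writeBin(charToRaw(original), path)

# Ask for as many bytes as the file holds, and read them in one call.
size <- file.info(path)$size
whole_file <- readChar(path, nchars = size, useBytes = TRUE)

# The result is a single string with the embedded newlines intact.
identical(whole_file, original)
```

With useBytes = TRUE, nchars counts bytes rather than characters, so passing the file size should cover the whole file even when it contains multibyte characters.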