readLines loses content and shows only a small part of the file

Time: 2012-09-06 03:20:46

Tags: r

Why can't I read the downloaded file with readLines? How should I read it?

url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
txt=download.file(url,destfile="stock")
> file1=readLines("stock",encoding="big5")
Warning messages:
1: In readLines("stock", encoding = "big5") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "big5") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gbk")
Warning messages:
1: In readLines("stock", encoding = "gbk") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gbk") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gb2132")
Warning messages:
1: In readLines("stock", encoding = "gb2132") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gb2132") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gb18030")
Warning messages:
1: In readLines("stock", encoding = "gb18030") :
 invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gb18030") :
incomplete final line found on 'stock'

The result contains only part of the file and much of the content is missing. Why?

1 answer:

Answer 0 (score: 0)

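The file contains 18 lines, and my R reads all 18 of them. I suspect you are overlooking the difference between a text file and an HTML file. To extract the HTML table, you need to use something like this.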
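One possible way to pull a table out of HTML, sketched here under assumptions: it uses the XML package (which must be installed separately), and the small inline page stands in for the downloaded file, so the column names and values are hypothetical. It is not necessarily the approach the answer had in mind.

```r
# Sketch only: assumes the XML package is installed (install.packages("XML")).
library(XML)

# Hypothetical stand-in for the downloaded page; the real input would be the
# "stock" file written by download.file() in the question.
html <- paste0(
  "<html><body><table>",
  "<tr><th>code</th><th>name</th></tr>",
  "<tr><td>00001</td><td>Alpha</td></tr>",
  "<tr><td>00002</td><td>Beta</td></tr>",
  "</table></body></html>"
)

# Parse the markup as HTML instead of reading it line by line.
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table> element.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
stock_table <- tables[[1]]
print(stock_table)
```

For the real file, a call along the lines of `htmlParse("stock", encoding = "big5")` might work, assuming the page really is Big5-encoded. Note also that the `encoding` argument of `readLines` only marks strings as Latin-1 or UTF-8 and does not re-encode the input, so it cannot convert a Big5 or GBK file by itself.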