Why does my separator stop working in read.table?

Asked: 2012-12-05 20:34:00

Tags: r parsing

I have a text file (myFile.txt) that I am trying to convert to a data.frame.

Here is an excerpt:

 <li>

                    <a title="Data table: Grand Falls-Windsor (Census Agglomeration), Newfoundland and Labrador" href="../../details/page.cfm?Lang=E&amp;Geo1=CMA&amp;Code1=010&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=Grand%20Falls-Windsor&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;GeoLevel=PR&amp;GeoCode=010&amp;TABID=1">Grand Falls-Windsor&nbsp;(<acronym title="Census Agglomeration">CA</acronym>)</a> [<a href="../../details/page_Map_Carte_Detail.cfm?Lang=E&amp;G=1&amp;Geo1=CMA&amp;Code1=010&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;Custom=&amp;TABID=1&amp;geocode=010" title="Map: Grand Falls-Windsor (Census Agglomeration), Newfoundland and Labrador">map</a>]

         </li>

<li>

                    <a title="Data table: St. John's (Census Metropolitan Area), Newfoundland and Labrador" href="../../details/page.cfm?Lang=E&amp;Geo1=CMA&amp;Code1=001&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=St.%20John's&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;GeoLevel=PR&amp;GeoCode=001&amp;TABID=1">St. John's&nbsp;(<acronym title="Census Metropolitan Area">CMA</acronym>)</a> [<a href="../../details/page_Map_Carte_Detail.cfm?Lang=E&amp;G=1&amp;Geo1=CMA&amp;Code1=001&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;Custom=&amp;TABID=1&amp;geocode=001" title="Map: St. John's (Census Metropolitan Area), Newfoundland and Labrador">map</a>]

         </li>


df <- read.table("myFile.txt",sep="\n")

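This works up to the end of the Grand Falls-Windsor excerpt, but after that the \n separator no longer seems to take effect. Here is one row of the console output; you can see the \n sequences near the end that were not treated as separators.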

\t\t\t\t\t\t<a title=Data table: St. John's (Census Metropolitan Area), Newfoundland and Labrador href=../../details/page.cfm?Lang=E&amp;Geo1=CMA&amp;Code1=001&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=St.%20John's&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;GeoLevel=PR&amp;GeoCode=001&amp;TABID=1>St. Johns&nbsp;(<acronym title="Census Metropolitan Area">CMA</acronym>)</a> [<a href="../../details/page_Map_Carte_Detail.cfm?Lang=E&amp;G=1&amp;Geo1=CMA&amp;Code1=001&amp;Geo2=PR&amp;Code2=01&amp;Data=Count&amp;SearchText=&amp;SearchType=Begins&amp;SearchPR=01&amp;B1=All&amp;Custom=&amp;TABID=1&amp;geocode=001" title="Map: St. Johns (Census Metropolitan Area), Newfoundland and Labrador>map</a>]\n\t\t\t\t\t\n             </li>\n\t\t\t \n

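Apologies for the presentation; RStudio kept crashing when I tried to copy from the console, which caused a lot of trouble.

Can anyone help? Perhaps read.table() is not the right tool here anyway?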

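2 Answers:

Answer 0: (score: 1)

It looks like you may have an unclosed opening quote (").

Is that just an artifact of what you copied and pasted here, or is it in the data itself?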
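One rough way to check whether the stray quote is in the data itself is to count quote characters per line. The sketch below is only a heuristic under stated assumptions: the sample lines are hypothetical stand-ins for the result of readLines("myFile.txt"), and count_char is a helper name introduced here for illustration. An odd count does not prove a parsing problem, but it points at lines worth inspecting.

```r
# Hypothetical sample lines; in practice use lines <- readLines("myFile.txt")
lines <- c("<a title=\"St. John's\">map</a>", "<li>", "it's John's")

# Illustrative helper: number of literal occurrences of `ch` in each element of `x`
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

single <- count_char(lines, "'")   # apostrophes per line
double <- count_char(lines, "\"")  # double quotes per line

# Indices of lines with an odd number of either quote character
odd <- which(single %% 2 == 1 | double %% 2 == 1)
print(lines[odd])
```

Lines flagged this way are candidates for the point at which read.table starts merging rows.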

Answer 1: (score: 1)

You can use either of the following:

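# Disable quote handling so apostrophes are read as ordinary characters
read.table("test.txt", sep="\n", quote="")
# Or read the file as a character vector, one element per line
readLines("test.txt")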

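Your problem is an unmatched single quote. By default read.table treats both " and ' as quote characters, so an apostrophe such as the one in St. John's can open a quoted string in which \n is no longer treated as a separator. Setting quote="" turns this off.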
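The two options above can be tried on a small temporary file. This is a minimal sketch, assuming a hypothetical three-line sample in place of the real myFile.txt; comment.char = "" and stringsAsFactors = FALSE are additions not present in the answer, included so that a # in the HTML is not read as a comment and the column stays character.

```r
# Hypothetical sample; the first line contains a single, unmatched apostrophe
tmp <- tempfile(fileext = ".txt")
writeLines(c("<a>St. John's</a>", "</li>", "<li>"), tmp)

# Option 1: one row per line, with quote and comment processing disabled
df <- read.table(tmp, sep = "\n", quote = "", comment.char = "",
                 stringsAsFactors = FALSE)

# Option 2: a plain character vector, one element per line
lines <- readLines(tmp)

print(nrow(df))
print(lines)
```

If the goal is to pull the place names or links out of the HTML afterwards, the character vector from readLines is often the more convenient starting point, since no table structure is being parsed.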