How to parse a local HTML table with R

Date: 2014-04-30 08:48:21

Tags: html r xml-parsing

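I have a local HTML file ("117E.html") that contains a table of biological data. I want to parse that table in R and read all of its rows and columns into a data frame. The file exists only locally, so I have no URL and need to parse it from its local path instead.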

I tried to parse it with the following command from R's "XML" package, but the resulting `tables` was empty:

library(XML)
tables <- readHTMLTable(doc = "117E.html", header = TRUE, as.data.frame = TRUE)
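One way to narrow this down is to parse the file yourself with `htmlParse()` and count where the `<tr>` rows ended up. The sketch below is a minimal reproduction under an assumption: the HTML string is a trimmed, hypothetical stand-in for 117E.html that keeps its self-closing `<table border="1" />` tag and its `<th>`-only rows. A working guess, not verified against the original setup, is that libxml2's HTML parser (which the XML package uses) has historically treated the `/>` on `<table ... />` as closing the table immediately, so the rows land outside it and `readHTMLTable()` sees a table with no rows. This behaviour may differ between libxml2 versions, so compare the counts on your own machine.

```r
library(XML)

# Hypothetical, trimmed stand-in for 117E.html: keeps the self-closing
# <table border="1" /> tag and the <th>-only rows from the real file
html <- paste0(
  '<html><body><table border="1" />',
  '<tr><th>Seq_ID</th><th>Match_region</th></tr>',
  '<tr><th>117E:A</th><th>2-282</th></tr>',
  '</table></body></html>'
)
f <- tempfile(fileext = ".html")
writeLines(html, f)

# htmlParse() accepts a local file path, just like readHTMLTable()
doc <- htmlParse(f)

# How many <table> elements were built, how many <tr> exist anywhere,
# and how many <tr> actually ended up inside a <table>?
n_tables      <- length(getNodeSet(doc, "//table"))
n_rows_all    <- length(getNodeSet(doc, "//tr"))
n_rows_in_tbl <- length(getNodeSet(doc, "//table//tr"))
print(c(tables = n_tables, rows = n_rows_all, rows_in_table = n_rows_in_tbl))
```

If `rows_in_table` is 0 while `rows` is not, the table was closed before its rows were read, which would explain the empty result; running the same three `getNodeSet()` calls on `htmlParse("117E.html")` checks the real file.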

Below is the HTML file. To see what it looks like, copy the following and paste it into a separate file named 117E.html:

<!DOCTYPE html
<title>Results</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
</p><table border="1" /><tr><th>Seq_ID</th><th>Match_region</th>  <th>Superfamily E_value</th><th>SCOP superfamily</th><th>Family E_value</th> <th>SCOP family</th><th>Closest structure</th><th>Alignment</th></tr><tr> <th>117E:A|PDBID|CHAIN|SEQUENCE</th><th>2-282</th><th>1.15e-110</th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=50324">Inorganic pyrophosphatase</a></th><th>3.60e-11</th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=50325">Inorganic pyrophosphatase</a> </th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=25395">8prk A:</a></th><th> <pre>YTTRQIGAKNTLEYKVYIEKDGKPVSAFHDIPLYADKENNIFNMVVEIPR<br>WTNAKLEITKEETLNPIIQDTKKGKLRFVRNCFPHHGYIHNYGAFPQTWE<br>DPNVSHPETKAVGDNEPIDVLEIGETIAYTGQVKQVKALGIMALLDEGET<br>DWKVIAIDINDPLAPKLNDIEDVEKYFPGLLRATNEWFRIYKIPDGKPEN<br>QFAFSGEAKNKKYALDIIKETHDSWKQLIAGKSSDSKGIDLTNVTLPDTP<br>TYSKAASDAIPPASLKADAPIDKSIDKWFFI<br></pre> </th></tr><tr><th>117E:B|PDBID|CHAIN|SEQUENCE</th><th>2-282</th><th>1.15e-110</th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=50324">Inorganic pyrophosphatase</a></th><th>3.60e-11</th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=50325">Inorganic pyrophosphatase</a></th><th><a href="http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=25395">8prk A:</a></th><th><pre>YTTRQIGAKNTLEYKVYIEKDGKPVSAFHDIPLYADKENNIFNMVVEIPR<br>WTNAKLEITKEETLNPIIQDTKKGKLRFVRNCFPHHGYIHNYGAFPQTWE<br>DPNVSHPETKAVGDNEPIDVLEIGETIAYTGQVKQVKALGIMALLDEGET<br>DWKVIAIDINDPLAPKLNDIEDVEKYFPGLLRATNEWFRIYKIPDGKPEN<br>QFAFSGEAKNKKYALDIIKETHDSWKQLIAGKSSDSKGIDLTNVTLPDTP<br>TYSKAASDAIPPASLKADAPIDKSIDKWFFI<br></pre></th></tr></table>
</body>
</html>
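If the rows really do end up outside the table, one workaround is to skip `readHTMLTable()` and walk the rows with XPath directly. This swaps in a different technique from the one in the question: `read_rows_as_df()` below is a hypothetical helper, not an XML package function, and it assumes the first `<tr>` holds the column names and every cell is a `<th>` or `<td>`. Because it selects `//tr` anywhere in the document, it should not matter whether the parser kept the rows inside the `<table>`.

```r
library(XML)

# Hypothetical helper (not part of the XML package): collect every <tr> in the
# document, wherever the parser placed it, and build a data frame from its cells.
# Assumes the first row holds the column names.
read_rows_as_df <- function(file) {
  doc <- htmlParse(file)
  rows <- getNodeSet(doc, "//tr")
  # Cells in 117E.html are all <th>; <td> is accepted too, in case other files use it.
  # xmlValue() is recursive, so link text inside <a> is included; trim = TRUE
  # strips surrounding whitespace.
  cells <- lapply(rows, function(row) {
    vapply(getNodeSet(row, "./th | ./td"), xmlValue, character(1), trim = TRUE)
  })
  header <- cells[[1]]
  body <- do.call(rbind, cells[-1])   # one character-matrix row per data row
  df <- as.data.frame(body, stringsAsFactors = FALSE)
  names(df) <- header
  df
}
```

For the file above, `df <- read_rows_as_df("117E.html")` should give two rows and eight columns. Every column comes back as character, so the E-value columns need `as.numeric()`. The `<br>` tags inside `<pre>` contribute no text, so each Alignment cell is the sequence as one unbroken string.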

0 answers