Using htmlParse to clean text in a data frame

Date: 2017-03-04 00:54:19

Tags: html r html-parsing rstudio

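I have a data frame in which one column contains HTML text and another column contains a binary variable. An entry in the first column looks like this: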

1 <p>I am trying to build a simple map app with Shiny and ggplot2. It works as follow: </p>\n\n<ul>\n<li>user selects a country </li>\n<li>the app loads a shape file and gives a list of input fields for adm1 country regions</li>\n<li>user inputs a numeric value for each region (fields are initially filled with random values) </li>\n<li>all values from input fields are collected in a vector, merged to the map data and given as a <code>fill</code> argument to the <code>ggplot()</code> function</li>\n</ul>\n\n<p>The problem is that ggplot doesn't seem to interpret correctly the input values for each regions. Plus, colors on the map don't change when input values are modified through the UI. I believe the  <code>indicator</code> vector fed to the <code>fill</code> argument is not correctly interpreted/passed.</p>\n\n<p>Thank you for your suggestions.</p>\n\n<p><em>Note: in the code below, the shapefiles are sourced on the UCDavis website for reproducibility. I usually store them locally.</... <truncated>

I tried to use a for loop to clean out or remove the HTML tags, but R says that the input is not XML:

for (i in 1:nrow(dataFrame)) {
  row <- dataFrame[i,]
  htmlParse(dataFrame)
}

Any suggestions?
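
One likely cause of the error is that the loop passes the whole `dataFrame` to `htmlParse()` instead of a single HTML string, and it never stores the result. Below is a minimal sketch of one possible approach, assuming the XML package is installed. The column names `body` and `flag`, the sample rows, and the helper `strip_html` are hypothetical and do not come from the original post.

```r
library(XML)

# Hypothetical stand-in for the asker's data; the real column names are unknown.
dataFrame <- data.frame(
  body = c(
    "<p>I am trying to build a <code>fill</code> map.</p>\n\n<ul>\n<li>user selects a country</li>\n</ul>",
    "<p>Thank you for your suggestions.</p>"
  ),
  flag = c(1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one HTML string and return only its text content.
strip_html <- function(html) {
  # Leave missing or empty entries untouched; there is nothing to parse.
  if (is.na(html) || !nzchar(html)) return(html)
  # asText = TRUE tells the parser that `html` is markup, not a file name or URL.
  doc <- htmlParse(html, asText = TRUE)
  # xmlValue() concatenates the text of all descendant nodes, dropping the tags.
  text <- xmlValue(xmlRoot(doc))
  # Collapse runs of whitespace (including newlines) and trim both ends.
  trimws(gsub("\\s+", " ", text))
}

# Apply the helper to each entry; as.character() guards against factor columns.
dataFrame$clean <- vapply(
  as.character(dataFrame$body), strip_html, character(1), USE.NAMES = FALSE
)
```

Using `vapply` over the column avoids the explicit row loop and keeps the cleaned text in a new column next to the binary variable, so the original HTML is still available for comparison.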

0 answers:

No answers