Question

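This is related to a data extraction question I asked earlier. I want to create a bar chart from the data, but unfortunately I cannot convert the extracted character values to numeric in R. If I edit the file in a text editor there is no problem at all, but I would like to do the whole process in R. Here is the code: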

install.packages("rvest")
library(rvest)

url <- "https://en.wikipedia.org/wiki/Corporate_tax"

corporatetax <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/div/table[5]') %>%
  html_table()

str(corporatetax)

The result in str(corporatetax) is a data.frame with 3 variables, all of which are character. My question, which I have not been able to solve, is: how should I proceed to convert the second and third columns to numeric so that I can create a bar chart? I have tried sapply() and dplyr, but did not find the right way.

Thanks!

Answer 1

You could try cleaning the table like this:

library(rvest)
library(stringr)
library(dplyr)

url <- "https://en.wikipedia.org/wiki/Corporate_tax"

corporatetax <- url %>% 
  read_html() %>% 
  # your xpath defines the single table, so you can use html_node() instead of html_nodes()
  html_node(xpath='//*[@id="mw-content-text"]/div/table[5]') %>% 
  html_table() %>% as_tibble() %>% 
  setNames(c("country", "corporate_tax", "combined_tax"))

corporatetax %>% 
  mutate(corporate_tax=as.numeric(str_replace(corporate_tax, "%", ""))/100,
         combined_tax=as.numeric(str_replace(combined_tax, "%", ""))/100
         )
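
For readers who want to try the conversion without hitting the network, here is a minimal base R sketch of the same idea, followed by the bar chart the question asks for. The data frame below is a hypothetical stand-in for the scraped table (the country labels and percentages are made up for illustration), and pct_to_num is a helper name introduced here, not something from rvest or dplyr. It assumes every cell looks like "19%"; cells in the real Wikipedia table that contain footnote markers or ranges would probably come out as NA and need extra cleaning.

```r
# Hypothetical sample rows standing in for the scraped table
# (labels and values are invented for illustration only)
corporatetax <- data.frame(
  country       = c("Country A", "Country B", "Country C"),
  corporate_tax = c("19%", "25%", "12.5%"),
  combined_tax  = c("19%", "30%", "12.5%"),
  stringsAsFactors = FALSE
)

# Helper (assumed name): drop the percent sign, then convert to a fraction
pct_to_num <- function(x) as.numeric(sub("%", "", x, fixed = TRUE)) / 100

# Convert the second and third columns in place
tax_cols <- c("corporate_tax", "combined_tax")
corporatetax[tax_cols] <- lapply(corporatetax[tax_cols], pct_to_num)

# Both tax columns should now be reported as num
str(corporatetax)

# Bar chart of the corporate tax rate per country
barplot(corporatetax$corporate_tax,
        names.arg = corporatetax$country,
        ylab = "Corporate tax rate")
```

Note that the mutate() call in the answer prints the converted table but does not store it; assign the result back (for example corporatetax <- corporatetax %>% mutate(...)) before plotting.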

R: Converting character to numeric in an R data.frame
