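I am using the rvest package in R to scrape a table from a web page. However, the details I get back are not formatted, and I would also like to save them to a CSV file. My code block is below. How can I view and save the result in Excel or CSV format?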
library(rvest)
url <- "https://www.moneycontrol.com/india/stockpricequote/metals-non-ferrous/hindustancopper/HC07"
url %>%
  read_html() %>%
  html_nodes('#mktdet_1') %>%
  html_text()
Answer 0 (score: 0)
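Here is a general solution for you. There are many different ways to parse this information and store it in a data frame or write it to a text file; which one is best really depends on your use case. The first goal, though, is to get each item into its own element of a vector. Your code is a good start. We can build on it by adding one extra CSS selector, which makes things much easier.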
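To show what the extra selector does in isolation, here is a minimal sketch that runs against a small inline HTML fragment instead of the live page. The fragment, its values, and the variable names page and cells are made up for illustration; only the #mktdet_1 id and the .PA7.brdb classes are taken from the scraping code in this answer, and the real page structure may differ.

```r
library(rvest)

# Hypothetical fragment that mimics the structure the selectors assume
page <- read_html('
<html><body>
  <div id="mktdet_1">
    <div class="PA7 brdb">MARKET CAP (Rs Cr) 3,506.65</div>
    <div class="PA7 brdb">P/E 24.10</div>
  </div>
  <div class="PA7 brdb">outside the table</div>
</body></html>')

cells <- page %>%
  html_nodes("#mktdet_1") %>%  # narrow the search to the container first
  html_nodes(".PA7.brdb") %>%  # then pick each cell inside that container
  html_text()                  # one character element per cell

print(cells)
```

Selecting the container first and the cells second means the matching element outside #mktdet_1 is ignored, and each cell arrives as its own vector element rather than as one long string.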
library(rvest)
library(dplyr)
library(xml2)
library(stringr)

# Define the list of URLs to scrape
url_vec <- list(hindustan_copper = "https://www.moneycontrol.com/india/stockpricequote/metals-non-ferrous/hindustancopper/HC07",
                reliance = "https://www.moneycontrol.com/india/stockpricequote/refineries/relianceindustries/RI",
                dhcf = "https://www.moneycontrol.com/india/stockpricequote/finance-housing/dewanhousingfinancecorporation/DHF")

# Define an empty data frame to collect the results
result_df <- data.frame(name = character(), property = character(), value = numeric())

# For each URL
for (name in names(url_vec)) {
  table <- url_vec %>%
    .[[name]] %>%                         # Extract the URL
    read_html() %>%                       # Read the HTML
    html_nodes("#mktdet_1") %>%           # Select the table container by its ID
    html_nodes(".PA7.brdb") %>%           # Select each of the elements in the table
    html_text() %>%                       # Convert to text
    str_replace_all("[\t\r\n]", " ") %>%  # Replace tabs, carriage returns and newlines with spaces
    str_squish()                          # Collapse repeated whitespace and trim the ends

  text <- trimws(gsub("^([a-zA-Z()%/. ]+)[0-9,.%]+$", "\\1", table))  # Extract the property labels
  value <- gsub("^[a-zA-Z()%/. ]+([0-9,.%]+)$", "\\1", table)         # Extract the numbers
  value_num <- as.numeric(gsub("[%, ]", "", value))                    # Convert number strings to numeric
  tbl <- data.frame(name = rep(name, length(text)), property = text, value = value_num)  # Temporary data frame
  result_df <- rbind(result_df, tbl)  # Row-bind onto the accumulated results
  # Deliverables values are NA because they need to be extracted from the label itself; use an appropriate regex for that
}
write.csv(result_df, file = "stock_stats.csv", row.names = FALSE)
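The result in table is just a vector in which each element has its own index. text and value simply separate the column labels from the values. You can then store the result in whatever form suits your use.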
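To make the label/value split concrete without hitting the network, here is a small worked sketch in base R. The three sample strings are invented to resemble what the scraper is assumed to return, and the names cells, pattern, stats, and out_file are illustrative only. It also writes the result to a temporary CSV and reads it back, which is one way to check the saved file.

```r
# Made-up sample strings in the shape the scraped vector is assumed to have
cells <- c("MARKET CAP (Rs Cr) 3,506.65", "P/E 24.10", "DIV YIELD.(%) 0.66%")

# Group 1 captures the label, group 2 captures the trailing number
pattern <- "^([a-zA-Z()%/. ]+)([0-9,.%]+)$"

property <- trimws(sub(pattern, "\\1", cells))                     # label part
value <- as.numeric(gsub("[%,]", "", sub(pattern, "\\2", cells)))  # numeric part

stats <- data.frame(property = property, value = value, stringsAsFactors = FALSE)

# Save to a temporary CSV file and read it back to inspect the result
out_file <- tempfile(fileext = ".csv")
write.csv(stats, out_file, row.names = FALSE)
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
print(round_trip)
```

A CSV written this way opens directly in Excel. Inside RStudio, calling View(stats) shows the data frame in a spreadsheet-style viewer before it is saved.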