I have yearly data for a financial variable x1 in the following (wide) format:
x1 <- data.frame(individual = letters,
"2001" = rnorm(26, 25, 5),
"2002" = rnorm(26, 30, 6),
# ... ...
"2010" = rnorm(26, 35, 5),
check.names = FALSE)  # keep "2001", "2002", ... as column names instead of X2001, X2002, ...
head(x1)
individual 2001 2002 2010
1 a 22.88818 31.11008 32.45270
2 b 29.75727 29.01248 29.43246
3 c 26.50852 36.94197 38.27126
4 d 26.70166 20.58665 27.34747
5 e 29.63059 32.59156 34.56336
6 f 23.71214 17.40315 34.72396
After converting x1 to long format with reshape2::melt and merging the variables, I end up with a panel dataset such as:
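The melt step described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code used in the question: it assumes the reshape2 package is installed, uses a shortened x1 (three individuals, two years, fixed with a hypothetical set.seed(1)), and the name x1_long is made up here. The key points are that melt() turns the year columns into a factor column, which has to be converted back to an integer year before it can be merged with other long-format variables.

```r
library(reshape2)

# Shortened wide data in the same layout as x1 (illustrative values only)
set.seed(1)
x1 <- data.frame(individual = letters[1:3],
                 "2001" = rnorm(3, 25, 5),
                 "2002" = rnorm(3, 30, 6),
                 check.names = FALSE)

# One row per individual-year: the year columns are stacked into "year"
x1_long <- melt(x1, id.vars = "individual",
                variable.name = "year", value.name = "x1")

# melt() returns year as a factor of the column names; convert it to integer
x1_long$year <- as.integer(as.character(x1_long$year))
x1_long
```

Naming the value column after the variable (value.name = "x1") means each melted variable arrives with its own column name, so several of them can be merged on individual and year without renaming.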
mydata <- data.frame(individual = rep(letters[1:5], each = 5),
year = rep(2001:2005, 5),
x1 = rnorm(25, 10, 2),
x2 = rnorm(25, 30, 5),
x3 = rnorm(25, 50, 10))
head(mydata)
individual year x1 x2 x3
1 a 2001 5.980164 22.13975 45.08367
2 a 2002 11.644311 34.67157 54.06608
3 a 2003 11.805382 34.76187 63.64758
4 a 2004 10.854982 28.44147 39.11835
5 a 2005 10.586608 25.91022 39.29007
6 b 2001 8.844076 18.37490 64.73601
I now have data for x4 in the same wide format as the original x1, and want to add x4 to the mydata dataset. How can I do this in R?
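One way to approach this, sketched under a few assumptions: reshape x4 to long format with reshape2::melt (as was done for x1) and then join it to mydata on the panel keys individual and year with base R merge(). The data below are shortened, hypothetical stand-ins for mydata and x4 (two individuals, two years, set.seed(1)), and x4_long is an illustrative name. all.x = TRUE is a design choice that keeps every row of mydata (a left join), so any individual-year missing from x4 gets NA rather than being dropped.

```r
library(reshape2)

set.seed(1)
# Shortened panel in the layout of mydata: one row per individual-year
mydata <- data.frame(individual = rep(letters[1:2], each = 2),
                     year = rep(2001:2002, 2),
                     x1 = rnorm(4, 10, 2))

# x4 in the wide layout of the original x1 (hypothetical values)
x4 <- data.frame(individual = letters[1:2],
                 "2001" = rnorm(2, 5, 1),
                 "2002" = rnorm(2, 6, 1),
                 check.names = FALSE)

# Reshape x4 to long form with columns individual, year, x4
x4_long <- melt(x4, id.vars = "individual",
                variable.name = "year", value.name = "x4")
x4_long$year <- as.integer(as.character(x4_long$year))

# Left join on the panel keys; individual-years absent from x4 become NA
mydata <- merge(mydata, x4_long, by = c("individual", "year"), all.x = TRUE)
mydata
```

Note that merge() sorts the result by the key columns, so reorder afterwards if the original row order matters.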
Answer 0 (score: 0)
Here is a solution using rvest, purrr and dplyr:
library(xml2)
library(rvest)
library(purrr)
library(stringr)
library(dplyr)
URL <- "http://archive.thedailystar.net/2003/06/01/"
page <- read_html(URL)
# This is a CSS selector which pulls out all of the relevant `<td>` tags on the page
links <- html_nodes(page, "table table table table table:not([width]) tr td:last-of-type")
# Now retrieve all of the text within each td
link_all_text <- map_chr(links, html_text)
# Keep only the cells whose text contains "accident"
links_accident <- links[str_detect(link_all_text, "accident")]
# Create a data frame with the link target, the link text and the summary line below it
links_accident_detail <- map_df(links_accident, function(link) {
  # tibble() replaces the deprecated data_frame()
  tibble(href = link %>% html_node("a") %>% html_attr("href"),
         link_text = link %>% html_node("a") %>% html_text(),
         next_line = link %>% html_node(".gistinhead") %>% html_text()
  )
})
links_accident_detail %>% as.data.frame()
# href link_text
#1 d30601100757.htm 2 killed in city road mishaps
# next_line
#1 Two unidentified men died in separate road accidents at Uttara and Tejgaon yesterday.