Question

I have yearly data on a financial variable x, as follows:

x1 <- data.frame(individual = letters,
                  "2001" = rnorm(26, 25, 5),
                  "2002" = rnorm(26, 30, 6),
                  # ... ...
                  "2010" = rnorm(26, 35, 5),
                  # keep the year labels as column names (default would give X2001, ...)
                  check.names = FALSE)
head(x1)
  individual    2001    2002    2010
1          a 22.88818 31.11008 32.45270
2          b 29.75727 29.01248 29.43246
3          c 26.50852 36.94197 38.27126
4          d 26.70166 20.58665 27.34747
5          e 29.63059 32.59156 34.56336
6          f 23.71214 17.40315 34.72396

After converting x to long format with reshape2::melt and merging the variables, I end up with a panel data set like this:

mydata <- data.frame(individual = rep(letters[1:5], each = 5),
                      year = rep(2001:2005, 5),
                      x1 = rnorm(25, 10, 2),
                      x2 = rnorm(25, 30, 5),
                      x3 = rnorm(25, 50, 10))
head(mydata)
  individual year        x1       x2       x3
1          a 2001  5.980164 22.13975 45.08367
2          a 2002 11.644311 34.67157 54.06608
3          a 2003 11.805382 34.76187 63.64758
4          a 2004 10.854982 28.44147 39.11835
5          a 2005 10.586608 25.91022 39.29007
6          b 2001  8.844076 18.37490 64.73601

I am now given data for x4 in the same format as the initial x1, and I would like to add x4 to the mydata data set. How can I do this in R?

Answer 1

Here is a solution using rvest, purrr and dplyr:

library(xml2)
library(rvest)
library(purrr)
library(stringr)
library(dplyr)
URL <- "http://archive.thedailystar.net/2003/06/01/"
page <- read_html(URL)
# This is a CSS selector which pulls out all of the relevant `<td>` tags on the page
links <- html_nodes(page, "table table table table table:not([width]) tr td:last-of-type")
# Now retrieve all of the text within each td
link_all_text <- map_chr(links, html_text)
# Pull out those matching accident
links_accident <- links[str_detect(link_all_text, "accident")]
# Create a data frame with the link target, the link text and the line that follows it
# (tibble() replaces the deprecated data_frame(); dplyr re-exports it)
links_accident_detail <- map_df(links_accident, function(link) {
  tibble(href = link %>% html_node("a") %>% html_attr("href"),
         link_text = link %>% html_node("a") %>% html_text(),
         next_line = link %>% html_node(".gistinhead") %>% html_text()
         )
})
links_accident_detail %>% as.data.frame()
#              href                     link_text
#1 d30601100757.htm 2 killed in city road mishaps
#                                                                              next_line
#1 Two unidentified men died in separate road accidents at Uttara and Tejgaon yesterday.
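
For the question as posed (adding x4 to mydata), a minimal sketch of the melt-then-merge route might look like the following. It assumes x4 arrives in the same wide layout as x1 (one row per individual, one column per year); the names x4_wide, x4_long and mydata_x4 are made up for illustration, the values are random, and mydata is cut down to a single variable to keep the example short.

```r
library(reshape2)

set.seed(42)  # reproducible illustrative values

# Existing panel, reduced from the question's mydata to one variable
mydata <- data.frame(individual = rep(letters[1:5], each = 5),
                     year = rep(2001:2005, 5),
                     x1 = rnorm(25, 10, 2))

# Hypothetical new variable in the wide layout (one column per year)
x4_wide <- data.frame(individual = letters[1:5],
                      "2001" = rnorm(5, 40, 5),
                      "2002" = rnorm(5, 42, 5),
                      "2003" = rnorm(5, 44, 5),
                      "2004" = rnorm(5, 46, 5),
                      "2005" = rnorm(5, 48, 5),
                      check.names = FALSE)

# Wide -> long: one row per individual-year pair
x4_long <- melt(x4_wide, id.vars = "individual",
                variable.name = "year", value.name = "x4")

# melt() returns the former column names as a factor; convert to integer years
x4_long$year <- as.integer(as.character(x4_long$year))

# Left join on the panel keys so every existing row of mydata is kept
mydata_x4 <- merge(mydata, x4_long, by = c("individual", "year"), all.x = TRUE)
head(mydata_x4)
```

Using all.x = TRUE keeps individual-year pairs that have no x4 value (they get NA) instead of silently dropping them, which is usually what one wants when extending an existing panel.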

Add annualized variable data to an existing panel in R