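I am working on a project that uses a large number of tables stored in HTML. While scraping them, I have had to deal with the following problem.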
Some of the tables that I am scraping look like this
I am using this code:
library(rvest)
read_html(link) %>%
  html_nodes(node) %>%
  html_table(fill = TRUE, header = TRUE, dec = ",")
I had to add the fill argument because of the rows with merged cells ("chicken" and "chicken without bones"), but the import then gives me a data frame like this:
df <- data.frame(year = c("chicken",2000,2001,2002,"chicken without bones",2003,2004,2005, "chicken without bones and feet", 2006, 2007, 2008),
                 weight = c("chicken",5,6,4,"chicken without bones",2,1,3,"chicken without bones and feet", 1, 1.5, 2)
)
I am trying to find a way to make my table look like this:
df2 <- data.frame(year = c(2000,2001,2002, 2003, 2004, 2005,2006,2007, 2008), number = c(5,6,4,2,1,3,1,1.5, 2),
                  new_variable = c("chicken","chicken","chicken","chicken without bones","chicken without bones",
                                   "chicken without bones","chicken without bones and feet","chicken without bones and feet","chicken without bones and feet" )
)
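I am struggling with R and still do not know how to do this for the 1,028,974 tables I am scraping. Note: there is no pattern to where this happens in the tables, so I need code that identifies the filled nodes, takes their values as character, and turns them into the values of a new column until the next fill occurs.
Thank you for your attention!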
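Answer 0 (score: 0)
You can try this:
library(dplyr)
library(zoo)
df %>%
  # Convert factor columns to character so ifelse() returns the labels, not integer level codes
  mutate_if(is.factor, as.character) %>%
  # Rows whose year contains a non-digit are label rows: keep the label there, NA elsewhere
  mutate(new_variable = ifelse(grepl("\\D+", year), year, NA),
         # Carry each label forward until the next one; na.rm = FALSE keeps the length unchanged
         new_variable = na.locf(new_variable, na.rm = FALSE)) %>%
  # Drop the label rows themselves
  filter(!grepl("\\D+", year))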
Output:
  year weight                   new_variable
1 2000      5                        chicken
2 2001      6                        chicken
3 2002      4                        chicken
4 2003      2          chicken without bones
5 2004      1          chicken without bones
6 2005      3          chicken without bones
7 2006      1 chicken without bones and feet
8 2007    1.5 chicken without bones and feet
9 2008      2 chicken without bones and feet
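The question asks for a way to spot the filled rows without relying on a pattern in their position. One possible approach, sketched here in base R only, is to treat a row as a label row when every column holds the same value, which is what a merged cell filled across the columns produces. The function name label_from_merged_rows and its new_col argument are hypothetical and do not come from any package. The sketch assumes each table has at least two columns, and it would misclassify a data row whose cells happen to be identical (for example a year and weight that are both "5").

```r
# Hypothetical helper, not part of any package: moves merged-cell label rows
# into a new column and removes them from the table.
label_from_merged_rows <- function(df, new_col = "new_variable") {
  # Work on character copies so factor columns compare as text
  chr <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
  # Assumption: a merged-cell row repeats one value in every column
  is_label <- apply(chr, 1, function(r) length(unique(r)) == 1)
  # Running count of label rows seen so far (0 = no label yet)
  group <- cumsum(is_label)
  labels <- chr[[1]][is_label]
  # Each row gets the most recent label; rows before the first label get NA
  chr[[new_col]] <- ifelse(group == 0, NA_character_, labels[pmax(group, 1)])
  # Drop the label rows themselves and renumber the remaining rows
  out <- chr[!is_label, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Example with the data frame from the question
df <- data.frame(
  year = c("chicken", 2000, 2001, 2002, "chicken without bones", 2003, 2004, 2005,
           "chicken without bones and feet", 2006, 2007, 2008),
  weight = c("chicken", 5, 6, 4, "chicken without bones", 2, 1, 3,
             "chicken without bones and feet", 1, 1.5, 2)
)
print(label_from_merged_rows(df))
```

If the scraped tables are held in a list (a hypothetical name such as tables), the same helper could be applied to all of them with lapply(tables, label_from_merged_rows).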
Sample data:
df <- structure(list(year = structure(c(10L, 1L, 2L, 3L, 11L, 4L, 5L,
6L, 12L, 7L, 8L, 9L), .Label = c("2000", "2001", "2002", "2003",
"2004", "2005", "2006", "2007", "2008", "chicken", "chicken without bones",
"chicken without bones and feet"), class = "factor"), weight = structure(c(8L,
6L, 7L, 5L, 9L, 3L, 1L, 4L, 10L, 1L, 2L, 3L), .Label = c("1",
"1.5", "2", "3", "4", "5", "6", "chicken", "chicken without bones",
"chicken without bones and feet"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))
#                             year                         weight
#1                         chicken                        chicken
#2                            2000                              5
#3                            2001                              6
#4                            2002                              4
#5           chicken without bones          chicken without bones
#6                            2003                              2
#7                            2004                              1
#8                            2005                              3
#9  chicken without bones and feet chicken without bones and feet
#10                           2006                              1
#11                           2007                            1.5
#12                           2008                              2
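Because the label rows force year and weight to be read as text (factors in the sample data), those columns are still character after the label rows are removed. A minimal sketch of turning them back into numbers with base R's type.convert follows; the small cleaned data frame is an illustrative stand-in for the result of the pipeline in the answer, not output copied from it.

```r
# Stand-in for a cleaned table: label rows already removed, values still text
cleaned <- data.frame(
  year = c("2000", "2001", "2003"),
  weight = c("5", "6", "1.5"),
  new_variable = c("chicken", "chicken", "chicken without bones"),
  stringsAsFactors = FALSE
)

# Re-detect each column's type; as.is = TRUE keeps text columns as character
cleaned[] <- lapply(cleaned, type.convert, as.is = TRUE)

str(cleaned)
```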