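I extracted several tables from a PDF that contain multi-line strings. I used the extract_tables() function from the tabulizer package; the only problem is that each line of a multi-line string is imported as a separate row.
For example: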
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
data.frame(action, description)
   action description
1       1           a
2      NA           b
3      NA           c
4       2           a
5      NA           b
6       3           a
7      NA           b
8      NA           c
9      NA           d
10      4           a
11     NA           b
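I would like to concatenate the strings so that each multi-line string becomes a single element, for example:
  action description
1      1       a b c
2      2         a b
3      3     a b c d
4      4         a b
Hope that makes sense, thanks for your help!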
Answer 0 (score: 3)
The tidyverse way is to fill the action column with the previous non-NA value, then group_by action and paste the description values together.
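The answer above describes the steps without code. A minimal sketch of that pipeline, assuming the dplyr and tidyr packages are installed (tidyr's fill() carries the last non-NA value downward by default):

```r
# Sketch of the fill / group_by / paste approach (assumes dplyr and tidyr are installed)
library(dplyr)
library(tidyr)

action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df <- data.frame(action, description, stringsAsFactors = FALSE)

result <- df %>%
  # fill() copies the last non-NA action down into the NA rows below it
  fill(action) %>%
  # every fragment of one multi-line cell now shares the same action
  group_by(action) %>%
  # join the fragments of each group into a single space-separated string
  summarise(description = paste(description, collapse = " "))
```

Because the grouping key is the filled action value itself, the original action numbers are kept even if they are not consecutive.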
Answer 1 (score: 1)
A base R option:
dat <- data.frame(action, description)
aggregate(
  description ~ action,
  # replace action with a group id that increases at every non-NA action
  transform(dat, action = cumsum(!is.na(dat$action))),
  FUN = paste,
  collapse = " "   # passed on to paste() via ...
)
#   action description
# 1      1       a b c
# 2      2         a b
# 3      3     a b c d
# 4      4         a b
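For aggregate to work, we replace action with the group ids returned by cumsum(!is.na(dat$action)): each non-NA action starts a new group, and the NA rows below it inherit that group's id.
cumsum(!is.na(dat$action))
#[1] 1 1 1 2 2 3 3 3 3 4 4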
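The cumsum() ids coincide with the action values here only because the actions happen to be numbered 1, 2, 3, 4. A base-R sketch (no extra packages) that maps the ids back to the original action labels, assuming the first action is non-NA:

```r
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")

# Group id: increases by one at every non-NA action
grp <- cumsum(!is.na(action))
# Row-level last-observation-carried-forward of the original action values;
# assumes action[1] is non-NA, otherwise grp contains 0 and indexing drops rows
filled <- action[!is.na(action)][grp]
# Collapse the description fragments within each group (groups come out in id order)
collapsed <- tapply(description, grp, paste, collapse = " ")
result <- data.frame(
  action = filled[!duplicated(grp)],   # one original action label per group
  description = as.vector(collapsed),  # drop the array dim and names from tapply
  stringsAsFactors = FALSE
)
```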
Answer 2 (score: 1)
Here is a data.table option:
df1 <- data.frame(action, description)
library(data.table)
setDT(df1)[, .(description = paste(description, collapse = ' ')),
           .(action = cumsum(!is.na(action)))]
#    action description
# 1:      1       a b c
# 2:      2         a b
# 3:      3     a b c d
# 4:      4         a b
Or use na.locf from zoo, which keeps the original action values as the grouping key:
library(zoo)
setDT(df1)[, .(description = paste(description, collapse = ' ')),
           .(action = na.locf(action))]
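If your data.table release provides nafill() (to my knowledge it was added in version 1.12.4), the zoo dependency can be dropped by filling inside the by clause. A minimal sketch, assuming such a version is installed:

```r
library(data.table)

action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
dt <- data.table(action, description)

# nafill(type = "locf") carries the last non-NA action forward;
# the filled values then serve as the grouping key in `by`
res <- dt[, .(description = paste(description, collapse = " ")),
          by = .(action = nafill(action, type = "locf"))]
```

Groups in by keep their order of first appearance, so the rows come out in the same order as in the PDF table.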
Answer 3 (score: 0)
You can use the zoo and dplyr packages like this:
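library(zoo)
library(dplyr)
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df = data.frame(action, description, stringsAsFactors = FALSE)
# Carry the last non-NA action forward so every fragment gets its action number
df$action = na.locf(df$action)
# Collapse the fragments belonging to each action into one string
df =
  df %>%
  group_by(action) %>%
  summarise(description = paste(description, collapse = ' '))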