Question

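I want to convert a vector into a data frame. The vector consists of unique IDs, each followed by its other fields. The set of fields is fixed and exhaustive: there are about 30 different fields, each marked with a leading backslash.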

\ID a 
\description text yes 
\definition text yes 
\other.info text yes 
\ID b 
\definition text yes 
\other.info text yes 
\ID d 
\description text yes 
\other.info text yes 
\translation text yes

I need to turn it into:

ID  description  definition  other.info  translation
 a   text yes     text yes    text yes
 b                text yes    text yes
 d   text yes                 text yes    text yes

Thanks for your help.

Answer 1

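This is a bit hacky, but it gets the job done. Note that it keeps only the last word of each line as the value, so "text yes" ends up as "yes":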

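library(stringr)    # str_extract() with regular expressions
library(magrittr)   # the %>% pipe
library(data.table) # rbindlist(); dplyr::bind_rows() is similar

split(vect, cumsum(grepl("^\\\\ID ", vect))) %>%                      # one group per record, starting at each \ID line
  lapply(function(x) setNames(data.frame(t(str_extract(x, "\\w+$"))),  # last word of each line is the value
                              str_extract(x, "^.+\\s"))) %>%          # everything before it becomes the column name
  rbindlist(fill = TRUE) %>%                                           # stack the records, NA for missing fields
  setNames(trimws(gsub("text|\\\\", "", names(.))))                    # strip "text", backslashes and spaces from names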


   ID  description   definition   other.info   translation  
1:   a           yes          yes          yes          <NA>
2:   b          <NA>          yes          yes          <NA>
3:   d           yes         <NA>          yes           yes
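The grouping step is what makes this work: grepl() flags every line that starts a new record, and cumsum() turns those flags into a running record number, so split() returns one chunk per ID. A small sketch of just that step, on a shortened, made-up input in the same format (the vector contents here are hypothetical):

```r
# Hypothetical shortened input in the same "\name value" format as the question
vect <- c("\\ID a", "\\description text yes",
          "\\ID b", "\\definition text yes", "\\other.info text yes",
          "\\ID d")

# TRUE on every line that starts a new record (literal backslash followed by "ID ")
is_id <- grepl("^\\\\ID ", vect)

# Running count of ID lines: each line gets the number of the record it belongs to
record <- cumsum(is_id)   # record numbers: 1 1 2 2 2 3

# split() then yields one character vector per record, named "1", "2", "3"
groups <- split(vect, record)
print(lengths(groups))
```

Anchoring the pattern at the start of the line ("^\\\\ID ") avoids starting a spurious new record if some field's text happens to contain "ID".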

Data:

vect <- c("\\ID a", "\\description text yes", "\\definition text yes", "\\other.info text yes", 
"\\ID b", "\\definition text yes", "\\other.info text yes", "\\ID d", 
"\\description text yes", "\\other.info text yes", "\\translation text yes"
)
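If the full field content is wanted ("text yes" rather than just "yes", as in the desired output in the question), one option is to split each line at its first space instead of taking its last word. Below is a minimal base-R sketch, assuming every line has the form "\name value", field names contain no spaces, and each record begins with an \ID line. parse_records is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: turn "\name value" lines into one row per \ID record.
# Assumes field names contain no spaces and every record starts with "\ID".
parse_records <- function(lines) {
  # Field name: text between the leading backslash and the first space
  key <- sub("^\\\\(\\S+)\\s.*$", "\\1", lines)
  # Value: everything after the first space, so "text yes" stays intact
  value <- sub("^\\\\\\S+\\s+", "", lines)
  # Record number: increments at every \ID line
  record <- cumsum(key == "ID")

  # Columns in order of first appearance, with ID first
  cols <- unique(c("ID", key))

  # Start from an all-NA character matrix and fill in the cells that exist
  out <- matrix(NA_character_, nrow = max(record), ncol = length(cols),
                dimnames = list(NULL, cols))
  out[cbind(record, match(key, cols))] <- value

  as.data.frame(out, stringsAsFactors = FALSE)
}

vect <- c("\\ID a", "\\description text yes", "\\definition text yes", "\\other.info text yes",
          "\\ID b", "\\definition text yes", "\\other.info text yes", "\\ID d",
          "\\description text yes", "\\other.info text yes", "\\translation text yes")

df <- parse_records(vect)
print(df)
```

Filling a preallocated matrix through two-column (row, column) indexing avoids building one small data frame per record. If a field were repeated within one record, the later value would silently overwrite the earlier one.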
