R: Creating a data frame from an irregular vector

Time: 2017-10-04 15:32:26

Tags: r

I want to take an irregular vector of information scraped from a web page and convert it into a data frame.

For example, given a vector like this:

vec <- c("Bank of America", "6%", "JP Morgan", "5%", "Bank of China", "UBS", "7%")

I would like to create a data frame that looks like this:

df <- tibble(bank.name = c("Bank of America", "JP Morgan", "Bank of China", "UBS"), interest.rate = c("6%", "5%", NA, "7%"))

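The bank name column is easy to create with a regular expression. However, I am struggling to create the vector of interest rates so that the NA ends up in the right position.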

Edit after reading @guscht's answer:

@guscht has a nice vectorized answer to this question! I was afraid we would have to use a for loop...

I also translated @guscht's solution into tidyverse syntax, which looks like this:

library(tidyverse) # provides tibble(), lead(), filter(), mutate(), str_detect() and %>%

test <- c("Bank of America", "6%", "JP Morgan", "5%", "Bank of China", "UBS", "7%")
df <- tibble(bank = test, rate = lead(test, 1)) # pair each element with the one that follows it

df %>%
    filter(!str_detect(bank, "%")) %>% # keep only the rows whose first column is a bank
    mutate(rate = if_else(str_detect(rate, "%"), rate, NA_character_)) # turn non-rate values into NA

1 Answer:

Answer 0: (score: 1)

Try this?

library(data.table) # using data.table because the syntax is nicer
test <- c("Bank of America", "6%", "JP Morgan", "5%", "Bank of China", "UBS", "7%")
dt <- data.table(bank.name = test, interest.rate = shift(test, n = 1, type = "lead"))
dt <- dt[! grepl("%", bank.name)]
dt[! grepl("%", interest.rate), interest.rate := NA]
dt
#           bank.name interest.rate
# 1:  Bank of America            6%
# 2:        JP Morgan            5%
# 3:    Bank of China            NA
# 4:              UBS            7%
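
The same "pair each element with its successor, then filter" idea can also be sketched in base R, without data.table or the tidyverse. This is only an illustration, and it rests on two assumptions that the example data happen to satisfy: every rate contains "%", and no bank name does. The helper names `is_rate`, `nxt`, and `nxt_rate` are made up for this sketch.

```r
vec <- c("Bank of America", "6%", "JP Morgan", "5%", "Bank of China", "UBS", "7%")

# Flag the entries that look like rates (assumption: only rates contain "%").
is_rate <- grepl("%", vec, fixed = TRUE)

# Pair every element with its successor; the last element has none, so pad it.
nxt      <- c(vec[-1], NA)
nxt_rate <- c(is_rate[-1], FALSE)

# Keep the bank rows, and keep the successor only when it is a rate.
df <- data.frame(
  bank.name        = vec[!is_rate],
  interest.rate    = ifelse(nxt_rate, nxt, NA_character_)[!is_rate],
  stringsAsFactors = FALSE
)
df
```

A bank that is followed directly by another bank (here "Bank of China") gets NA, because its successor is not flagged as a rate.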