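Extract ratings and corresponding dates from text in R

Time: 2019-05-28 03:59:35

Tags: r regex

I want to extract each bank's ratings and their dates from a data frame. In addition, I want each rating record on its own row, with the rating and the date split into two columns.

Here is a sample of my data: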

mydf <- data.frame("bank_name"=c("Bank A","Bank B"), "records"=c("Rating: B-\nRating Range: Jun-08-2017 to Present\n\nRating: B\nRating Range: Jan-23-2013 to Jun-08-2017","Rating: BBB-\nRating Range: Oct-02-2018 to Present\n\nRating: B\nRating Range: Apr-06-2018 to Oct-02-2018\n\nRating: A\nRating Range: Jun-08-2007 to Jan-31-2008\n\nRating: CCC\nRating Range: Jan-23-2006 to Aug-08-2007"))

This is my expected output:

mydf2 <- data.frame("bank_name"=c("Bank A","Bank A","Bank B","Bank B","Bank B","Bank B"), "ratings"=c("B-","B","BBB-","B","A","CCC"),"date"=c("Jun-08-2017","Jan-23-2013","Oct-02-2018","Apr-06-2018","Jun-08-2007","Jan-23-2006"))

> mydf2
  bank_name ratings        date
1    Bank A      B- Jun-08-2017
2    Bank A       B Jan-23-2013
3    Bank B    BBB- Oct-02-2018
4    Bank B       B Apr-06-2018
5    Bank B       A Jun-08-2007
6    Bank B     CCC Jan-23-2006

Thanks in advance!

1 Answer:

Answer 0: (score: 2)

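One option is to use str_extract_all to extract the characters that follow "Rating: " and "Rating Range: " in the records column into list columns, and then unnest the list elements so that each rating gets its own row.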
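
A minimal sketch of this approach, assuming the dplyr, stringr, and tidyr packages are installed (tidyr 1.0 or later for the `cols` argument). The lookbehind patterns and the `result` name are illustrative assumptions, and the date kept is the start of each rating range, as in the expected output:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Sample data from the question; strings kept as character, not factor
mydf <- data.frame(
  bank_name = c("Bank A", "Bank B"),
  records = c(
    "Rating: B-\nRating Range: Jun-08-2017 to Present\n\nRating: B\nRating Range: Jan-23-2013 to Jun-08-2017",
    "Rating: BBB-\nRating Range: Oct-02-2018 to Present\n\nRating: B\nRating Range: Apr-06-2018 to Oct-02-2018\n\nRating: A\nRating Range: Jun-08-2007 to Jan-31-2008\n\nRating: CCC\nRating Range: Jan-23-2006 to Aug-08-2007"
  ),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  transmute(
    bank_name,
    # Lookbehind: keep only the non-space token right after "Rating: "
    # ("Rating Range: " does not match because of the missing colon)
    ratings = str_extract_all(records, "(?<=Rating: )\\S+"),
    # Lookbehind: keep only the first date after "Rating Range: "
    date = str_extract_all(records, "(?<=Rating Range: )\\S+")
  ) %>%
  # Expand both list columns in parallel, one row per rating record
  unnest(cols = c(ratings, date))

print(result)
```

The date column is still character here; if real dates are needed, something like `as.Date(date, format = "%b-%d-%Y")` should work in an English locale.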
