I'm fairly new to parsing data.
I have a dataset containing sample text like the following, and every entry strictly follows this format:
"Blessed to receive an offer from Texas State University."
"Blessed to receive an offer from Columbia University."
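What is a good way to extract the school name that comes after "from"?
I know about stringr and regex patterns, but I can't seem to find a good way to extract the school names, since they vary from row to row.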
Answer
Use str_extract (assuming every university name is immediately followed by a period):
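library(stringr)
data <- c("Blessed to receive an offer from Texas State University.",
          "Blessed to receive an offer from Columbia University.")
# (?<=from\\s) starts the match right after "from "; (?=\\.) stops it just before the period
UniNames <- str_extract(data, "(?<=from\\s).*(?=\\.)")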
Result:
UniNames
[1] "Texas State University" "Columbia University"
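If you prefer a capture group over lookarounds, a minimal sketch using stringr's `str_match` is below. It assumes, as above, that each sentence ends with the period right after the school name. The `$` anchor makes only the sentence-final period end the match, so a name that itself contains a period still comes out whole. The third input string is a hypothetical example added for illustration; it is not part of the original data.

```r
library(stringr)

data <- c("Blessed to receive an offer from Texas State University.",
          "Blessed to receive an offer from Columbia University.",
          # hypothetical extra input: a school name that contains a period
          "Blessed to receive an offer from St. Mary's College.")

# Capture everything after "from " up to the period at the very end of the string.
m <- str_match(data, "from\\s+(.*)\\.$")

# Column 1 holds the full match, column 2 holds the first capture group.
# Rows that do not match the pattern come back as NA.
UniNames <- m[, 2]
print(UniNames)
```

A row that doesn't follow the format gives `NA` rather than an error, which makes malformed entries easy to find with `is.na()`.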