R - Multiple search and replace based on partial matches within a data frame column

Time: 2016-11-04 11:12:08

Tags: r

I have a list of publishers like this:

+--------------+
|  Site Name   |
+--------------+
| Radium One   |
| Euronews     |
| EUROSPORT    |
| WIRED        |
| RadiumOne    |
| Eurosport FR |
| Wired US     |
| Eurosport    |
| EuroNews     |
| Wired        |
+--------------+

I want to create the following result:

+--------------+----------------+
|  Site Name   | Publisher Name |
+--------------+----------------+
| Radium One   | RadiumOne      |
| Euronews     | Euronews       |
| EUROSPORT    | Eurosport      |
| WIRED        | Wired          |
| RadiumOne    | RadiumOne      |
| Eurosport FR | Eurosport      |
| Wired US     | Wired          |
| Eurosport    | Eurosport      |
| EuroNews     | Euronews       |
| Wired        | Wired          |
+--------------+----------------+

I would like to know how to replicate the code I used in Power Query:

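Search the first 4 characters:

if Text.Start([Site Name],4)="WIRE" then "Wired" else

Search the last 3 characters:

if Text.End([Site Name],3)="One" then "RadiumOne" else

If no match is found, add "Rest".

It does not have to be case-sensitive.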
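As a rough base-R equivalent of the two Power Query rules above, a minimal sketch could upper-case the names so the comparison is case-insensitive and then chain `ifelse()` calls, with `startsWith()`/`endsWith()` (available in base R 3.3.0 and later) standing in for `Text.Start`/`Text.End`. The function name `publisher_from_rules` is hypothetical, and only the two quoted rules are encoded, so names such as Euronews or Eurosport fall through to "Rest".

```r
# Hypothetical helper: mirrors the two quoted Power Query rules,
# case-insensitively, and falls back to "Rest" when neither matches.
publisher_from_rules <- function(site) {
  up <- toupper(site)                      # make the tests case-insensitive
  ifelse(startsWith(up, "WIRE"), "Wired",  # like Text.Start([Site Name],4)="WIRE"
    ifelse(endsWith(up, "ONE"), "RadiumOne",  # like Text.End([Site Name],3)="One"
      "Rest"))                             # no rule matched
}

site_name <- c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
               "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")
publisher_from_rules(site_name)
```

Each further Power Query `if ... then ... else` branch would become one more nested `ifelse()` before the final "Rest".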

1 answer:

Answer 0 (score: 0)

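Using `properCase` from the ifultools package together with `gsub`, we replace everything after the first word with "" (i.e. remove it) and handle the Radium exception separately. If you have many exceptions like the Radium case, please update your post so we can find a better-suited solution than this hack :)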

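library("ifultools")  # provides properCase()

siteName = c("Radium One","Euronews","EUROSPORT","WIRED","RadiumOne","Eurosport FR","Wired US","Eurosport","EuroNews","Wired")

# properCase() capitalizes the first letter and lower-cases the rest ("EuroNews" -> "Euronews");
# the inner gsub() drops everything from the first whitespace onward ("Eurosport fr" -> "Eurosport");
# the outer gsub() maps both "Radium" and "Radiumone" to the desired "RadiumOne".
publisherName = gsub("^Radium(one)?$", "RadiumOne", gsub("\\s+.*", "", properCase(siteName)))

 # [1] "RadiumOne" "Euronews"  "Eurosport" "Wired"     "RadiumOne" "Eurosport" "Wired"    
 # [8] "Eurosport" "Euronews"  "Wired"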
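If there are more exceptions than Radium, one alternative to nested `gsub()` calls is a small lookup table of case-insensitive regular expressions applied to the data frame column in order, with the first match winning and "Rest" as the fallback. This is a different technique from the answer above, not the answerer's method. The table contents, the column names `site_name`/`publisher_name` and the helper `match_publisher()` are hypothetical and would need to be adapted to the real publisher list.

```r
# Hypothetical rule table: each row pairs a regular expression with the
# canonical publisher name; rows are tried top to bottom, first match wins.
publisher_rules <- data.frame(
  pattern   = c("^wire", "^radium ?one", "^euro ?news", "^eurosport"),
  publisher = c("Wired", "RadiumOne", "Euronews", "Eurosport"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one publisher name per site, or `default`
# when no pattern matches (case is ignored via grepl(ignore.case = TRUE)).
match_publisher <- function(site, rules, default = "Rest") {
  out <- rep(default, length(site))
  unmatched <- rep(TRUE, length(site))
  for (i in seq_len(nrow(rules))) {
    hit <- unmatched & grepl(rules$pattern[i], site, ignore.case = TRUE)
    out[hit] <- rules$publisher[i]
    unmatched <- unmatched & !hit  # later rows cannot overwrite earlier matches
  }
  out
}

sites <- data.frame(
  site_name = c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
                "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired"),
  stringsAsFactors = FALSE
)
sites$publisher_name <- match_publisher(sites$site_name, publisher_rules)
sites
```

Adding a publisher then means adding a row to `publisher_rules` rather than another nested `gsub()`; rows with more specific patterns should come first.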