Question

I have a list of publishers that looks like this:

+--------------+
|  Site Name   |
+--------------+
| Radium One   |
| Euronews     |
| EUROSPORT    |
| WIRED        |
| RadiumOne    |
| Eurosport FR |
| Wired US     |
| Eurosport    |
| EuroNews     |
| Wired        |
+--------------+

I would like to create the following result:

+--------------+----------------+
|  Site Name   | Publisher Name |
+--------------+----------------+
| Radium One   | RadiumOne      |
| Euronews     | Euronews       |
| EUROSPORT    | Eurosport      |
| WIRED        | Wired          |
| RadiumOne    | RadiumOne      |
| Eurosport FR | Eurosport      |
| Wired US     | Wired          |
| Eurosport    | Eurosport      |
| EuroNews     | Euronews       |
| Wired        | Wired          |
+--------------+----------------+

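I would like to understand how to replicate the code I use in Power Query:

Search the first 4 characters

if Text.Start([Site Name], 4) = "WIRE" then "Wired" else

Search the last 3 characters

if Text.End([Site Name], 3) = "One" then "RadiumOne" else

If no match is found, add "Rest"

It does not need to be case sensitive.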

Answer 1

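Using properCase from the ifultools package together with gsub, we replace everything after the first word with "" (that is, remove it) and handle the Radium exception separately. If you have many exceptions like the Radium case, please update your post so that we can find a solution better suited than this hack :)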

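library("ifultools")

siteName=c("Radium One","Euronews","EUROSPORT","WIRED","RadiumOne","Eurosport FR","Wired US","Eurosport","EuroNews","Wired")

# properCase() capitalises the first letter and lower-cases the rest;
# the inner gsub() drops everything from the first whitespace onwards;
# the outer gsub() maps both "Radium" and "Radiumone" to "RadiumOne".
publisherName = gsub("^Radium(one)?$","RadiumOne",gsub("\\s+.*","",properCase(siteName)))

 # [1] "RadiumOne" "Euronews"  "Eurosport" "Wired"     "RadiumOne" "Eurosport" "Wired"    
 # [8] "Eurosport" "Euronews"  "Wired"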
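
As an alternative that stays closer to the Power Query logic in the question, here is a minimal base R sketch without ifultools. It assumes a small lookup table of patterns (the `rules` data frame and its four entries are hypothetical, not taken from the original post), matches them case-insensitively with grepl, and labels anything unmatched as "Rest".

```r
siteName <- c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
              "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")

# Hypothetical rule table: a regular expression and the publisher it maps to.
# "^" anchors a prefix test (like Text.Start), "$" a suffix test (like Text.End).
rules <- data.frame(
  pattern   = c("^wire", "one$", "^eurosport", "^euronews"),
  publisher = c("Wired", "RadiumOne", "Eurosport", "Euronews"),
  stringsAsFactors = FALSE
)

# Start from the fallback label, then apply the rules in reverse order so that
# rules listed earlier in the table overwrite later ones and therefore win.
publisherName <- rep("Rest", length(siteName))
for (i in rev(seq_len(nrow(rules)))) {
  hit <- grepl(rules$pattern[i], siteName, ignore.case = TRUE)
  publisherName[hit] <- rules$publisher[i]
}

result <- data.frame(siteName, publisherName, stringsAsFactors = FALSE)
print(result)
```

Keeping the patterns in a table rather than in nested if/else calls means a new publisher only needs one extra row, which may suit the case where there are many exceptions.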
