How to recode text that contains specific text

Time: 2017-02-28 19:22:03

Tags: r string text recode

I am trying to recode a large amount of text data into either text categories or numeric values.

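My dataset includes the names of coffee shops. I want to recode these coffee shops as either "corporation" or "small business". The problem is that the shop names are spelled inconsistently (for example, starbucks vs. starbcks, starbucks coffee). I would like to write code that scans the dataset for the substring "star" and recodes every match as "corporation".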

Sample data:

library(data.table)

customers <- data.table(customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                        store = c("starbcks", "peets", "coffee bean", "drnk", "starbucks", "coffee ben", "coffee bean", "coffee bean", "drnk", "starbucks coffee"))

I want to recode the "store" column into a "type" column, which I would then recode as a numeric value.

customers <- data.table(customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), 
                        store = c("starbcks coffee", "portfolios", "coffee bean", "sharkhead", "starbucks", "coffee ben", "cuppa cuppa", "coffee bean", "drnk", "starbucks coffee"),
                        type = c("corporation", "small business", "corporation", "small business", "corporation", "corporation", "small business", "corporation", "corporation", "corporation"),
                        rc_type = c(1, 2, 1, 2, 1, 1, 2, 1, 1, 1)) 
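
One possible approach, shown only as a minimal sketch: match substrings with base R's `grepl()` and assign the new columns by reference with data.table's `:=`. The pattern list `corporate_patterns` is a hypothetical name and its contents are an assumption. "star" comes from the question, while "coffee be" is a guess at what should catch "coffee bean" and "coffee ben". Anything that matches no pattern falls through to "small business".

```r
library(data.table)

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbcks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee ben", "coffee bean", "coffee bean", "drnk",
            "starbucks coffee")
)

# Assumption: hypothetical substrings that identify a corporation.
# Extend this vector as more chains are identified.
corporate_patterns <- c("star", "coffee be")

# Combine the substrings into one regular expression: "star|coffee be"
corporate_regex <- paste(corporate_patterns, collapse = "|")

# grepl() returns TRUE for every store name containing any of the substrings
customers[, type := ifelse(grepl(corporate_regex, store, ignore.case = TRUE),
                           "corporation", "small business")]

# Numeric recode: 1 = corporation, 2 = small business
customers[, rc_type := ifelse(type == "corporation", 1, 2)]

print(customers)
```

Substring matching only helps while the misspelling leaves the chosen substring intact. A name such as "strbucks" would not match "star", and would need approximate matching instead (for example base R's `agrepl()`).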

I have looked at the stringr package and tried the standard ways of recoding, but to no avail. Any help is appreciated. Thanks!

0 answers:

No answers