Creating a new category from text strings in R

Time: 2017-04-21 16:02:12

Tags: r categories stringr

I have a column containing rows like the following, and I want to create a new column based on the values found in each row:

Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access
Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access
Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: No home phone calls or line rental|NET: Internet access

So, taking the first row as an example:

if Telephone line rental is found, code it as V in the new column
if Fixed broadband is found, code it as B
if Mobile phone is found, code it as M
if Paid for TV service / TV and/or sport is found, code it as T

For that row (Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access) the category would be VBM. The NET:XXX | NET:XXXX parts of the string need to be ignored.

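The full portfolio can be any combination of these 4 codes, but they must always appear in the following order: V, B, M, T.

I have been googling and reading around.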
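The split-on-"|" idea can work if the pieces are looked up in a table and the codes are then emitted in a fixed order. Below is a minimal sketch in base R only (no stringr), assuming the labels appear exactly as in the sample rows; the lookup vector `codes`, the helper `code_row`, and the "TV and/or sport" label are hypothetical names/assumptions for illustration.

```r
# Hypothetical lookup table: label -> code.
# "TV and/or sport" is an assumed label for the second T case.
codes <- c("Telephone line rental" = "V",
           "Fixed broadband"       = "B",
           "Mobile phone"          = "M",
           "Paid for TV service"   = "T",
           "TV and/or sport"       = "T")

# Hypothetical helper: turn one "|"-separated row into its code string
code_row <- function(x) {
  parts <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])  # split on the literal pipe
  parts <- parts[!startsWith(parts, "NET:")]            # ignore the NET: XXX parts
  found <- unique(codes[parts[parts %in% names(codes)]])
  # intersect() keeps the order of its first argument, so codes come out as V, B, M, T
  paste0(intersect(c("V", "B", "M", "T"), found), collapse = "")
}

rows <- c(
  "Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access",
  "Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access",
  "Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: No home phone calls or line rental|NET: Internet access"
)

result <- vapply(rows, code_row, character(1), USE.NAMES = FALSE)
print(result)  # "VBM" "VBMT" "BMT"
```

Because the order is taken from the fixed vector c("V", "B", "M", "T") rather than from the position of the labels in the string, any combination of the four services comes out in the required order.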

I tried splitting the string with sep = "\\|" using library(stringr), but it didn't work.

Any other ideas?

Regards

DPUT:

1 answer:

Answer (score: 1)

You can use grepl like this:

df <- read.table(text='"Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access"
"Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access"
"Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: No home phone calls or line rental|NET: Internet access"',
header=FALSE,stringsAsFactors=FALSE)

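# Labels that should be coded as T
tv <- c("Paid for TV service", "TV and/or sport")

# Each ifelse() yields one code (or "") per row; the NET: parts never match,
# so they are ignored. paste0() joins the codes without a separator, giving
# "VBM", "VBMT", ... (plain paste() would insert spaces, e.g. "V B M ").
df$new_col <- paste0(ifelse(grepl("Telephone line rental", df$V1), "V", ""),
                     ifelse(grepl("Fixed broadband", df$V1), "B", ""),
                     ifelse(grepl("Mobile phone", df$V1), "M", ""),
                     ifelse(grepl(paste(tv, collapse = "|"), df$V1), "T", ""))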
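The same grepl idea can also be written with the patterns stored in a named vector, listed in the required order V, B, M, T, so adding a new code means adding one entry instead of another ifelse() line. This is a sketch on a few of the sample rows; the names `patterns` and `new_col`, and the "TV and/or sport" label, are assumptions for illustration.

```r
# Patterns in the required output order; "TV and/or sport" is an assumed label
patterns <- c(V = "Telephone line rental",
              B = "Fixed broadband",
              M = "Mobile phone",
              T = "Paid for TV service|TV and/or sport")

x <- c(
  "Telephone line rental|Fixed broadband|Mobile phone|NET: No home phone calls|NET: Internet access",
  "Telephone line rental|Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: Internet access",
  "Fixed broadband|Mobile phone|Paid for TV service|NET: No home phone calls|NET: No home phone calls or line rental|NET: Internet access"
)

# Append each code, in order, to every row whose text matches its pattern
new_col <- character(length(x))
for (code in names(patterns)) {
  new_col <- paste0(new_col, ifelse(grepl(patterns[[code]], x), code, ""))
}
print(new_col)  # "VBM" "VBMT" "BMT"
```

Note that the last row contains "line rental" only inside a NET: label, so it correctly gets no V.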