I have this data:
USDfirms <- c("GOOG", "BABA", "0071.TW")
TWRfirms <- c("3231.TW")
JPYfirms <- c("7752.T")
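I am trying to use the grepl function to create a new column. If a ticker in the df data matches a firm in one of the three character vectors above, for example 3231.TW, a value should be assigned (TWRmatch); if the ticker matches the firm GOOG, the value USDmatch should be assigned, and so on.
The ticker values do not always match exactly, i.e. the ticker 3231 is not an exact match for 3231.TW, which is why I want to use grepl so that the .TW suffix is ignored when matching.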
df <- structure(list(symbol = c("3231.TW", "3231.TW", "3231.TW", "3231.TW",
"7752.T", "7752.T", "7752.T", "7752.T", "GOOG", "GOOG", "GOOG",
"GOOG", "BABA", "BABA", "BABA", "BABA"), ticker = c("3231", "3231",
"3231", "3231", "7752", "7752", "7752", "7752", "GOOG", "GOOG",
"GOOG", "GOOG", "BABA", "BABA", "BABA", "BABA"), country = c("TW",
"TW", "TW", "TW", "T", "T", "T", "T", NA, NA, NA, NA, NA, NA,
NA, NA), year = c(2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L,
2015L, 2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L, 2015L
)), .Names = c("symbol", "ticker", "country", "year"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 123L, 124L, 125L, 126L, 127L, 128L,
129L, 130L), class = "data.frame")
Edit:
This does not seem to work:
ifelse(grepl(USDfirms, df$ticker), "yes", "no")
I also tried:
df$match <- ifelse(USDfirms %in% x$ticker, "yes", "no")
which marks everything as "yes".
Answer 0: (score: 1)
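Not a perfect solution, but a brute-force approach is to use nested lapply / sapply. There is a double loop here: for each ticker we go over every element of firm_list, check whether the ticker is found in any of the firms in that element, and if it is, extract the name of that list element.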
df$firms <- unlist(lapply(df$ticker, function(x)
unlist(sapply(seq_along(firm_list), function(y) {
if (any(grepl(x, unlist(firm_list[y]))))
names(firm_list[y])
}))))
df
# symbol ticker country year firms
#1 3231.TW 3231 TW 2017 TWRfirms
#2 3231.TW 3231 TW 2016 TWRfirms
#3 3231.TW 3231 TW 2015 TWRfirms
#4 3231.TW 3231 TW 2014 TWRfirms
#5 7752.T 7752 T 2018 JPYfirms
#6 7752.T 7752 T 2017 JPYfirms
#7 7752.T 7752 T 2016 JPYfirms
#8 7752.T 7752 T 2015 JPYfirms
#123 GOOG GOOG <NA> 2017 USDfirms
#124 GOOG GOOG <NA> 2016 USDfirms
#125 GOOG GOOG <NA> 2015 USDfirms
#126 GOOG GOOG <NA> 2014 USDfirms
#127 BABA BABA <NA> 2018 USDfirms
#128 BABA BABA <NA> 2017 USDfirms
#129 BABA BABA <NA> 2016 USDfirms
#130 BABA BABA <NA> 2015 USDfirms
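The firm_list used above collects all the firms into one named list so that they are easy to check. It has to be defined before the code above is run: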
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
TWRfirms = c("3231.TW"),
JPYfirms = c("7752.T"))
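Or, actually, it is more convenient and faster to build a lookup data frame, match the tickers against it, and extract the group names from it.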
ref_df <- data.frame(firms = unlist(firm_list),
names = rep(names(firm_list), lengths(firm_list)))
df$firms <- ref_df$names[sapply(df$ticker, function(x) grep(x, ref_df$firms))]
df
# symbol ticker country year firms
#1 3231.TW 3231 TW 2017 TWRfirms
#2 3231.TW 3231 TW 2016 TWRfirms
#3 3231.TW 3231 TW 2015 TWRfirms
#4 3231.TW 3231 TW 2014 TWRfirms
#5 7752.T 7752 T 2018 JPYfirms
#6 7752.T 7752 T 2017 JPYfirms
#7 7752.T 7752 T 2016 JPYfirms
#8 7752.T 7752 T 2015 JPYfirms
#123 GOOG GOOG <NA> 2017 USDfirms
#124 GOOG GOOG <NA> 2016 USDfirms
#125 GOOG GOOG <NA> 2015 USDfirms
#126 GOOG GOOG <NA> 2014 USDfirms
#127 BABA BABA <NA> 2018 USDfirms
#128 BABA BABA <NA> 2017 USDfirms
#129 BABA BABA <NA> 2016 USDfirms
#130 BABA BABA <NA> 2015 USDfirms
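Both approaches above pass each ticker to grepl / grep as a regular expression, so a ticker matches any firm that merely contains it (a ticker of 323 would also hit 3231.TW), and a ticker with no match or several matches would break the assignment. As a rough sketch of an alternative, assuming the exchange suffix is always a dot followed by letters at the end of the symbol, the suffix can be stripped once and the tickers matched exactly with match(). The names ref, base, tickers and groups are made up for this illustration and are not part of the original code.

```r
# Same firm groups as in the answer above.
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))

# Hypothetical lookup table: one row per firm with its group name.
ref <- data.frame(firm  = unlist(firm_list, use.names = FALSE),
                  group = rep(names(firm_list), lengths(firm_list)),
                  stringsAsFactors = FALSE)

# Assumption: the suffix is a dot followed by letters (".TW", ".T").
# Symbols without a suffix, such as "GOOG", are left unchanged.
ref$base <- sub("\\.[A-Za-z]+$", "", ref$firm)

# Example tickers; "9999" is deliberately not in any group.
tickers <- c("3231", "7752", "GOOG", "BABA", "9999")

# Exact lookup: match() returns NA for unknown tickers instead of failing.
groups <- ref$group[match(tickers, ref$base)]
groups
```

With the question's data the same lookup would be df$firms <- ref$group[match(df$ticker, ref$base)]. Unknown tickers end up as NA rather than shortening the result, so the new column always has one value per row.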