I have a large data frame, df, with 2 columns:
| **column1** | **column2** |
|---|---|
| The City of New York | TCNY |
| The Land of the Free | TLF |
| Stellar Stars Basketball Program | SSBP |
| Center for Life Sciences | CLS |
| Children's Hospital of Los Angeles | CHLA |
| New York Yankees | NY |
| etc | etc |
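I did some research and saw that you can use mapply to apply a function to two columns at once, but I'm not sure what function I would write. I'm thinking of a function that finds all the capital letters in the column1 string and checks whether those capital letters appear in column2, but I'm really not sure how to do it... Any help would be great! Thank you so much!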
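Since the question mentions mapply, here is a minimal sketch of how it could be used for this. It assumes the columns are named column1 and column2 as in the table above, and that a "match" means the uppercase letters of column1, read in order, are exactly equal to column2. The helper name matches_abbreviation is hypothetical, not part of any package.

```r
# Sample data using the column names from the question
df <- data.frame(
  column1 = c("The City of New York", "The Land of the Free", "New York Yankees"),
  column2 = c("TCNY", "TLF", "NY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the uppercase letters A-Z of the full name
# and compare them with the abbreviation
matches_abbreviation <- function(full_name, abbreviation) {
  gsub("[^A-Z]", "", full_name) == abbreviation
}

# mapply walks both columns in parallel, passing one pair of values per call;
# USE.NAMES = FALSE keeps the result an unnamed logical vector
df$check <- mapply(matches_abbreviation, df$column1, df$column2, USE.NAMES = FALSE)
df
```

With these three rows, check comes out TRUE, TRUE, FALSE: New York Yankees yields NYY, which differs from NY.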
Answer 0 (score: 0)
Here is an example of what I think you might be trying to achieve (using a subset of the rows shown in your question):
df <- data.frame(
col_1 = c("The City of New York", "The Land of the Free", "New York Yankees"),
col_2 = c("TCNY", "TLF", "NY")
)
> df
col_1 col_2
1 The City of New York TCNY
2 The Land of the Free TLF
3 New York Yankees NY
# Add a third column indicating whether the capitalised letters of the first
# column are equal to the strings in the second
df$col_3 <- unlist(apply(df, 1, function(x) gsub("[^A-Z]", "", x[1]) == x[2]))
> df
col_1 col_2 col_3
1 The City of New York TCNY TRUE
2 The Land of the Free TLF TRUE
3 New York Yankees NY FALSE
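Above, I use gsub to remove every character that is not an uppercase letter from the first-column value, then compare the result with the second-column value inside the apply statement, which operates on each row of the data frame. Finally, I use unlist to make sure the result is a plain vector that can be stored as the third column of the data frame df. (Here apply already returns a logical vector, because each call returns a single value, so unlist is only a safeguard.)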
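Because gsub() and == both work element-wise on whole vectors, the row-wise apply() is not strictly needed. A minimal vectorised sketch of the same check on the same three rows (stringsAsFactors = FALSE is added so the code behaves the same on R versions before 4.0):

```r
df <- data.frame(
  col_1 = c("The City of New York", "The Land of the Free", "New York Yankees"),
  col_2 = c("TCNY", "TLF", "NY"),
  stringsAsFactors = FALSE
)

# gsub() strips non-uppercase characters from every element of col_1 at once,
# and == compares the results with col_2 element by element
df$col_3 <- gsub("[^A-Z]", "", df$col_1) == df$col_2
df
```

This also avoids apply() converting the data frame to a character matrix behind the scenes.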
Answer 1 (score: 0)
Using base R:
transform(df, correctABBV = x <- gsub("[^A-Z]", "", column1), check = x == column2)
column1 column2 correctABBV check
1 The City of New York TCNY TCNY TRUE
2 The Land of the Free TLF TLF TRUE
3 Stellar Stars Basketball Program SSBP SSBP TRUE
4 Center for Life Sciences CLS CLS TRUE
5 Children's Hospital of Los Angeles CHLA CHLA TRUE
6 New York Yankees NY NYY FALSE
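The exact comparison above marks New York Yankees as FALSE because the extracted capitals are NYY. If the goal is closer to the question's wording ("check whether those capital letters appear in column2"), one possible sketch is a set-membership test. It assumes order and repeated letters do not matter, which is an interpretation rather than something the question states; capitals_in_abbreviation is a hypothetical helper name.

```r
df <- data.frame(
  column1 = c("The City of New York", "Center for Life Sciences", "New York Yankees"),
  column2 = c("TCNY", "CLS", "NY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split the extracted capitals and the abbreviation into
# single characters and test whether every capital occurs in the abbreviation
capitals_in_abbreviation <- function(full_name, abbreviation) {
  capitals <- strsplit(gsub("[^A-Z]", "", full_name), "")[[1]]
  all(capitals %in% strsplit(abbreviation, "")[[1]])
}

df$contains <- mapply(capitals_in_abbreviation, df$column1, df$column2,
                      USE.NAMES = FALSE)
df
```

Under this looser rule all three rows pass, including New York Yankees, since both N and Y occur in NY. Whether that is desirable depends on what counts as a valid abbreviation in your data.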
Answer 2 (score: 0)
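Here is one approach. I wasn't sure whether you want etc to count as an abbreviation; for now, I treat it as one. First, I build abbreviations from the first column. I used stri_count() to count how many words each string contains. When the string has more than one word (the logical condition is TRUE), I used gsub() to extract the capital letters. When the condition is FALSE, I copy the element of mycol1 into abb unchanged. Finally, I checked whether the elements of abb and mycol2 are identical and stored the result in check.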
mydf <- data.frame(mycol1 = c("The City of New York", "The Land of the Free", "Stellar Stars Basketball Program",
"Center for Life Sciences", "Children's Hospital of Los Angeles", "New York Yankees", "etc"),
mycol2 = c("TCNY", "TLF", "SSBP", "CLS", "CHLA", "NY", "etc"),
stringsAsFactors = FALSE)
library(dplyr)
library(stringi)
mutate(mydf,
abb = if_else(stri_count(mycol1, regex = "\\w+") > 1,
gsub(x = mycol1, pattern = "[^A-Z]",replacement = ""),
mycol1),
check = abb == mycol2)
mycol1 mycol2 abb check
1 The City of New York TCNY TCNY TRUE
2 The Land of the Free TLF TLF TRUE
3 Stellar Stars Basketball Program SSBP SSBP TRUE
4 Center for Life Sciences CLS CLS TRUE
5 Children's Hospital of Los Angeles CHLA CHLA TRUE
6 New York Yankees NY NYY FALSE
7 etc etc etc TRUE
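The same rule can be expressed in base R without dplyr or stringi. In this sketch, gregexpr() with the pattern "\\w+" stands in for stri_count(mycol1, regex = "\\w+"), and ifelse() replaces if_else(); the two may differ in edge cases (for example, regex flavour or NA handling), so treat it as an approximation of the code above rather than a drop-in equivalent.

```r
mydf <- data.frame(
  mycol1 = c("The City of New York", "The Land of the Free",
             "Stellar Stars Basketball Program", "Center for Life Sciences",
             "Children's Hospital of Los Angeles", "New York Yankees", "etc"),
  mycol2 = c("TCNY", "TLF", "SSBP", "CLS", "CHLA", "NY", "etc"),
  stringsAsFactors = FALSE
)

# Count the \w+ matches in each string (the number of "words")
word_count <- lengths(regmatches(mydf$mycol1, gregexpr("\\w+", mydf$mycol1)))

# Multi-word names are reduced to their capitals; single words are kept as-is
mydf$abb <- ifelse(word_count > 1, gsub("[^A-Z]", "", mydf$mycol1), mydf$mycol1)
mydf$check <- mydf$abb == mydf$mycol2
mydf
```

On this data it reproduces the abb and check columns shown above, with only the New York Yankees row failing.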