I want to find the number of unique company names in a data frame:
/organization/-fame
/ORGANIZATION/-QOUNTER
/organization/-qounter
/ORGANIZATION/-THE-ONE-OF-THEM-INC-
/ORGANIZATION/0NDINE-BIOMEDICAL-INC
/organization/0ndine-biomedical-inc
I used the split function:
split_prod <- str_split_fixed(rounds2$company_permalink,"/", 4)
and converted the result to a new data frame:
companyname <- data.frame(split_prod, stringsAsFactors = FALSE)
I got four columns of output, as shown below:
X1 X2 X3 X4
organization -fame
ORGANIZATION -QOUNTER
organization -qounter
ORGANIZATION -THE-ONE-OF-THEM-INC-
organization 0-6-com
ORGANIZATION 004-TECHNOLOGIES
organization 01games-technology
ORGANIZATION 0NDINE-BIOMEDICAL-INC
organization 0ndine-biomedical-inc
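How do I now count the number of unique company names? I tried:
`distinct(rounds$X3)` (does not work)
`length(unique(rounds$X3))` (returns the wrong number)
Please help. Also, I am not sure whether I am using the split function correctly, in particular the number 4. I counted the pieces as slash, organization, company name, slash, so I tried to split into four columns.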
Answer 0 (score: 0)
The code:
length(unique(tolower(companyname$X3)))
returns the number of unique companies in the X3 column of the companyname data frame.
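To make the effect of tolower visible, here is a minimal sketch. The vector `x3` is a hypothetical stand-in for `companyname$X3`, filled with the sample values from the question; it is not the asker's real data.

```r
# Hypothetical stand-in for companyname$X3 (sample values from the question)
x3 <- c("-fame", "-QOUNTER", "-qounter",
        "-THE-ONE-OF-THEM-INC-",
        "0NDINE-BIOMEDICAL-INC", "0ndine-biomedical-inc")

# Case-sensitive count: upper- and lower-case spellings are counted separately
n_raw <- length(unique(x3))

# Case-insensitive count: normalise the case before de-duplicating
n_ci <- length(unique(tolower(x3)))

print(c(raw = n_raw, case_insensitive = n_ci))
```

On this sample the case-sensitive count is 6 and the case-insensitive count is 4, which is the kind of gap that would explain a "wrong" number from `length(unique(...))` alone.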
Answer 1 (score: 0)
Use tolower or toupper, or, if you are using the stringr package, str_to_lower or str_to_upper. Otherwise -QOUNTER and -qounter will be counted as two different companies.
Full example:
library(stringr)
text <- c("/organization/-fame",
"/ORGANIZATION/-QOUNTER",
"/organization/-qounter",
"/ORGANIZATION/-THE-ONE-OF-THEM-INC-",
"/ORGANIZATION/0NDINE-BIOMEDICAL-INC",
"/organization/0ndine-biomedical-inc")
split_prod <- str_split_fixed(text,"/", 4)
companyname <- data.frame(split_prod, stringsAsFactors = FALSE)
str(companyname)
head(companyname)
length(unique(tolower(companyname$X3)))
[1] 4
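Column X4 was created because you specified 4 in str_split_fixed. For the sample strings above, each path only yields three pieces (an empty string before the leading slash, the organization prefix, and the company name), so X4 is padded with empty strings.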
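As a sketch of the same count without the empty fourth column, the split can ask for three pieces instead. This assumes the paths have no trailing slash, as in the sample strings; the names `paths`, `parts`, `company` and `n_companies` are illustrative only.

```r
library(stringr)

# Sample paths copied from the question
paths <- c("/organization/-fame",
           "/ORGANIZATION/-QOUNTER",
           "/organization/-qounter",
           "/ORGANIZATION/-THE-ONE-OF-THEM-INC-",
           "/ORGANIZATION/0NDINE-BIOMEDICAL-INC",
           "/organization/0ndine-biomedical-inc")

# Three pieces: "" (before the leading slash), "organization", company name
parts <- str_split_fixed(paths, "/", 3)

# Normalise case, then count the distinct company names
company <- tolower(parts[, 3])
n_companies <- length(unique(company))
print(n_companies)
```

As for the first attempt in the question: `distinct()` from dplyr expects a data frame rather than a single vector, which is probably why `distinct(rounds$X3)` failed. For a vector, `dplyr::n_distinct(company)` should give the same count as `length(unique(company))`.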