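The goal is to create and use anonymized names for firms. Doing so makes it possible to distribute sample data without revealing proprietary information about any particular firm.
The toy data frame shows that a firm may appear more than once and that firm names vary in unpredictable ways. The code below works, but it seems laborious and error-prone.
Is there a more efficient way to give each firm an anonymized replacement name in a new variable?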
df <- data.frame(firm = c(rep("Alpha LLC", 3), "Baker & Charlie", rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"), fees = rep(100, 8))
# create a translation table (named vector) where each firm has a unique "name" of the form "Firm LETTER number"
uniq <- as.character(unique(df$firm))
uniq.df <- data.frame(firmname = uniq, anonfirm = paste0("Firm ", LETTERS[seq_along(uniq)], seq_along(uniq)), stringsAsFactors = FALSE)
# create a "named vector" with firm on top (as names) and anonymous name on bottom
translation.vec <- uniq.df[ , 2] # the anonymous firm name
names(translation.vec) <- uniq.df[ , 1] # original name as element name for the anonymous firm name
df$anon <- unname(translation.vec[as.character(df$firm)]) # looks up each firm by name; replaces w/anonymous
> df
                  firm fees    anon
1            Alpha LLC  100 Firm A1
2            Alpha LLC  100 Firm A1
3            Alpha LLC  100 Firm A1
4      Baker & Charlie  100 Firm B2
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm D4
8       The Gamma Firm  100 Firm E5
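Answer 0 (score: 5)
When you store the firm names in a data.frame, they become a factor (this is the default in R versions before 4.0.0; in later versions, pass stringsAsFactors = TRUE or wrap the column in factor()). Swapping the names of factor levels is very simple. For example: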
set.seed(15) # so sample() is reproducible
n <- nlevels(factor(df$firm)) # number of distinct firms; factor() also covers a character column
newnames <- paste0("Firm ", LETTERS[seq_len(n)], seq_len(n))
df$anon <- factor(df$firm, labels = sample(newnames))
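Here I am just changing the labels of the factor. I also throw in a sample(); otherwise the firms would be named in alphabetical order. This produces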
                  firm fees    anon
1            Alpha LLC  100 Firm D4
2            Alpha LLC  100 Firm D4
3            Alpha LLC  100 Firm D4
4      Baker & Charlie  100 Firm A1
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm B2
8       The Gamma Firm  100 Firm E5
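To see what sample() contributes here, the following is a minimal sketch that rebuilds the toy data from the question and compares the labels with and without shuffling. The names plain and shuffled are illustrative only, and stringsAsFactors = TRUE is spelled out on the assumption that the code may run on an R version where strings are no longer converted to factors by default.

```r
# Toy data from the question; stringsAsFactors = TRUE makes firm a factor
# regardless of the R version's default.
df <- data.frame(
  firm = c(rep("Alpha LLC", 3), "Baker & Charlie",
           rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"),
  fees = rep(100, 8),
  stringsAsFactors = TRUE
)

n <- nlevels(df$firm)
newnames <- paste0("Firm ", LETTERS[seq_len(n)], seq_len(n))

# Without shuffling, labels are assigned to the sorted levels in order,
# so "Firm A1" is always the alphabetically first firm.
plain <- factor(df$firm, labels = newnames)

# With shuffling, the label-to-firm assignment is a random permutation.
set.seed(15)
shuffled <- factor(df$firm, labels = sample(newnames))

# Either way the mapping stays one-to-one: each firm gets exactly one label.
print(table(df$firm, plain))
print(table(df$firm, shuffled))
```

In both cross-tabulations every row has a single non-zero cell; only the unshuffled version puts those cells on the diagonal.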
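The order of your new factor levels will still carry some information about the original ordering of the firms. If you plan to share an R data set, rather than saving to a flat text file or just displaying the information, you can eliminate that information by converting to character.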
df$anon <- as.character(factor(df$firm, labels=sample(newnames)))
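A minimal sketch of the difference, assuming the same toy firms as in the question (the names firm, anon.factor, and anon.char are illustrative). The levels of the anonymized factor line up with the sorted original names, whereas the character vector carries no levels at all.

```r
firm <- c(rep("Alpha LLC", 3), "Baker & Charlie",
          rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm")
newnames <- paste0("Firm ", LETTERS[1:5], 1:5)

set.seed(15)
anon.factor <- factor(firm, labels = sample(newnames))

# The k-th level of the anonymized factor belongs to the k-th firm in sorted
# order, so the level order is a hint about the original names.
print(data.frame(original = sort(unique(firm)),
                 anon.level = levels(anon.factor)))

# A character vector has no levels, so that hint is gone.
anon.char <- as.character(anon.factor)
print(levels(anon.char))  # NULL
```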
Answer 1 (score: 1)
Expanding on @LaurenGoodwin's very sensible comment:
You can convert to a factor and then to numeric, which turns each firm into a distinct number.
companies <- LETTERS
anon <- as.numeric(as.factor(companies))
If you want it to be more than just a number, convert it to character and use paste.
anon <- paste('Firm', as.character(anon))
[1] "Firm 1" "Firm 2" "Firm 3" "Firm 4" "Firm 5" "Firm 6" "Firm 7"
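Applied to the toy firms from the question, the same factor-to-numeric idea might look like the sketch below. Passing shuffled levels to factor() is an addition that is not part of the answer above; it is there so the numbers do not follow alphabetical order. The names firm, anon.id, and anon, and the seed value, are illustrative.

```r
firm <- c(rep("Alpha LLC", 3), "Baker & Charlie",
          rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm")

set.seed(42)  # arbitrary seed, only so the shuffle is reproducible

# Shuffle the distinct firm names and use them as the factor levels, so the
# integer code of each firm is its position in the shuffled order.
anon.id <- as.numeric(factor(firm, levels = sample(unique(firm))))

# Turn the numeric id into a label; repeated firms share the same label.
anon <- paste("Firm", anon.id)

print(data.frame(firm, anon))
```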