Question

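I have a file with a bunch of cities (183 so far) and no counties mapped to them, which is what I need. To recode categorical variables I normally use plyr's rename() function, but I don't want to write a messy snippet that recodes every one of these cities by hand. I've also been learning some Python recently, and this sounds a bit like a dictionary/hash table problem. If possible, I'd like to learn a more programmatic way of doing this.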

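As a first attempt, I went ahead and created a .csv with each city's name in one column and its county in another. I'd like to somehow combine this with the file I need so the values can be mapped. Some minimal code to show what I mean: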

#key_file: 
LocalityName <- c('Addy', 'Burien', 'Newman Lake', 'Seattle', 'Tacoma')
CountyName <- c('Stevens', 'King', 'Spokane', 'King', 'Pierce')
key <- cbind.data.frame(LocalityName, CountyName)

#real_file:
LocalityName <- c('Seattle', 'Seattle', 'Tacoma', 'Seattle', 'Newman Lake')
CountyName <- rep(NA, length(LocalityName))
Extra_Example_Col <- c('Y', 'Y', 'N', 'N', 'N')
real <- cbind.data.frame(LocalityName, CountyName, Extra_Example_Col)

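I tried using join() from plyr but couldn't get it to work (I can update with my code if this is the right track to follow; I'm not sure). I'm also aware of the sqldf package, but since I'm learning SQL for the first time right now, I'm not sure whether this is a type of join. My brain thinks of this as a "one-to-many" kind of mapping.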

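I think trying to learn all these other languages at once is confusing me, but it has given me some ideas about what to try. My preferred solution would be idiomatic R.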

Answer 1

To do the mapping, you can use merge. For example:

merge(real, key, by='LocalityName', all.x=TRUE)
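
A minimal sketch of what all.x = TRUE changes, assuming a hypothetical city ("Olympia") that is missing from the key file. That city is not part of the original data; it is only there to show the difference between a left join and the default inner join.

```r
# Lookup table: one row per city (same values as the question's key file)
key <- data.frame(
  LocalityName = c("Addy", "Burien", "Newman Lake", "Seattle", "Tacoma"),
  CountyName   = c("Stevens", "King", "Spokane", "King", "Pierce"),
  stringsAsFactors = FALSE
)

# Data to annotate; "Olympia" is a hypothetical city absent from the key
real <- data.frame(
  LocalityName      = c("Seattle", "Tacoma", "Olympia"),
  Extra_Example_Col = c("Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Left join: every row of `real` is kept; unmatched cities get NA
left <- merge(real, key, by = "LocalityName", all.x = TRUE)

# Default (inner) join: rows without a match in `key` are dropped
inner <- merge(real, key, by = "LocalityName")

nrow(left)   # 3
nrow(inner)  # 2

# Cities that still need an entry in the key file
left$LocalityName[is.na(left$CountyName)]
```

With 183 cities, the last line is a quick way to spot names that are misspelled or missing in the key file.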

Answer 2

If I understand your question correctly, you can use merge from base R or join from plyr. For example:

# Key_file: 
LocalityName <- c('Addy', 'Burien', 'Newman Lake', 'Seattle', 'Tacoma')
CountyName <- c('Stevens', 'King', 'Spokane', 'King', 'Pierce')
key <- cbind.data.frame(LocalityName, CountyName)

# Real_file:
LocalityName <- c('Seattle', 'Seattle', 'Tacoma', 'Seattle', 'Newman Lake')
CountyName <- rep(NA, length(LocalityName))
Extra_Example_Col <- c('Y', 'Y', 'N', 'N', 'N')
real <- cbind.data.frame(LocalityName, CountyName, Extra_Example_Col)

# merge
merge(real, key, by = "LocalityName")
##   LocalityName CountyName.x Extra_Example_Col CountyName.y
## 1  Newman Lake           NA                 N      Spokane
## 2      Seattle           NA                 Y         King
## 3      Seattle           NA                 Y         King
## 4      Seattle           NA                 N         King
## 5       Tacoma           NA                 N       Pierce

# plyr::join
library(plyr)
join(real, key, by = "LocalityName")
##   LocalityName CountyName Extra_Example_Col CountyName
## 1      Seattle         NA                 Y       King
## 2      Seattle         NA                 Y       King
## 3       Tacoma         NA                 N     Pierce
## 4      Seattle         NA                 N       King
## 5  Newman Lake         NA                 N    Spokane

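Note that with merge you get CountyName.x and CountyName.y, because the same column exists in both datasets. With join, you end up with two columns that are both named CountyName. You probably don't want to initialize the CountyName column in the real data.frame at all. For example, build it with real <- cbind.data.frame(LocalityName, Extra_Example_Col), or drop the column with real[["CountyName"]] <- NULL before merging.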
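
The suggestion above can be sketched as follows. Option 1 drops the placeholder column and then merges; option 2 is an alternative not mentioned in this answer, a base R match() lookup that fills the existing column and keeps the original row order. The names real_no_county, merged, and filled are made up for this illustration.

```r
key <- data.frame(
  LocalityName = c("Addy", "Burien", "Newman Lake", "Seattle", "Tacoma"),
  CountyName   = c("Stevens", "King", "Spokane", "King", "Pierce"),
  stringsAsFactors = FALSE
)
real <- data.frame(
  LocalityName      = c("Seattle", "Seattle", "Tacoma", "Seattle", "Newman Lake"),
  CountyName        = NA,
  Extra_Example_Col = c("Y", "Y", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# Option 1: drop the empty placeholder, then left-join on the city name.
# merge() sorts the result by the join column.
real_no_county <- real
real_no_county[["CountyName"]] <- NULL
merged <- merge(real_no_county, key, by = "LocalityName", all.x = TRUE)
names(merged)  # a single CountyName column, no .x/.y suffixes

# Option 2: fill the placeholder in place with match(), which returns the
# row of `key` for each city and therefore keeps the original row order.
filled <- real
filled$CountyName <- key$CountyName[match(filled$LocalityName, key$LocalityName)]
filled
```

The match() form is probably the closest base R equivalent to a dictionary lookup: key$LocalityName plays the role of the keys and key$CountyName the values.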

Answer 3

library(data.table)

key  <- as.data.table(key)
real <- as.data.table(real)

## If necessary, make sure your values are strings, not factors, etc
key[, LocalityName := as.character(LocalityName)]
real[, LocalityName := as.character(LocalityName)]

## Set the keys, this is for joining.
##  not to be confused with your object named "key"
setkey(key, LocalityName)
setkey(real, LocalityName)

## Ensure you have a character and not a logical 
key[, CountyName := as.character(CountyName)]
real[, CountyName := as.character(CountyName)]

## The i. prefix means: take the value from the data.table
##   passed inside the [brackets] (here, `key`)
real[key, CountyName := i.CountyName]

real
#    LocalityName CountyName Extra_Example_Col
# 1:  Newman Lake    Spokane                 N
# 2:      Seattle       King                 Y
# 3:      Seattle       King                 Y
# 4:      Seattle       King                 N
# 5:       Tacoma     Pierce                 N
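
A variant of the same update join that skips setkey(), sketched under the assumption that a data.table version with the on= argument is installed (to my knowledge, 1.9.6 or later). The column is created as character up front, so no type conversion is needed.

```r
library(data.table)

key <- data.table(
  LocalityName = c("Addy", "Burien", "Newman Lake", "Seattle", "Tacoma"),
  CountyName   = c("Stevens", "King", "Spokane", "King", "Pierce")
)
real <- data.table(
  LocalityName      = c("Seattle", "Seattle", "Tacoma", "Seattle", "Newman Lake"),
  CountyName        = NA_character_,
  Extra_Example_Col = c("Y", "Y", "N", "N", "N")
)

# Update join by reference: for each row of `real` that matches a row of
# `key` on LocalityName, copy key's CountyName (referred to as i.CountyName).
real[key, on = "LocalityName", CountyName := i.CountyName]

# No setkey() call, so `real` keeps its original row order.
real
```

Because := modifies real by reference, nothing needs to be reassigned, and cities that appear only in key (Addy, Burien) simply update no rows.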

Recoding 150+ categorical variables in R
