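I have a dataset that contains many incorrect names. I created a two-column .csv with the old (incorrect) names in one column and the corresponding new (correct) names in the second column. Now I need to tell R to replace every old name in the data with its correct name.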
library(data.table)
testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
                          newName = c("Boston", "New York", "Washington DC"))
# The next line fails.
holder = replace(testData, testData[, oldName] == replacements[, oldName],
                 replacements[, newName])
Answer 0 (score: 7)
This is how I would do the replacement:
setkey(testData, oldName)
setkey(replacements, oldName)
testData[replacements, oldName := newName]
testData
# oldName
#1: Boston
#2: New York
#3: New York
#4: Washington DC
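If you want to keep the original order, you can add an index column and use it at the end to put the rows back in their original order.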
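A minimal sketch of that idea, reusing the question's data. The helper column name `origOrder` is a hypothetical choice, and using `setorder()` to restore the order is one option among several:

```r
library(data.table)

testData <- data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements <- data.table(oldName = c("Buston", "Nu York", "Was DC"),
                           newName = c("Boston", "New York", "Washington DC"))

# Remember each row's original position before setkey() sorts the table
# (origOrder is a hypothetical helper column)
testData[, origOrder := .I]

# Keyed update join, as in the answer above
setkey(testData, oldName)
setkey(replacements, oldName)
testData[replacements, oldName := newName]

# Put the rows back in their original order, then drop the helper column
setorder(testData, origOrder)
testData[, origOrder := NULL]
testData
```

Because `.I` is captured before `setkey()` reorders the rows, sorting by it afterwards recovers the input order: New York, Washington DC, Boston, New York.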
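Answer 1 (score: 1)
I landed here looking for a solution and managed to adapt it to my needs. If you need to keep the original order, don't use setkey. For better testing, I added a row to each table that has no match in the other table.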
library(data.table)
testData = data.table(
city = c("Nu York", "Was DC", "Buston", "Nu York", "Alabama")
)
If the join column has the same name in the lookup table:
replacements = data.table(
city = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city)][]
If the join column has a different name in the lookup table:
replacements = data.table(
city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city = city_oldName)][]
Either way, testData will be changed to:
city
1: New York
2: Washington DC
3: Boston
4: New York
5: Alabama
This works without setting any keys, and the original row order is preserved.