Question

I have a sample dataset with 45 rows, shown below.

 itemid                    title release_date
16    573          Body Snatchers          1993
17    670          Body Snatchers          1993
41   1645        Butcher Boy, The          1998
42   1650        Butcher Boy, The          1998
1     218               Cape Fear          1991
18    673               Cape Fear          1962
27   1234   Chairman of the Board          1998
43   1654   Chairman of the Board          1998
2     246             Chasing Amy          1997
5     268             Chasing Amy          1997
11    309                Deceiver          1997
37   1606                Deceiver          1997
28   1256 Designated Mourner, The          1997
29   1257 Designated Mourner, The          1997
12    329      Desperate Measures          1998
13    348      Desperate Measures          1998
9     304           Fly Away Home          1996
15    500           Fly Away Home          1996
26   1175               Hugo Pool          1997
39   1617               Hugo Pool          1997
31   1395       Hurricane Streets          1998
38   1607       Hurricane Streets          1998
10    305          Ice Storm, The          1997
21    865          Ice Storm, The          1997
4     266      Kull the Conqueror          1997
19    680      Kull the Conqueror          1997
22    876             Money Talks          1997
24    881             Money Talks          1997
35   1477              Nightwatch          1997
40   1625              Nightwatch          1997
6     274                 Sabrina          1995
14    486                 Sabrina          1954
33   1442     Scarlet Letter, The          1995
36   1542     Scarlet Letter, The          1926
3     251         Shall We Dance?          1996
30   1286         Shall We Dance?          1937
32   1429           Sliding Doors          1998
45   1680           Sliding Doors          1998
20    711  Substance of Fire, The          1996
44   1658  Substance of Fire, The          1996
23    878          That Darn Cat!          1997
25   1003          That Darn Cat!          1997
34   1444          That Darn Cat!          1965
7     297             Ulee's Gold          1997
8     303             Ulee's Gold          1997

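What I want to do is convert the itemid based on the movie title and on whether the release dates match. For example, the movie 'Ulee's Gold' has two item ids, 297 & 303. I am trying to find a way to automatically check each movie's release date and, if it is the same, replace that movie's itemid[2] with itemid[1]. For the time being I have done it manually by extracting the itemids into two vectors, x & y, and then changing them with a vectorized replacement. I would like to know whether there is a better way to do this, because only 18 movies have multiple ids but the dataset has a few hundred. Finding and processing these by hand would be very time-consuming.

Here is the code I used for this task.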

x <- c(670,1650,1654,268,1606,1257,348,500,1617,1607,865,680,881,1625,1680,1658,1003,303)
y <- c(573,1645,1234,246,309,1256,329,304,1175,1395,305,266,876,1477,1429,711,878,297)

for (i in seq_along(x))
{
  # match rows by itemid value (x[i] is an id, not a row position)
  df$itemid[df$itemid == x[i]] <- y[i]
}

Is there a better way to get this done?

Answer 1

I think you can do this directly in dplyr.

Using the comment above, a short example:

itemid <- c(878,1003,1444,297,303)
title <- c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2))
year <- c(1997,1997,1965,1997,1997)

temp <- data.frame(itemid,title,year)
temp

library(dplyr)

temp %>% group_by(title,year) %>% mutate(itemid1 = min(itemid))

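(For some reason I changed 'release_date' to 'year', but this basically groups the rows by title and year, finds the smallest itemid in each group, and mutate creates a new variable holding that lowest itemid.)

This gives: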

#  itemid          title year itemid1
#1    878 That Darn Cat! 1997     878
#2   1003 That Darn Cat! 1997     878
#3   1444 That Darn Cat! 1965    1444
#4    297    Ulee's Gold 1997     297
#5    303    Ulee's Gold 1997     297
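
If the goal is to overwrite itemid itself rather than add a new column, the same grouping idea can be sketched in base R with ave(). This is only a minimal sketch, not part of the answer above: the data frame df below is a hypothetical subset of the question's rows, and the combined key column is an assumption introduced for illustration.

```r
# Hypothetical subset of the question's data, including a title
# ("Cape Fear") whose two entries have different release dates.
df <- data.frame(
  itemid = c(218, 673, 878, 1003, 1444, 297, 303),
  title = c("Cape Fear", "Cape Fear", "That Darn Cat!", "That Darn Cat!",
            "That Darn Cat!", "Ulee's Gold", "Ulee's Gold"),
  release_date = c(1991, 1962, 1997, 1997, 1965, 1997, 1997),
  stringsAsFactors = FALSE
)

# Build one grouping key per title/release_date pair, so that only
# combinations that actually occur in the data form a group.
key <- paste(df$title, df$release_date, sep = "|")

# ave() applies FUN within each group and returns a vector aligned with
# the original rows, so the result can overwrite itemid directly.
df$itemid <- ave(df$itemid, key, FUN = min)

print(df)
```

Rows that share a title but not a release date, such as the two 'Cape Fear' entries, keep their own ids, while same-year duplicates collapse to the smallest id in the group.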

Automatically find and convert values in R