Assigning values in a data frame based on matching columns between two different data frames (R)

Posted: 2017-02-19 20:21:12

Tags: r dataframe match

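I'm new to R programming and was wondering if I could get some help with a problem I've run into. I currently have two data frames. The first is ign_temp, which lists video game titles (about 18,000) and the platforms they were released on (30+ platforms). Because some games were released on several platforms, some titles appear more than once, as shown below. This df was filtered down from the original database to show only the title and platform columns; the original has many more columns (id, url, release year, etc.).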

ign_temp:

            title                           platform
            LittleBigPlanet                 Playstation Vita           
            Splice                          Playstation Vita     
            NHL 13                          Xbox 
            NHL 13                          Android
            Wild                            iPhone
            Mark of the Ninja               Xbox 360
            Mark of the Ninja               PC
            .......

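I have another data frame, ign_revised. It contains a sample of the games from the data frame above, plus additional columns such as score and year. Each game appears only once, one row per game. After the year column I added one new column per platform a game could appear on (about 24 platforms, from Android through Xbox One), as shown below (condensed view):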

ign_revised:

       id     score_phrase   title     score   genre    year   Android Arcade ...  Xbox One
       315    Cool           Abzu      7.5     Puzzle   2012   Android Arcade ...  Xbox One
       87     Poor           Alan      5.0     Action   2014   Android Arcade ...  Xbox One
       .....
       598    Great          NHL 13    8.5     Sports   2013   Android Arcade ...  Xbox One

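ign_revised is sorted alphabetically by title. For now, the platform columns (Android Arcade ... Xbox One) just repeat the platform name in every row, for all 1,600+ titles in this data frame.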

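My main question is whether there is a way, such as a for loop, to do the following:

- Take each title in ign_revised together with each platform column (Android Arcade ... Xbox One).
- Look up that title/platform pair in ign_temp.
- Replace the value in that platform column with 1 if the game appeared on that platform, or 0 if it did not.

The result would look like this: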

ign_revised (desired result):

       id     score_phrase   title     score   genre    year   Android Arcade ...  Xbox One
       315    Cool           Abzu      7.5     Puzzle   2012   0       1      ...  0
       87     Poor           Alan      5.0     Action   2014   0       0      ...  1
       .....
       598    Great          NHL 13    8.5     Sports   2013   1       0      ...  1

In my actual ign_revised data frame, title is column 3 and the platform columns start at column 12 (Android), if that helps.

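As a loop (the platform columns start at column 12):

    # Platform columns start at column 12; make them numeric 0 first
    plat_cols <- 12:ncol(ign_revised)
    ign_revised[plat_cols] <- 0

    for (i in seq_len(nrow(ign_revised))) {
      for (j in plat_cols) {
        # Is (title of row i, platform named by column j) a pair in ign_temp?
        hit <- any(ign_temp$title == ign_revised$title[i] &
                   ign_temp$platform == names(ign_revised)[j])
        # Assign 1 if the title appears on that platform, 0 otherwise
        ign_revised[i, j] <- as.numeric(hit)
      }
    }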
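Instead of visiting every cell, the same 0/1 fill can be done one platform column at a time with `%in%`, and this keeps ign_revised's existing layout. The sketch below makes a few assumptions:

- It uses hypothetical two-row toy data, so the platform columns start at 3 here rather than 12.
- It builds one key per title/platform pair with a tab separator, assuming no title contains a tab.

```r
# Hypothetical toy stand-ins for the question's two data frames
ign_temp <- data.frame(
  title    = c("NHL 13", "NHL 13", "Abzu"),
  platform = c("Xbox 360", "PlayStation 3", "PC"),
  stringsAsFactors = FALSE
)
ign_revised <- data.frame(
  title = c("Abzu", "NHL 13"), score = c(7.5, 8.5),
  PC = "PC", `PlayStation 3` = "PlayStation 3", `Xbox 360` = "Xbox 360",
  check.names = FALSE, stringsAsFactors = FALSE
)
plat_cols <- 3:ncol(ign_revised)  # would be 12:ncol(ign_revised) in the real data

# One key per (title, platform) pair that actually occurs in ign_temp;
# assumes a tab never appears inside a title
known <- paste(ign_temp$title, ign_temp$platform, sep = "\t")

for (p in names(ign_revised)[plat_cols]) {
  # Fill the whole column at once: 1 if "<title>\t<p>" is a known pair, else 0
  keys <- paste(ign_revised$title, p, sep = "\t")
  ign_revised[[p]] <- as.numeric(keys %in% known)
}
ign_revised
```

This makes only about 24 passes (one per platform column) instead of one lookup per cell.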

Thanks!

@Gregor

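Edit 1: Sorry, I can't seem to post a comment with the revised code formatted properly. ign_temp needs to hold all 18,625 games from my original df (called ign), not just the 7 listed here. Should I change it to something like this?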

all_title <- ign$title 

all_platform <- ign$platform

ign_temp <- structure(list(title = all_title, platform = all_platform, .Names = c("title","platform"), row.names = c(1, -18625L), class = c("data.frame")))

ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,value.var = "value", fill = 0)

merge(ign_revised[1:11], ign_temp_wide)

I'm not sure, because I got this error:

Error in rep(1, nrow(data)) : invalid 'times' argument
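The message points to a likely cause. In the Edit 1 code, `.Names`, `row.names` and `class` sit inside `list()` instead of being passed to `structure()`, so ign_temp ends up as a plain list rather than a data frame. `nrow()` of a plain list is `NULL`, and `rep(1, NULL)` raises exactly this "invalid 'times'" error.

The sketch below uses a hypothetical three-row `ign` to reproduce the problem. It then shows a simpler way to build ign_temp, by subsetting the two needed columns.

```r
# Hypothetical three-row stand-in for the original `ign` data frame
ign <- data.frame(
  title    = c("NHL 13", "NHL 13", "Splice"),
  platform = c("Xbox 360", "PlayStation 3", "iPad"),
  score    = c(8.5, 8.5, 9),
  stringsAsFactors = FALSE
)

# Same shape as the Edit 1 call: every argument sits inside list(),
# so structure() receives no attributes and returns a plain list
broken <- structure(list(title = ign$title, platform = ign$platform,
                         .Names = c("title", "platform"),
                         row.names = c(1, -3L), class = c("data.frame")))
class(broken)   # "list"
nrow(broken)    # NULL, so rep(1, nrow(data)) has no valid 'times'

# Simpler: keep just the two columns; the result is a real data frame
ign_temp <- ign[, c("title", "platform")]
ign_temp$value <- 1
```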

Edit 2: Added dput output for ign_revised, ign_temp and ign_temp_wide.

 dput(droplevels(head(ign_temp, 7)))
 structure(list(title = c("LittleBigPlanet PS Vita", "LittleBigPlanet PS Vita -- Marvel Super Hero Edition", 
 "Splice: Tree of Life", "NHL 13", "NHL 13", "Total War Battles: Shogun", 
 "Double Dragon: Neon"), platform = c("PlayStation Vita", "PlayStation Vita", 
 "iPad", "Xbox 360", "PlayStation 3", "Macintosh", "Xbox 360"), 
value = c(1, 1, 1, 1, 1, 1, 1)), .Names = c("title", "platform", 
  "value"), row.names = c(NA, -7L), class = c("tbl_df", "tbl", 
  "data.frame"))



dput(droplevels(head(ign_temp_wide, 7)))

structure(list(title = c("#IDARB", "007 Legends", "1001 Spikes", 
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), Android = c(0, 0, 0, 0, 0, 0, 0), Arcade = c(0, 0, 0, 0, 0, 
0, 0), iPad = c(0, 0, 0, 0, 0, 0, 0), iPhone = c(0, 0, 0, 0, 
0, 0, 0), Linux = c(0, 0, 0, 0, 0, 0, 0), Macintosh = c(0, 0, 
0, 0, 0, 0, 0), `New Nintendo 3DS` = c(0, 0, 0, 0, 0, 0, 0), 
    `Nintendo 3DS` = c(0, 0, 1, 0, 0, 0, 0), `Nintendo DS` = c(0, 
    0, 0, 0, 0, 0, 0), `Nintendo DSi` = c(0, 0, 0, 0, 0, 0, 1
    ), Ouya = c(0, 0, 0, 0, 0, 0, 0), PC = c(0, 0, 1, 1, 1, 0, 
    0), `PlayStation 3` = c(0, 1, 0, 0, 0, 1, 0), `PlayStation 4` = c(0, 
    0, 1, 0, 0, 0, 0), `PlayStation Portable` = c(0, 0, 0, 0, 
    0, 0, 0), `PlayStation Vita` = c(0, 0, 1, 0, 0, 0, 0), SteamOS = c(0, 
    0, 0, 0, 0, 0, 0), `Web Games` = c(0, 0, 0, 0, 0, 0, 0), 
    Wii = c(0, 0, 0, 0, 0, 0, 0), `Wii U` = c(0, 1, 1, 0, 0, 
    0, 0), `Windows Phone` = c(0, 0, 0, 0, 0, 0, 0), `Windows Surface` = c(0, 
    0, 0, 0, 0, 0, 0), `Xbox 360` = c(0, 1, 0, 0, 0, 1, 0), `Xbox One` = c(1, 
    0, 0, 0, 0, 0, 0)), .Names = c("title", "Android", "Arcade", 
"iPad", "iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS", 
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3", 
"PlayStation 4", "PlayStation Portable", "PlayStation Vita", 
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface", 
"Xbox 360", "Xbox One"), row.names = c(NA, 7L), class = "data.frame")

dput(droplevels(head(ign_revised, 7)))
structure(list(X1 = c(18007L, 145L, 17730L, 17325L, 18475L, 17699L, 
16486L), score_phrase = c("Good", "Bad", "Great", "Great", "Great", 
"Good", "Mediocre"), title = c("#IDARB", "007 Legends", "1001 Spikes", 
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), url = c("/games/it-draws-a-red-box/xbox-one-20014945", "/games/007-legends/xbox-360-132394", 
"/games/1001-spikes/wii-u-132248", "/games/140-game/pc-20007190", 
"/games/1979-the-game/pc-115360", "/games/2014-fifa-world-cup/ps3-20012688", 
"/games/3-heroes-crystal-soul/dsi-126064"), platform = c("Xbox One", 
"Xbox 360", "Wii U", "PC", "PC", "PlayStation 3", "Nintendo DSi"
), score = c(7.5, 4.5, 8, 8, 8, 7.5, 5), genre = c("Party", "Action", 
"Platformer", "Platformer", "Action, Adventure", "Sports", "Adventure"
), editors_choice = c("N", "N", "N", "N", "N", "N", "N"), release_year = c(2015L, 
2012L, 2014L, 2013L, 2016L, 2014L, 2012L), release_month = c(1L, 
10L, 6L, 10L, 4L, 4L, 1L), release_day = c(14L, 16L, 8L, 16L, 
21L, 17L, 5L), Android = c("Android", "Android", "Android", "Android", 
"Android", "Android", "Android"), Arcade = c("Arcade", "Arcade", 
"Arcade", "Arcade", "Arcade", "Arcade", "Arcade"), iPad = c("iPad", 
"iPad", "iPad", "iPad", "iPad", "iPad", "iPad"), iPhone = c("iPhone", 
"iPhone", "iPhone", "iPhone", "iPhone", "iPhone", "iPhone"), 
    Linux = c("Linux", "Linux", "Linux", "Linux", "Linux", "Linux", 
    "Linux"), Macintosh = c("Macintosh", "Macintosh", "Macintosh", 
    "Macintosh", "Macintosh", "Macintosh", "Macintosh"), `New Nintendo 3DS` = c("New Nintendo 3DS", 
    "New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS", 
    "New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS"
    ), `Nintendo 3DS` = c("Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", 
    "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS"
    ), `Nintendo DS` = c("Nintendo DS", "Nintendo DS", "Nintendo DS", 
    "Nintendo DS", "Nintendo DS", "Nintendo DS", "Nintendo DS"
    ), `Nintendo DSi` = c("Nintendo DSi", "Nintendo DSi", "Nintendo DSi", 
    "Nintendo DSi", "Nintendo DSi", "Nintendo DSi", "Nintendo DSi"
    ), Ouya = c("Ouya", "Ouya", "Ouya", "Ouya", "Ouya", "Ouya", 
    "Ouya"), PC = c("PC", "PC", "PC", "PC", "PC", "PC", "PC"), 
    `PlayStation 3` = c("PlayStation 3", "PlayStation 3", "PlayStation 3", 
    "PlayStation 3", "PlayStation 3", "PlayStation 3", "PlayStation 3"
    ), `PlayStation 4` = c("PlayStation 4", "PlayStation 4", 
    "PlayStation 4", "PlayStation 4", "PlayStation 4", "PlayStation 4", 
    "PlayStation 4"), `PlayStation Portable` = c("PlayStation Portable", 
    "PlayStation Portable", "PlayStation Portable", "PlayStation Portable", 
    "PlayStation Portable", "PlayStation Portable", "PlayStation Portable"
    ), `PlayStation Vita` = c("PlayStation Vita", "PlayStation Vita", 
    "PlayStation Vita", "PlayStation Vita", "PlayStation Vita", 
    "PlayStation Vita", "PlayStation Vita"), SteamOS = c("SteamOS", 
    "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS"
    ), `Web Games` = c("Web Games", "Web Games", "Web Games", 
    "Web Games", "Web Games", "Web Games", "Web Games"), Wii = c("Wii", 
    "Wii", "Wii", "Wii", "Wii", "Wii", "Wii"), `Wii U` = c("Wii U", 
    "Wii U", "Wii U", "Wii U", "Wii U", "Wii U", "Wii U"), `Windows Phone` = c("Windows Phone", 
    "Windows Phone", "Windows Phone", "Windows Phone", "Windows Phone", 
    "Windows Phone", "Windows Phone"), `Windows Surface` = c("Windows Surface", 
    "Windows Surface", "Windows Surface", "Windows Surface", 
    "Windows Surface", "Windows Surface", "Windows Surface"), 
    `Xbox 360` = c("Xbox 360", "Xbox 360", "Xbox 360", "Xbox 360", 
    "Xbox 360", "Xbox 360", "Xbox 360"), `Xbox One` = c("Xbox One", 
    "Xbox One", "Xbox One", "Xbox One", "Xbox One", "Xbox One", 
    "Xbox One")), .Names = c("X1", "score_phrase", "title", "url", 
"platform", "score", "genre", "editors_choice", "release_year", 
"release_month", "release_day", "Android", "Arcade", "iPad", 
"iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS", 
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3", 
"PlayStation 4", "PlayStation Portable", "PlayStation Vita", 
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface", 
"Xbox 360", "Xbox One"), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"))

I also checked typeof for the title column in both dfs; both are "character":

typeof(ign_temp$title)
[1] "character"
> typeof(ign_revised$title)
[1] "character"
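Matching types are necessary but not sufficient: the title strings themselves must match exactly. A quick check is to list the titles that have no exact partner on the other side. Stray whitespace is one example of a difference that `typeof()` cannot reveal.

The data below is hypothetical toy data.

```r
# Hypothetical toy data: one title has a trailing space, one has no partner
ign_revised <- data.frame(title = c("#IDARB", "007 Legends ", "Splice"),
                          stringsAsFactors = FALSE)
ign_temp_wide <- data.frame(title = c("#IDARB", "007 Legends"),
                            `Xbox One` = c(1, 0), check.names = FALSE,
                            stringsAsFactors = FALSE)

# Titles in ign_revised with no exact match in ign_temp_wide
unmatched <- setdiff(ign_revised$title, ign_temp_wide$title)
unmatched
# Re-check after stripping leading/trailing whitespace on both sides
setdiff(trimws(ign_revised$title), trimws(ign_temp_wide$title))
```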

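@Gregor However, the merge still doesn't seem to work. Since it's an inner join, I also tried specifying by = "title", but the platform columns in ign_revised stay unchanged. Any suggestions?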

merge(ign_revised[1:11], ign_temp_wide, by = "title")
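One likely explanation for the unchanged columns: `merge()` does not modify either input in place. It returns a new data frame, which is printed and then discarded unless it is assigned.

Here is a minimal sketch with hypothetical two-row data. In the real data, `ign_revised[1:11]` drops the old platform columns, and the join supplies them from ign_temp_wide.

```r
# Hypothetical toy data
ign_revised <- data.frame(title = c("Abzu", "NHL 13"), score = c(7.5, 8.5),
                          PC = "PC", stringsAsFactors = FALSE)
ign_temp_wide <- data.frame(title = c("Abzu", "NHL 13"), PC = c(1, 0),
                            stringsAsFactors = FALSE)

# merge() returns a new data frame; on its own it leaves ign_revised untouched
merge(ign_revised[1:2], ign_temp_wide, by = "title")
ign_revised$PC   # still "PC", "PC"

# Assign the result (here to a new name) to keep it
ign_joined <- merge(ign_revised[1:2], ign_temp_wide, by = "title")
ign_joined$PC
```

Once the joined result looks right, it can be assigned back to `ign_revised` itself.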

1 answer:

Answer 0 (score: 0)

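I would first convert your ign_temp data frame to wide format, creating the dummy variables you need, and then join it to the ign_revised data.

Using this input: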

ign_temp = structure(list(title = c("LittleBigPlanet", "Splice", "NHL 13", 
"NHL 13", "Wild", "Mark of the Ninja", "Mark of the Ninja"), 
    platform = c("Playstation Vita", "Playstation Vita", "Xbox", 
    "Android", "iPhone", "Xbox 360", "PC")), .Names = c("title", 
"platform"), row.names = c(NA, -7L), class = c("data.frame"))

ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,
                           value.var = "value", fill = 0)
ign_temp_wide
#               title Android iPhone PC Playstation Vita Xbox Xbox 360
# 1   LittleBigPlanet       0      0  0                1    0        0
# 2 Mark of the Ninja       0      0  1                0    0        1
# 3            NHL 13       1      0  0                0    1        0
# 4            Splice       0      0  0                1    0        0
# 5              Wild       0      1  0                0    0        0
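If reshape2 is not available, roughly the same wide 0/1 table can be built in base R with `table()`. The sketch below uses the example input above and notes two assumptions and caveats:

- Any count above 0 becomes 1. This is an assumed choice for the case where a title/platform pair is listed more than once.
- The row and column order may differ from dcast's output, depending on the locale's sort order.

```r
# The example input from above (title/platform pairs only)
ign_temp <- data.frame(
  title = c("LittleBigPlanet", "Splice", "NHL 13", "NHL 13", "Wild",
            "Mark of the Ninja", "Mark of the Ninja"),
  platform = c("Playstation Vita", "Playstation Vita", "Xbox", "Android",
               "iPhone", "Xbox 360", "PC"),
  stringsAsFactors = FALSE
)

# Count rows per (title, platform), then turn any count > 0 into 1
m <- unclass(table(ign_temp$title, ign_temp$platform))
m <- (m > 0) * 1

# table() keeps titles as row names; move them into a "title" column
ign_temp_wide <- data.frame(title = rownames(m), as.data.frame.matrix(m),
                            check.names = FALSE, row.names = NULL,
                            stringsAsFactors = FALSE)
ign_temp_wide
```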

After that, the join is simple. This should work:

merge(ign_revised[1:11], ign_temp_wide)

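You only need an inner join between the non-platform columns of ign_revised and the whole of ign_temp_wide. I use 1:11 because you said the platforms start at column 12.

base::merge works, but you can pick whichever method you prefer from How to join in R. If the join gives you trouble, make sure title is a character column in both data frames. I'm also assuming the title column is named "title" in both data frames.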
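Because `merge()` defaults to an inner join, any ign_revised title with no row in ign_temp_wide is silently dropped. If those rows should be kept with all-zero platform columns, one option is a left join (`all.x = TRUE`) followed by replacing the resulting `NA`s with 0.

Here is a sketch with hypothetical data:

```r
# Hypothetical toy data: "Abzu" has no platform rows at all
ign_revised <- data.frame(title = c("Abzu", "NHL 13", "Wild"),
                          score = c(7.5, 8.5, 6), stringsAsFactors = FALSE)
ign_temp_wide <- data.frame(title = c("NHL 13", "Wild"),
                            Android = c(1, 0), iPhone = c(0, 1),
                            stringsAsFactors = FALSE)

# all.x = TRUE keeps every ign_revised row (a left join); titles absent from
# ign_temp_wide get NA in the platform columns
joined <- merge(ign_revised, ign_temp_wide, by = "title", all.x = TRUE)

# Replace those NAs with 0 in the platform columns only
plat <- setdiff(names(ign_temp_wide), "title")
joined[plat][is.na(joined[plat])] <- 0
joined
```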