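I'm new to R programming and was wondering whether I could get some help with a problem I've run into. I currently have two data frames. One is ign_temp, which holds a list of video game titles (about 18,000) and the platforms they appeared on (more than 30 types). Because games are released on multiple platforms, some titles appear several times, as shown below. This df has been filtered down to just the title and platform columns of the original database, which has many more columns (id, url, release year, etc.).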
ign_temp:
title              platform
LittleBigPlanet    Playstation Vita
Splice             Playstation Vita
NHL 13             Xbox
NHL 13             Android
Wild               iPhone
Mark of the Ninja  Xbox 360
Mark of the Ninja  PC
.......
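I have another data frame, ign_revised, which contains a sample of the games from the data frame above, but with additional columns such as score, year, etc. Each game appears on only one row, and I have added new columns for the possible platforms a game can appear on (from Android to Xbox One, about 24 platform columns after the year column), as shown below (condensed view):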
ign_revised:
id   score_phrase  title   score  genre   year  Android  Arcade  ...  Xbox One
315  Cool          Abzu    7.5    Puzzle  2012  Android  Arcade  ...  Xbox One
87   Poor          Alan    5.0    Action  2014  Android  Arcade  ...  Xbox One
.....
598  Great         NHL 13  8.5    Sports  2013  Android  Arcade  ...  Xbox One
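ign_revised is sorted alphabetically, and for now the platform columns (Android, Arcade, ..., Xbox One) simply repeat the platform name for all 1,600+ titles in that data frame.
My main question is whether there is a way, such as a for loop, to take each title and each platform column (Android, Arcade, ..., Xbox One) in ign_revised, match them against the corresponding title and platform in ign_temp, and replace the values in the Android, Arcade, ..., Xbox One columns of ign_revised with 1 (the title appears on that platform) or 0 (it does not). The result would look like this:
ign_revised (final result):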
id   score_phrase  title   score  genre   year  Android  Arcade  ...  Xbox One
315  Cool          Abzu    7.5    Puzzle  2012  0        1       ...  0
87   Poor          Alan    5.0    Action  2014  0        0       ...  1
.....
598  Great         NHL 13  8.5    Sports  2013  1        0       ...  1
In my actual ign_revised data frame, title is column 3 and the platform columns, starting with Android, begin at column 12, if that helps.
Pseudocode:
for (i in 1:nrow(ign_revised)) {
  for (j in 12:ncol(ign_revised)) {
    # Match current title and platform to ign_temp
    # Assign current cell (i, j) a value of 1 or 0 based on the match
  }
}
Thanks!
@Gregor
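Edit 1: Sorry, I can't seem to post comments or format the revised code properly in a reply to a comment. Since ign_temp needs to hold all 18,625 games rather than just the 7 listed, and they come from my original df (called ign), should I change the code to something like this?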
all_title <- ign$title
all_platform <- ign$platform
ign_temp <- structure(list(title = all_title, platform = all_platform, .Names = c("title","platform"), row.names = c(1, -18625L), class = c("data.frame")))
ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,value.var = "value", fill = 0)
merge(ign_revised[1:11], ign_temp_wide)
I'm not sure, because I got this error:
Error in rep(1, nrow(data)) : invalid 'times' argument
Edit 2: Added dput output for ign_temp, ign_temp_wide, and ign_revised.
dput(droplevels(head(ign_temp, 7)))
structure(list(title = c("LittleBigPlanet PS Vita", "LittleBigPlanet PS Vita -- Marvel Super Hero Edition",
"Splice: Tree of Life", "NHL 13", "NHL 13", "Total War Battles: Shogun",
"Double Dragon: Neon"), platform = c("PlayStation Vita", "PlayStation Vita",
"iPad", "Xbox 360", "PlayStation 3", "Macintosh", "Xbox 360"),
value = c(1, 1, 1, 1, 1, 1, 1)), .Names = c("title", "platform",
"value"), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
dput(droplevels(head(ign_temp_wide, 7)))
structure(list(title = c("#IDARB", "007 Legends", "1001 Spikes",
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), Android = c(0, 0, 0, 0, 0, 0, 0), Arcade = c(0, 0, 0, 0, 0,
0, 0), iPad = c(0, 0, 0, 0, 0, 0, 0), iPhone = c(0, 0, 0, 0,
0, 0, 0), Linux = c(0, 0, 0, 0, 0, 0, 0), Macintosh = c(0, 0,
0, 0, 0, 0, 0), `New Nintendo 3DS` = c(0, 0, 0, 0, 0, 0, 0),
`Nintendo 3DS` = c(0, 0, 1, 0, 0, 0, 0), `Nintendo DS` = c(0,
0, 0, 0, 0, 0, 0), `Nintendo DSi` = c(0, 0, 0, 0, 0, 0, 1
), Ouya = c(0, 0, 0, 0, 0, 0, 0), PC = c(0, 0, 1, 1, 1, 0,
0), `PlayStation 3` = c(0, 1, 0, 0, 0, 1, 0), `PlayStation 4` = c(0,
0, 1, 0, 0, 0, 0), `PlayStation Portable` = c(0, 0, 0, 0,
0, 0, 0), `PlayStation Vita` = c(0, 0, 1, 0, 0, 0, 0), SteamOS = c(0,
0, 0, 0, 0, 0, 0), `Web Games` = c(0, 0, 0, 0, 0, 0, 0),
Wii = c(0, 0, 0, 0, 0, 0, 0), `Wii U` = c(0, 1, 1, 0, 0,
0, 0), `Windows Phone` = c(0, 0, 0, 0, 0, 0, 0), `Windows Surface` = c(0,
0, 0, 0, 0, 0, 0), `Xbox 360` = c(0, 1, 0, 0, 0, 1, 0), `Xbox One` = c(1,
0, 0, 0, 0, 0, 0)), .Names = c("title", "Android", "Arcade",
"iPad", "iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS",
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3",
"PlayStation 4", "PlayStation Portable", "PlayStation Vita",
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface",
"Xbox 360", "Xbox One"), row.names = c(NA, 7L), class = "data.frame")
dput(droplevels(head(ign_revised, 7)))
structure(list(X1 = c(18007L, 145L, 17730L, 17325L, 18475L, 17699L,
16486L), score_phrase = c("Good", "Bad", "Great", "Great", "Great",
"Good", "Mediocre"), title = c("#IDARB", "007 Legends", "1001 Spikes",
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), url = c("/games/it-draws-a-red-box/xbox-one-20014945", "/games/007-legends/xbox-360-132394",
"/games/1001-spikes/wii-u-132248", "/games/140-game/pc-20007190",
"/games/1979-the-game/pc-115360", "/games/2014-fifa-world-cup/ps3-20012688",
"/games/3-heroes-crystal-soul/dsi-126064"), platform = c("Xbox One",
"Xbox 360", "Wii U", "PC", "PC", "PlayStation 3", "Nintendo DSi"
), score = c(7.5, 4.5, 8, 8, 8, 7.5, 5), genre = c("Party", "Action",
"Platformer", "Platformer", "Action, Adventure", "Sports", "Adventure"
), editors_choice = c("N", "N", "N", "N", "N", "N", "N"), release_year = c(2015L,
2012L, 2014L, 2013L, 2016L, 2014L, 2012L), release_month = c(1L,
10L, 6L, 10L, 4L, 4L, 1L), release_day = c(14L, 16L, 8L, 16L,
21L, 17L, 5L), Android = c("Android", "Android", "Android", "Android",
"Android", "Android", "Android"), Arcade = c("Arcade", "Arcade",
"Arcade", "Arcade", "Arcade", "Arcade", "Arcade"), iPad = c("iPad",
"iPad", "iPad", "iPad", "iPad", "iPad", "iPad"), iPhone = c("iPhone",
"iPhone", "iPhone", "iPhone", "iPhone", "iPhone", "iPhone"),
Linux = c("Linux", "Linux", "Linux", "Linux", "Linux", "Linux",
"Linux"), Macintosh = c("Macintosh", "Macintosh", "Macintosh",
"Macintosh", "Macintosh", "Macintosh", "Macintosh"), `New Nintendo 3DS` = c("New Nintendo 3DS",
"New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS",
"New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS"
), `Nintendo 3DS` = c("Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS",
"Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS"
), `Nintendo DS` = c("Nintendo DS", "Nintendo DS", "Nintendo DS",
"Nintendo DS", "Nintendo DS", "Nintendo DS", "Nintendo DS"
), `Nintendo DSi` = c("Nintendo DSi", "Nintendo DSi", "Nintendo DSi",
"Nintendo DSi", "Nintendo DSi", "Nintendo DSi", "Nintendo DSi"
), Ouya = c("Ouya", "Ouya", "Ouya", "Ouya", "Ouya", "Ouya",
"Ouya"), PC = c("PC", "PC", "PC", "PC", "PC", "PC", "PC"),
`PlayStation 3` = c("PlayStation 3", "PlayStation 3", "PlayStation 3",
"PlayStation 3", "PlayStation 3", "PlayStation 3", "PlayStation 3"
), `PlayStation 4` = c("PlayStation 4", "PlayStation 4",
"PlayStation 4", "PlayStation 4", "PlayStation 4", "PlayStation 4",
"PlayStation 4"), `PlayStation Portable` = c("PlayStation Portable",
"PlayStation Portable", "PlayStation Portable", "PlayStation Portable",
"PlayStation Portable", "PlayStation Portable", "PlayStation Portable"
), `PlayStation Vita` = c("PlayStation Vita", "PlayStation Vita",
"PlayStation Vita", "PlayStation Vita", "PlayStation Vita",
"PlayStation Vita", "PlayStation Vita"), SteamOS = c("SteamOS",
"SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS"
), `Web Games` = c("Web Games", "Web Games", "Web Games",
"Web Games", "Web Games", "Web Games", "Web Games"), Wii = c("Wii",
"Wii", "Wii", "Wii", "Wii", "Wii", "Wii"), `Wii U` = c("Wii U",
"Wii U", "Wii U", "Wii U", "Wii U", "Wii U", "Wii U"), `Windows Phone` = c("Windows Phone",
"Windows Phone", "Windows Phone", "Windows Phone", "Windows Phone",
"Windows Phone", "Windows Phone"), `Windows Surface` = c("Windows Surface",
"Windows Surface", "Windows Surface", "Windows Surface",
"Windows Surface", "Windows Surface", "Windows Surface"),
`Xbox 360` = c("Xbox 360", "Xbox 360", "Xbox 360", "Xbox 360",
"Xbox 360", "Xbox 360", "Xbox 360"), `Xbox One` = c("Xbox One",
"Xbox One", "Xbox One", "Xbox One", "Xbox One", "Xbox One",
"Xbox One")), .Names = c("X1", "score_phrase", "title", "url",
"platform", "score", "genre", "editors_choice", "release_year",
"release_month", "release_day", "Android", "Arcade", "iPad",
"iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS",
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3",
"PlayStation 4", "PlayStation Portable", "PlayStation Vita",
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface",
"Xbox 360", "Xbox One"), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
I also checked typeof for the title columns in both dfs, and both are "character":
> typeof(ign_temp$title)
[1] "character"
> typeof(ign_revised$title)
[1] "character"
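@Gregor However, the merge still doesn't seem to work. Since it's an inner join, I also tried specifying by = "title", but the platform columns in ign_revised still stay unchanged. Any suggestions?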
merge(ign_revised[1:11], ign_temp_wide, by = "title")
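Answer 0 (score: 0)
I would first convert your ign_temp data frame to wide format, creating the dummy variables you need, and then join it to the ign_revised data.
Using this input: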
ign_temp = structure(list(title = c("LittleBigPlanet", "Splice", "NHL 13",
"NHL 13", "Wild", "Mark of the Ninja", "Mark of the Ninja"),
platform = c("Playstation Vita", "Playstation Vita", "Xbox",
"Android", "iPhone", "Xbox 360", "PC")), .Names = c("title",
"platform"), row.names = c(NA, -7L), class = c("data.frame"))
ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,
value.var = "value", fill = 0)
ign_temp_wide
# title Android iPhone PC Playstation Vita Xbox Xbox 360
# 1 LittleBigPlanet 0 0 0 1 0 0
# 2 Mark of the Ninja 0 0 1 0 0 1
# 3 NHL 13 1 0 0 0 1 0
# 4 Splice 0 0 0 1 0 0
# 5 Wild 0 1 0 0 0 0
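For the full data set, two details may matter. In the structure() call from Edit 1, the .Names, row.names and class arguments sit inside list(), so the result is a plain list rather than a data frame, which is a likely cause of the "invalid 'times' argument" error. Also, if a title/platform pair occurs more than once, dcast falls back to counting rows, so cells can exceed 1. The following is a minimal base R sketch of the same reshaping, assuming ign has title and platform columns; the toy rows are invented for illustration.

```r
# Toy stand-in for the full `ign` data frame (rows invented for illustration).
ign <- data.frame(
  title    = c("NHL 13", "NHL 13", "NHL 13", "Wild", "Mark of the Ninja"),
  platform = c("Xbox 360", "PlayStation 3", "Xbox 360", "iPhone", "PC"),
  stringsAsFactors = FALSE
)

# Build ign_temp with plain subsetting instead of a hand-written structure();
# unique() drops repeated title/platform pairs so every cell ends up 0 or 1.
ign_temp <- unique(ign[c("title", "platform")])

# Cross-tabulate titles against platforms and turn the table into a data frame
# with one row per title and one column per platform.
wide <- as.data.frame.matrix(table(ign_temp$title, ign_temp$platform))

# Move the titles from the row names into a regular column for joining.
ign_temp_wide <- cbind(title = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(ign_temp_wide) <- NULL

ign_temp_wide
```

The result has the same shape as the dcast output above and can be used in the join in the same way.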
Then the join is simple. This should work:
merge(ign_revised[1:11], ign_temp_wide)
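You just need an inner join between the non-platform columns of ign_revised (I used 1:11 because you said the platforms start at column 12) and the whole of ign_temp_wide. base::merge works, but you can pick whichever method you prefer from How to join in R. If you run into problems with the join, make sure title is a character-class column in both data frames. I'm also assuming the column name "title" is the same in both data frames.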
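One possible reason the platform columns in ign_revised appear unchanged is that merge() returns a new data frame and does not modify its inputs, so the result has to be assigned. Below is a minimal sketch with toy data (column names follow the question, values are invented). It also uses all.x = TRUE and fills unmatched titles with 0, which is an assumption about the desired result rather than part of the inner join described above.

```r
# Toy stand-ins; column names mirror the question, values are invented.
ign_revised <- data.frame(
  title   = c("Abzu", "NHL 13", "Wild"),
  score   = c(7.5, 8.5, 6.0),
  Android = "Android",   # placeholder platform columns that will be replaced
  PC      = "PC",
  stringsAsFactors = FALSE
)
ign_temp_wide <- data.frame(
  title   = c("NHL 13", "Wild"),
  Android = c(1, 0),
  PC      = c(1, 0),
  stringsAsFactors = FALSE
)

# Keep only the non-platform columns, then left-join the 0/1 indicators.
keep <- c("title", "score")
result <- merge(ign_revised[keep], ign_temp_wide, by = "title", all.x = TRUE)

# Titles missing from ign_temp_wide get NA in every platform column; use 0.
platform_cols <- setdiff(names(result), keep)
result[platform_cols][is.na(result[platform_cols])] <- 0

# merge() leaves its inputs untouched, so assign the result back explicitly.
ign_revised <- result
ign_revised
```

With the real data, keep would be the names of the first 11 columns, i.e. names(ign_revised)[1:11].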