How do I subset variables whose values contain NA?

Time: 2018-02-22 06:40:17

Tags: r na

I have an IMDb dataset and I want to replace the missing values of budget and box_office_gross. I think multiple imputation is the right way to replace them.

To separate the numeric columns from the full dataset and run the imputation on them, I tried to subset the variables:

> NBCU_Limited <- subset(NBCU_dataLaurel_Modified, select = c(NBCU_dataLaurel_Modified$imdb_votes, NBCU_dataLaurel_Modified$runtime_min, NBCU_dataLaurel_Modified$Budget, NBCU_dataLaurel_Modified$Box_Office_Gross))
Error: NA column indexes not supported

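But I get an error because the variables contain NA values. I can't exclude (negate) the remaining character columns instead, because they also contain NAs and produce the same error.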

How can I put just these four variables into a new data frame so that I can run multiple imputation on them?

Sample Dataset

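Update: The error occurred because I referred to the data.frame explicitly for each variable inside subset. If I leave out the data.frame and give only the variable names, I don't get the error. I'm not sure why, but that is what caused it, so my code was probably just incorrect.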

Here is the data:

> dput(Sample)
structure(list(imdbid = c("tt6256056", "tt0085450", "tt5050772", 
"tt5069876", "tt0083791", "tt0083929"), title = c("Una Famiglia", 
"Doctor Detroit", "Honeytrap", "Maniac 8.2.8", "The Dark Crystal", 
"Fast Times at Ridgemont High"), plot = c("N/A", "A timid college professor, conned into posing as a flamboyant pimp, finds himself enjoying his new occupation on the streets.", 
"Simeon's evening goes horribly wrong when a young woman tries to pick him up.", 
"Maniac: a person afflicted with mania. Mania: A manifestation of bipolar disorder, characterized by profuse and rapidly changing ideas, exaggerated sexuality, gaiety, or irritability, decreased sleep and violent abnormal behavior.", 
"On another planet in the distant past, a Gelfling embarks on a quest to find the missing shard of a magical crystal, and so restore order to his world.", 
"A group of Southern California high school students are enjoying their most important subjects: sex, drugs and rock n' roll."
), rating = c("N/A", "R", "N/A", "N/A", "PG", "R"), imdb_rating = c(NA, 
5.1, NA, NA, 7.2, 7.2), metacritic = c(NA, NA, NA, NA, NA, 67
), dvd_release = structure(c(NA, 1126569600, NA, NA, 939081600, 
1099353600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    production = c("N/A", "Universal", "Array Releasing", "N/A", 
    "Sony Pictures Home Entertainment", "Universal Pictures"), 
    actors = c("Patrick Bruel, Fortunato Cerlino, Matilda De Angelis, Ennio Fantastichini", 
    "Dan Aykroyd, Howard Hesseman, Donna Dixon, Lydia Lei", "Jennifer Nelson, Daemian Greaves, Polina Vasileva, Becki Lloyd", 
    "Dimitra Aggelou, Giorgos Efthimiou, Stavroula Kontopoulou, Maria-Antouanetta Tatsi", 
    "Jim Henson, Kathryn Mullen, Frank Oz, Dave Goelz", "Sean Penn, Jennifer Jason Leigh, Judge Reinhold, Robert Romanus"
    ), imdb_votes = c(NA, 4492, NA, NA, 44862, 76980), poster = c("N/A", 
    "https://images-na.ssl-images-amazon.com/images/M/MV5BMjhjY2Q4NWEtYTUzZC00YjE2LTk0ZjktNzUyZjIwNmQ0YTkyXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg", 
    "N/A", "https://images-na.ssl-images-amazon.com/images/M/MV5BZjdmZTRhYzgtOGY4MS00OGM5LWJlNmItYzJiYjZiNmVmYjhkXkEyXkFqcGdeQXVyNDA2NjM2ODk@._V1_SX300.jpg", 
    "https://images-na.ssl-images-amazon.com/images/M/MV5BMWZlZjk1MGEtYWMzOC00N2EyLWFkOTUtZDM4NGNlY2M0YjVmXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_SX300.jpg", 
    "https://images-na.ssl-images-amazon.com/images/M/MV5BYzBlZjE1MDctYjZmZC00ZTJmLWFkOWEtYjdmZDZkODBkZmI2XkEyXkFqcGdeQXVyNjQ2MjQ5NzM@._V1_SX300.jpg"
    ), director = c("Sebastiano Riso", "Michael Pressman", "Nick Archer", 
    "Giorgos Efthimiou", "Jim Henson, Frank Oz", "Amy Heckerling"
    ), release_date = structure(c(1493596800, 421027200, 1448928000, 
    1431734400, 408931200, 398044800), class = c("POSIXct", "POSIXt"
    ), tzone = "UTC"), Year = c(2017, 1983, 2015, 2015, 1982, 
    1982), Year_Groups = c("2010-2020", "1980-1989", "2010-2020", 
    "2010-2020", "1980-1989", "1980-1989"), Month = c("May", 
    "May", "December", "May", "December", "August"), runtime_min = c(97, 
    89, NA, 15, 93, 90), genre = c("Drama", "Comedy", "Short, Thriller", 
    "Short, Horror", "Adventure, Family, Fantasy", "Comedy, Drama"
    ), awards = c("N/A", "N/A", "N/A", "1 win.", "Nominated for 1 BAFTA Film Award. Another 2 wins & 4 nominations.", 
    "1 win & 1 nomination."), keywords = c(NA, "pimp|college-professor|voyeurism|voyeur|blue-panties|panties|red-dress|blonde|female-frontal-nudity|female-nudity|nude-girl|nude|bare-breasts|breasts|topless-female-nudity|scantily-clad-female|cleavage|two-word-title|reference-to-joe-frazier|reference-to-yul-brynner|mother-son-relationship|f-word|place-name-in-title|city-name-in-title|dual-identity|prostitution|independent-film|title-spoken-by-character|character-name-in-title", 
    NA, NA, "mystic|magical-crystal|crystal-shard|sword-and-sorcery|puppetry|crystal|shard|quest|evil|monster|feeding-on-energy|hidden-entrance|giant-crystal|actor-voicing-multiple-characters|planetary-alignment|reunification|three-word-title|dark-fantasy|slow-motion-scene|vampire|surrealism|christ-allegory|cult-film|sorceress|relic|race-against-time|muppet|mission|magic|kingdom|creature|good-versus-evil|directed-by-star|epic|multiple-monsters|invented-language|slavery|orrery|puppet|mutation|darkness|destiny", 
    "high-school|title-directed-by-female|females-talking-about-sex|unwanted-pregnancy|fired-from-the-job|teacher-student-relationship|irreverence|sexual-awakening|innocence-lost|ensemble-film|coming-of-age|teen-movie|high-school-teacher|advice|ticket-scalping|shopping-mall|loss-of-virginity|female-nudity|brother-sister-relationship|caught-masturbating|california|surfer|teacher|break-up|rock-'n'-roll|virgin|teenager|friendship|drugs|date|surfer-dude|blond-boy|redheaded-boy|generation-x|f-rated|vomiting|sex-scene|cult-film|breasts|jeans|hawaiian-shirt|payphone|teenage-girl|teen-sex-comedy|scantily-clad-female|reference-to-led-zeppelin|dream-girl|underage-girl|jailbait|trophy-wife|voyeur|sexual-promiscuity|sexual-desire|sexual-attraction|lust|sex-on-couch|female-rear-nudity|female-frontal-nudity|panties|cheerleader-uniform|female-removes-her-clothes|cleavage|marijuana|drug-use|teen-angst|surfing|school-life|pregnancy|masturbation|football-player|first-love|employment|bikini|stoner|rock-m... <truncated>
    ), Budget = c(NA, 10375893, NA, NA, 1.5e+07, 4500000), Box_Office_Gross = c(2.48, 
    70, 70, 124, 140, 140)), .Names = c("imdbid", "title", "plot", 
"rating", "imdb_rating", "metacritic", "dvd_release", "production", 
"actors", "imdb_votes", "poster", "director", "release_date", 
"Year", "Year_Groups", "Month", "runtime_min", "genre", "awards", 
"keywords", "Budget", "Box_Office_Gross"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

1 Answer:

Answer 0 (score: 0)

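The error occurred because I referred to the data.frame explicitly for each variable inside subset. If I leave out the data.frame and give only the variable names, I don't get the error. I'm not sure why, but that is what caused it, so my code was probably just incorrect. Thanks to @Tung for pointing this out.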
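A likely explanation, as far as the documented behaviour of subset() goes: the select argument is evaluated with each column name standing for that column's position, so bare names become column indices. Writing NBCU_dataLaurel_Modified$imdb_votes instead passes the column's values (including NA) as if they were column positions, which would produce "NA column indexes not supported". The sketch below uses a small stand-in data frame called df (a hypothetical name, with a few values copied from the sample data) rather than the real dataset.

```r
# Stand-in data frame; `df` is a hypothetical name, values copied from the sample above.
df <- data.frame(
  title            = c("Una Famiglia", "Doctor Detroit", "Honeytrap"),
  imdb_votes       = c(NA, 4492, NA),
  runtime_min      = c(97, 89, NA),
  Budget           = c(NA, 10375893, NA),
  Box_Office_Gross = c(2.48, 70, 70),
  stringsAsFactors = FALSE
)

# Bare column names: subset() turns each name into its column position.
limited <- subset(df, select = c(imdb_votes, runtime_min, Budget, Box_Office_Gross))

# The same selection by a character vector of names, without subset().
cols <- c("imdb_votes", "runtime_min", "Budget", "Box_Office_Gross")
limited2 <- df[, cols]

# By contrast, select = c(df$imdb_votes, ...) would be read as the
# column positions NA, 4492, NA, ... and not as the column itself.
```

Either form keeps the NA values in place, so the resulting data frame can be handed to an imputation routine afterwards.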