我希望在dplyr中高效使用子查询,例如structure。例如,我们进行
之类的子查询select * from table where tbl_id in (select tbl_id from table1 where name='a');
在这里,我认为我们不会保存内部查询的任何结果。我想在dplyr链中使用类似的结构。
我试图通过将一个dplyr链的结果放在方括号中来使用它,但这种方式无法正常工作。我已经知道我们可以将其保存为临时df并使用它,但是我不想保存它。
下面是两个表数据/数据框:
# Libraries
library(sqldf)
#> Loading required package: gsubfn
#> Loading required package: proto
#> Loading required package: RSQLite
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Q 16 Write a query in SQL to find the name of those
# movies where one or more actors acted in two or more movies
movie <- read.csv("q14_movie.csv")
movie_cast <- read.csv("q16_movie_cast.csv")
print(movie)
#> mov_id mov_title mov_year mov_time mov_lang
#> 1 901 Vertigo 1958 128 English
#> 2 902 The Innocents 1961 100 English
#> 3 903 Lawrence of Arabia 1962 216 English
#> 4 904 The Deer Hunter 1978 183 English
#> 5 905 Amadeus 1984 160 English
#> 6 906 Blade Runner 1982 117 English
#> 7 907 Eyes Wide Shut 1999 159 English
#> 8 908 The Usual Suspects 1995 106 English
#> 9 909 Chinatown 1974 130 English
#> 10 910 Boogie Nights 1997 155 English
#> 11 911 Annie Hall 1977 93 English
#> 12 912 Princess Mononoke 1997 134 Japanese
#> 13 913 The Shawshank Redemption 1994 142 English
#> 14 914 American Beauty 1999 122 English
#> 15 915 Titanic 1997 194 English
#> 16 916 Good Will Hunting 1997 126 English
#> 17 917 Deliverance 1972 109 English
#> 18 918 Trainspotting 1996 94 English
#> 19 919 The Prestige 2006 130 English
#> 20 920 Donnie Darko 2001 113 English
#> 21 921 Slumdog Millionaire 2008 120 English
#> 22 922 Aliens 1986 137 English
#> 23 923 Beyond the Sea 2004 118 English
#> 24 924 Avatar 2009 162 English
#> 25 926 Seven Samurai 1954 207 Japanese
#> 26 927 Spirited Away 2001 125 Japanese
#> 27 928 Back to the Future 1985 116 English
#> 28 925 Braveheart 1995 178 English
#> mov_dt_rel mov_rel_country
#> 1 1958-08-24 UK
#> 2 1962-02-19 SW
#> 3 1962-12-11 UK
#> 4 1979-03-08 UK
#> 5 1985-01-07 UK
#> 6 1982-09-09 UK
#> 7 UK
#> 8 1995-08-25 UK
#> 9 1974-08-09 UK
#> 10 1998-02-16 UK
#> 11 1977-04-20 USA
#> 12 2001-10-19 UK
#> 13 1995-02-17 UK
#> 14 UK
#> 15 1998-01-23 UK
#> 16 1998-06-03 UK
#> 17 1982-10-05 UK
#> 18 1996-02-23 UK
#> 19 2006-11-10 UK
#> 20 UK
#> 21 2009-01-09 UK
#> 22 1986-08-29 UK
#> 23 2004-11-26 UK
#> 24 2009-12-17 UK
#> 25 1954-04-26 JP
#> 26 2003-09-12 UK
#> 27 1985-12-04 UK
#> 28 1995-09-08 UK
print(movie_cast)
#> act_id mov_id role
#> 1 101 901 John Scottie Ferguson
#> 2 102 902 Miss Giddens
#> 3 103 903 T.E. Lawrence
#> 4 104 904 Michael
#> 5 105 905 Antonio Salieri
#> 6 106 906 Rick Deckard
#> 7 107 907 Alice Harford
#> 8 108 908 McManus
#> 9 110 910 Eddie Adams
#> 10 111 911 Alvy Singer
#> 11 112 912 San
#> 12 113 913 Andy Dufresne
#> 13 114 914 Lester Burnham
#> 14 115 915 Rose DeWitt Bukater
#> 15 116 916 Sean Maguire
#> 16 117 917 Ed
#> 17 118 918 Renton
#> 18 120 920 Elizabeth Darko
#> 19 121 921 Older Jamal
#> 20 122 922 Ripley
#> 21 114 923 Bobby Darin
#> 22 109 909 J.J. Gittes
#> 23 119 919 Alfred Borden
sqldf('select * from movie m join movie_cast mc on
m.mov_id=mc.mov_id where mc.act_id in
(select act_id from movie_cast group by act_id having count(mov_id)>1)')
#> mov_id mov_title mov_year mov_time mov_lang mov_dt_rel
#> 1 914 American Beauty 1999 122 English
#> 2 923 Beyond the Sea 2004 118 English 2004-11-26
#> mov_rel_country act_id mov_id role
#> 1 UK 114 914 Lester Burnham
#> 2 UK 114 923 Bobby Darin
tmp <- movie_cast %>%
group_by(act_id) %>%
summarise(num_movies=n_distinct(mov_id)) %>%
filter(num_movies>1)
inner_join(movie,movie_cast,by='mov_id') %>% filter(act_id %in% tmp$act_id)
#> mov_id mov_title mov_year mov_time mov_lang mov_dt_rel
#> 1 914 American Beauty 1999 122 English
#> 2 923 Beyond the Sea 2004 118 English 2004-11-26
#> mov_rel_country act_id role
#> 1 UK 114 Lester Burnham
#> 2 UK 114 Bobby Darin
inner_join(movie,movie_cast,by='mov_id') %>%
filter(act_id %in% (movie_cast %>%
group_by(act_id) %>%
summarise(num_movies=n_distinct(mov_id)) %>%
filter(num_movies>1) %>%
select(act_id)))
#> [1] mov_id mov_title mov_year mov_time
#> [5] mov_lang mov_dt_rel mov_rel_country act_id
#> [9] role
#> <0 rows> (or 0-length row.names)
我希望在不另存为tmp的情况下获得相同的结果,如代码中所述!
谢谢