I don't know how MySQL works internally, but I'm fairly sure something is wrong with the index or with the table's row-count metadata:
mysql> select count(*) from Event;
+----------+
| count(*) |
+----------+
| 5925 |
+----------+
1 row in set (0,01 sec)
mysql> select count(*) from Event where event_id in (select discount_event_id from Discount);
+----------+
| count(*) |
+----------+
| 5901 |
+----------+
1 row in set (0,12 sec)
mysql> select count(*) from Event where event_id not in (select discount_event_id from Discount);
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0,11 sec)
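These 24 missing event_id values make no sense. From my point of view this is logically impossible: there cannot be 24 rows that are neither in the other set nor not in it. A row either is in the set or it isn't.
Also, as some answers and comments suggested, there are no NULL event_id values, since they are row IDs: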
mysql> select count(*) from Event where event_id is null;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
What is going on here?
Answer 0: (score: 4)
This means there are 24 event_id values in Event that are NULL:
select count(*) from Event where event_id IS NULL
Both the in and the not in operator return NULL when comparing against a NULL value. In a WHERE clause that NULL is treated as false, so the row is omitted from both result sets.
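The NULL behaviour described above can be reproduced on a tiny data set. This is a minimal sketch, not the asker's actual schema: the table and column names are borrowed from the question, but the rows are invented. Since the question reports zero NULL event_id values in Event, the sketch assumes the NULL sits on the subquery side (Discount.discount_event_id), which is one arrangement that produces the same pattern of counts. The final query uses NOT EXISTS, a common NULL-safe alternative to NOT IN.

```sql
-- Hypothetical miniature tables; names mirror the question, data is invented.
CREATE TABLE Event (event_id INT PRIMARY KEY);
CREATE TABLE Discount (discount_event_id INT NULL);

INSERT INTO Event VALUES (1), (2), (3);
INSERT INTO Discount VALUES (1), (2), (NULL);   -- assumed: one NULL on the subquery side

-- Three-valued logic: a comparison involving NULL yields NULL, not FALSE.
SELECT 3 IN (1, 2, NULL)     AS in_with_null_in_list,      -- NULL
       3 NOT IN (1, 2, NULL) AS not_in_with_null_in_list,  -- NULL
       NULL IN (1, 2)        AS null_on_the_left;          -- NULL

-- WHERE keeps only rows whose condition is TRUE, so event 3 drops out of both counts.
SELECT COUNT(*) FROM Event;                                    -- 3
SELECT COUNT(*) FROM Event
 WHERE event_id IN (SELECT discount_event_id FROM Discount);     -- 2
SELECT COUNT(*) FROM Event
 WHERE event_id NOT IN (SELECT discount_event_id FROM Discount); -- 0

-- NULL-safe rewrite: NOT EXISTS only asks whether a matching row exists.
SELECT COUNT(*) FROM Event e
 WHERE NOT EXISTS (SELECT 1 FROM Discount d
                    WHERE d.discount_event_id = e.event_id);      -- 1
```

With three events and two matches, the IN and NOT IN counts add up to 2 rather than 3, which is the same kind of gap as the 24 rows in the question. Adding "where discount_event_id is not null" to the subquery would be another way to make NOT IN behave as expected.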