I have a table containing the following data:
mysql> describe Post;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| post_date | datetime | NO | | NULL | |
| in_reply_to | int(11) | YES | | NULL | |
| text | varchar(160) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> select id as "Row ID", user_id as "User ID", post_date as "Post Date", IF(in_reply_to is NULL, "None", in_reply_to) as "In Reply To Post ID:", CONCAT(LEFT(text,40),"...") as "Post Text" from Post;
+--------+---------+---------------------+----------------------+---------------------------------------------+
| Row ID | User ID | Post Date | In Reply To Post ID: | Post Text |
+--------+---------+---------------------+----------------------+---------------------------------------------+
| 1 | 1 | 2015-08-14 20:38:00 | None | This is the original test post that I pu... |
| 2 | 2 | 2015-08-14 20:39:00 | None | This is the second post that I put into ... |
| 3 | 5 | 2015-08-14 22:00:00 | 1 | Hahaha, that post was hilarious. I canno... |
| 4 | 4 | 2015-08-14 23:00:00 | 1 | Today I saw a cat jump off the roof, ont... |
| 5 | 4 | 2015-08-14 23:00:00 | None | Today I saw a cat jump off the roof, ont... |
| 27 | 1 | 2015-09-08 05:53:40 | 2 | This is a mad reply ay... |
| 28 | 1 | 2015-09-08 11:24:05 | None | Yolo Swag... |
+--------+---------+---------------------+----------------------+---------------------------------------------+
7 rows in set (0.05 sec)
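Each column is self-explanatory, in case you're unsure what it represents. The two columns I care about for this question are id and in_reply_to.
in_reply_to is a nullable integer foreign key that references id in the same table. If in_reply_to is NULL, the post is an original post; if it holds an integer, the post is a reply, and the value is the ID of the post being replied to.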
In the data above, there are 4 original posts (1, 2, 5, 28) and 3 replies (3, 4, 27): 3 is a reply to 1, 4 is also a reply to 1, and 27 is a reply to 2. I'd like to run an SQL query that produces output like the following, where Num of Replies is the COUNT of rows in the same table whose in_reply_to equals the post's id; if a post has no replies, it should show 0:
mysql> SELECT Post.id, Post.user_id, Post.post_date, Post.in_reply_to, CONCAT(LEFT(Post.text,40)), IF(counts.count IS NULL, 0, counts.count) AS 'Num of Replies' FROM Post LEFT JOIN (SELECT in_reply_to AS id, COUNT(*) AS count FROM Post WHERE in_reply_to IS NOT NULL GROUP BY in_reply_to) AS counts ON Post.id = counts.id;
+----+---------+---------------------+-------------+------------------------------------------+----------------+
| id | user_id | post_date | in_reply_to | CONCAT(LEFT(Post.text,40)) | Num of Replies |
+----+---------+---------------------+-------------+------------------------------------------+----------------+
| 1 | 1 | 2015-08-14 20:38:00 | NULL | This is the original test post that I pu | 2 |
| 2 | 2 | 2015-08-14 20:39:00 | NULL | This is the second post that I put into | 1 |
| 3 | 5 | 2015-08-14 22:00:00 | 1 | Hahaha, that post was hilarious. I canno | 0 |
| 4 | 4 | 2015-08-14 23:00:00 | 1 | Today I saw a cat jump off the roof, ont | 0 |
| 5 | 4 | 2015-08-14 23:00:00 | NULL | Today I saw a cat jump off the roof, ont | 0 |
| 27 | 1 | 2015-09-08 05:53:40 | 2 | This is a mad reply ay | 0 |
| 28 | 1 | 2015-09-08 11:24:05 | NULL | Random Text | 0 |
+----+---------+---------------------+-------------+------------------------------------------+----------------+
7 rows in set (0.00 sec)
(That is, a count of 0 means no row contains that post's ID in the in_reply_to column.)
Thanks.
Solution (per Anders' answer):
import cv2
from PIL import Image
from pytesseract import image_to_string  # assumed source of image_to_string; the original omits its imports


def binarize_image_using_opencv(captcha_path, binary_image_path='input-black-n-white.jpg'):
    # cv2.IMREAD_GRAYSCALE replaces cv2.CV_LOAD_IMAGE_GRAYSCALE, which was removed in OpenCV 3
    im_gray = cv2.imread(captcha_path, cv2.IMREAD_GRAYSCALE)
    # With THRESH_OTSU the 128 is ignored; thresh is the value Otsu's method picks
    (thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Re-apply that threshold; substitute a hand-picked value here if Otsu's choice is unsuitable
    im_bw = cv2.threshold(im_gray, thresh, 255, cv2.THRESH_BINARY)[1]
    cv2.imwrite(binary_image_path, im_bw)
    return binary_image_path


def preprocess_image_using_opencv(captcha_path):
    bin_image_path = binarize_image_using_opencv(captcha_path)
    im_bin = Image.open(bin_image_path)
    basewidth = 300  # target width in pixels
    # Scale the height by the same factor so the aspect ratio is preserved
    wpercent = basewidth / float(im_bin.size[0])
    hsize = int(float(im_bin.size[1]) * wpercent)
    big = im_bin.resize((basewidth, hsize), Image.NEAREST)
    # Save the enlarged image as TIF (the original note says tesseract-ocr only works with TIF)
    tif_file = "input-NEAREST.tif"
    big.save(tif_file)
    return tif_file


def get_captcha_text_from_captcha_image(captcha_path):
    # Preprocess the image before OCR (Optical Character Recognition)
    tif_file = preprocess_image_using_opencv(captcha_path)
    image = Image.open(tif_file)
    # Page segmentation mode 6; Tesseract 4 and later spell this flag "--psm 6"
    ocr_text = image_to_string(image, config="-psm 6")
    # Keep only alphanumeric characters, as the variable name implies
    alphanumeric_text = ''.join(e for e in ocr_text if e.isalnum())
    return alphanumeric_text
Answer 0 (score: 1)
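You need to join two queries on the same table. The first simply selects all posts; the second counts the replies to each post. It is a left join because you want to include posts that have no replies (which the second query does not return). IF is used to convert NULL values to 0.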
SELECT
    post.id,
    -- Other fields...,
    IF(counts.count IS NULL, 0, counts.count) AS count
FROM post
LEFT JOIN
    (SELECT
        in_reply_to AS id,
        COUNT(*) AS count
    FROM post
    WHERE in_reply_to IS NOT NULL
    GROUP BY in_reply_to) AS counts
ON post.id = counts.id
Disclaimer: I haven't tested this.
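Since the query above is posted untested, here is a rough way to check it against the question's sample rows. This is only a sketch under a few assumptions: it uses Python's built-in sqlite3 rather than MySQL, so IF(...) is swapped for COALESCE(...), the subquery's count column is renamed reply_count, and the post texts are abbreviated placeholders. The names POSTS, make_db and reply_counts are made up for this example.

```python
import sqlite3

# Sample rows from the question: (id, user_id, post_date, in_reply_to, text).
# The text values are shortened placeholders, not the real post bodies.
POSTS = [
    (1, 1, "2015-08-14 20:38:00", None, "original test post"),
    (2, 2, "2015-08-14 20:39:00", None, "second post"),
    (3, 5, "2015-08-14 22:00:00", 1, "reply to 1"),
    (4, 4, "2015-08-14 23:00:00", 1, "another reply to 1"),
    (5, 4, "2015-08-14 23:00:00", None, "cat post"),
    (27, 1, "2015-09-08 05:53:40", 2, "reply to 2"),
    (28, 1, "2015-09-08 11:24:05", None, "last post"),
]

# Same shape as the answer's query; COALESCE stands in for MySQL's IF.
LEFT_JOIN_QUERY = """
SELECT post.id, COALESCE(counts.reply_count, 0) AS num_replies
FROM post
LEFT JOIN (
    SELECT in_reply_to AS id, COUNT(*) AS reply_count
    FROM post
    WHERE in_reply_to IS NOT NULL
    GROUP BY in_reply_to
) AS counts ON post.id = counts.id
ORDER BY post.id
"""


def make_db():
    """Build an in-memory table mirroring the question's Post table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " post_date TEXT NOT NULL,"
        " in_reply_to INTEGER REFERENCES post(id),"
        " text TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?, ?)", POSTS)
    return conn


def reply_counts(conn):
    """Return {post id: number of replies} using the LEFT JOIN query."""
    return dict(conn.execute(LEFT_JOIN_QUERY).fetchall())


counts = reply_counts(make_db())
print(counts)  # {1: 2, 2: 1, 3: 0, 4: 0, 5: 0, 27: 0, 28: 0}
```

The counts agree with the Num of Replies column in the question's output table: posts without replies come back from the left join with a NULL count, which is then turned into 0.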
Answer 1 (score: 0)
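You can do this the traditional way with a join, or you can compute the count directly in a new column with a subquery in the SELECT list.
Example: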
select a.id,
(select count(*) from (select 1 as id union all select 1 union all select 2)b where b.id=a.id) as count_of_replies
from
(select 1 as id union all select 1 union all select 2)a
Note that the two subquery "tables" are the same table.
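Applied to the question's data, the same idea might look like the sketch below. It assumes Python's built-in sqlite3 in place of MySQL and a cut-down post table holding only id and in_reply_to; ROWS, SUBQUERY_COUNT and subquery_counts are illustrative names, not part of the original answer.

```python
import sqlite3

# (id, in_reply_to) pairs taken from the question's sample data.
ROWS = [(1, None), (2, None), (3, 1), (4, 1), (5, None), (27, 2), (28, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, in_reply_to INTEGER)")
conn.executemany("INSERT INTO post VALUES (?, ?)", ROWS)

# The inner SELECT runs once per outer row and counts the rows replying to it.
SUBQUERY_COUNT = """
SELECT p.id,
       (SELECT COUNT(*) FROM post AS r WHERE r.in_reply_to = p.id) AS num_replies
FROM post AS p
ORDER BY p.id
"""

subquery_counts = conn.execute(SUBQUERY_COUNT).fetchall()
print(subquery_counts)
# [(1, 2), (2, 1), (3, 0), (4, 0), (5, 0), (27, 0), (28, 0)]
```

Because COUNT(*) over an empty set is already 0, this form needs no NULL-to-0 conversion, unlike the left join version.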