How to build a "star rating" report in BigQuery (or sparklines, or color gradients)

Date: 2018-02-23 16:02:32

Tags: google-bigquery

Suppose I have the following sample input:

WITH Ratings AS (
    (SELECT 'A' name, 2 score) UNION ALL
    (SELECT 'B' name, 0 score) UNION ALL
    (SELECT 'C' name, 5 score) UNION ALL
    (SELECT 'D' name, 1 score))

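score is a number between 0 and 5. How can I produce a report that shows each name with the corresponding number of stars?

5 answers:

Answer 0: (score: 12)

We can build the star rating as a string using two Unicode characters: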

★ - Unicode code point 9733 
☆ - Unicode code point 9734

We can use the CODE_POINTS_TO_STRING function to build each star, and the REPEAT function to produce the right number of them.
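
As a quick check of these two building blocks, the following minimal sketch converts the code points to characters and back. The column aliases are only illustrative.

```sql
#standardSQL
SELECT
  CODE_POINTS_TO_STRING([9733]) AS filled_star,            -- '★'
  CODE_POINTS_TO_STRING([9734]) AS empty_star,             -- '☆'
  REPEAT(CODE_POINTS_TO_STRING([9733]), 3) AS three_stars, -- '★★★'
  TO_CODE_POINTS('★☆') AS code_points                      -- [9733, 9734]
```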

Putting the solution together for the sample input:

WITH Ratings AS (
(SELECT 'A' name, 2 score) UNION ALL
(SELECT 'B' name, 0 score) UNION ALL
(SELECT 'C' name, 5 score) UNION ALL
(SELECT 'D' name, 1 score))

SELECT 
  name, 
  CONCAT(
    REPEAT(CODE_POINTS_TO_STRING([9733]), score),
    REPEAT(CODE_POINTS_TO_STRING([9734]), 5-score)) score
FROM Ratings

It produces the following result:

name    score
A       ★★☆☆☆
B       ☆☆☆☆☆
C       ★★★★★
D       ★☆☆☆☆
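
The query above assumes every score is already an integer between 0 and 5. REPEAT raises an error when the repetition count is negative, and a NULL score makes the whole CONCAT result NULL. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to clamp out-of-range scores and treat NULL as 0. The rows 'E' and 'F' and the Clamped CTE are hypothetical additions for illustration, and the star characters are written as literals instead of code points.

```sql
#standardSQL
-- Hypothetical sample rows: 'E' is out of range and 'F' has no score.
WITH Ratings AS (
  SELECT 'A' AS name, 2 AS score UNION ALL
  SELECT 'E', 7 UNION ALL
  SELECT 'F', CAST(NULL AS INT64)
),
Clamped AS (
  -- IFNULL runs first because GREATEST and LEAST return NULL for NULL input.
  SELECT name, GREATEST(0, LEAST(5, IFNULL(score, 0))) AS stars
  FROM Ratings
)
SELECT
  name,
  CONCAT(REPEAT('★', stars), REPEAT('☆', 5 - stars)) AS score
FROM Clamped
ORDER BY name
```

With this clamping, 'E' renders as five filled stars and 'F' as five empty ones.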

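Answer 1: (score: 5)

My entry is a color gradient, because sparklines only look good in certain fonts, and the font used by the BigQuery web UI is not one of them.

At what time of day is each tag most active on Stack Overflow: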

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
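    -- NOTE: the string literal below is empty here. With the *4 scaling on the next line,
    -- it needs five gradient characters, one per level, for SUBSTR to return anything.
    SELECT STRING_AGG(SUBSTR('', 1+CAST(ROUND(y) AS INT64), 1), '') 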
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*4, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);


WITH top_tags AS (
 (SELECT x.value FROM (SELECT APPROX_TOP_COUNT(tag, 24) x FROM `bigquery-public-data.stackoverflow.posts_questions`, UNNEST(SPLIT(tags,'|')) tag WHERE EXTRACT(YEAR FROM creation_date)>=2016), UNNEST(x) x)
)

SELECT tag, vbar(ARRAY_AGG(1.0*hhh.count ORDER BY hhh.value)) gradient, SUM(hhh.count)  c
FROM (
  SELECT tag, APPROX_TOP_COUNT(EXTRACT(HOUR FROM creation_date), 24) h_h
  FROM `bigquery-public-data.stackoverflow.posts_questions`, UNNEST(SPLIT(tags,'|')) tag
  WHERE tag IN (SELECT * FROM top_tags) AND EXTRACT(YEAR FROM creation_date)>=2016
  GROUP BY 1
), UNNEST(h_h) hhh
GROUP BY tag
-- NOTE: the search string is empty here too; it should be one of the gradient characters.
ORDER BY STRPOS(gradient, '')



Row gradient                                                c       tag  
1       317538  android  
2       59445   asp.net  
3       159134  ios  
4       111988  angularjs    
5       212843  jquery   
6       138143  mysql    
7       107586  swift    
8       318294  php  
9       84723   json     
10      233100  html     
11      390245  java     
12      83787   angular  
13      70150   sql-server   
14      534663  javascript   
15      291541  c#   
16      65668   c    
17      111792  sql  
18      158999  css  
19      88146   arrays   
20      61840   ruby-on-rails    
21      136265  c++  
22      104218  node.js  
23      360396  python   
24      98690   r   

A more concise shaded gradient, but with only 3 values:

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
    SELECT STRING_AGG(SUBSTR('▓▒░', 1+CAST(ROUND(y) AS INT64), 1), '') 
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*2, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);



WITH top_countries AS (
 (SELECT x.value FROM (SELECT APPROX_TOP_COUNT(country_code, 12) x FROM `ghtorrent-bq.ght_2017_09_01.users`), UNNEST(x) x)
)

SELECT vbar(ARRAY_AGG(1.0*hhh.count ORDER BY hhh.value)) gradient, SUM(hhh.count) c, country_code
FROM (
  SELECT country_code, APPROX_TOP_COUNT(EXTRACT(HOUR FROM a.created_at), 24) h_h
  FROM `githubarchive.year.2017` a
  JOIN `ghtorrent-bq.ght_2017_09_01.users` b
  ON a.actor.login=b.login
  WHERE country_code IN (SELECT * FROM top_countries) 
  AND actor.login NOT IN (SELECT value FROM (SELECT APPROX_TOP_COUNT(actor.login, 1000) x FROM `githubarchive.year.2017` WHERE type='WatchEvent'), UNNEST(x))
  AND a.type='WatchEvent'
  GROUP BY 1
), UNNEST(h_h) hhh
GROUP BY country_code 
ORDER BY STRPOS(gradient, '░')

Row gradient                    c       country_code     
1   ░░░░░░░▒▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░    204023  au   
2   ▒░░░░░░░░░▒▒▒▒▒▒▒▓▓▓▓▓▓▒    293589  jp   
3   ▓▒░░▒▒░░░░▒▒▒▒▒▒▒▓▓▓▓▓▓▓    2125724 cn   
4   ▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▒▓▓▓    447092  in   
5   ▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▓    381510  ru   
6   ▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▒    545906  de   
7   ▓▓▓▓▓▓▓▒░░░▒░░░░▒▒▒▒▒▒▒▒    395949  fr   
8   ▓▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒    491068  gb   
9   ▒▒▒▒▓▓▓▓▓▓▓▒░░░▒░░░░░▒▒▒    419608  br   
10  ▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░░░░░░░░▒    2443381 us   
11  ▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░░░░░░░▒▒    294793  ca   

Short code for sparklines, which works well in Data Studio:

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
    SELECT STRING_AGG(SUBSTR('▁▂▃▄▅▆▇█', 1+CAST(ROUND(y) AS INT64), 1), '') 
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*7, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);
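
The snippet above only defines the two functions. As a minimal usage sketch, the query below calls vbar on a literal array. It repeats the definitions so that it runs on its own, and it adds WITH OFFSET and ORDER BY pos inside STRING_AGG, which is an assumption of this sketch (not part of the answer) to make the character order explicit. The input array is made up.

```sql
#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
  -- Map each value to one of 8 block characters, keeping the array order.
  SELECT STRING_AGG(SUBSTR('▁▂▃▄▅▆▇█', 1 + CAST(ROUND(y) AS INT64), 1), '' ORDER BY pos)
  FROM (
    SELECT pos, IFNULL(SAFE_DIVIDE(e - mm.min, mm.max - mm.min) * 7, 0) AS y
    FROM UNNEST(v) AS e WITH OFFSET AS pos
  )
));
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS (
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a))
);

-- Eight evenly spaced values land on the eight levels in turn.
SELECT vbar([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]) AS sparkline;
-- sparkline: ▁▂▃▄▅▆▇█
```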

Answer 2: (score: 3)

Adding a more generic option for producing time-series / sparkline style reports:

#standardSQL
CREATE TEMP FUNCTION sparklines(arr ARRAY<INT64>) AS ((
  SELECT STRING_AGG(CODE_POINTS_TO_STRING([code]), '') 
  FROM UNNEST(arr) el, 
  UNNEST([(SELECT MAX(el) FROM UNNEST(arr) el)]) mx, 
  UNNEST([(SELECT MIN(el) FROM UNNEST(arr) el)]) mn
  JOIN UNNEST([9602, 9603, 9605, 9606, 9607]) code WITH OFFSET pos
  ON pos = CAST(IF(mx = mn, 1, (el - mn) / (mx - mn)) * 4 AS INT64) 
)); 
WITH series AS (
  SELECT 1 id, [3453564, 5343333, 2876345, 3465234] arr UNION ALL
  SELECT 2, [5743231, 3276438, 1645738, 2453657] UNION ALL
  SELECT 3, [1,2,3,4,5,6,7,8,9,0] UNION ALL
  SELECT 4, [3245876, 2342879, 5876324, 7342564]  
)  
SELECT 
  id, TO_JSON_STRING(arr) arr, sparklines(arr) sparklines 
FROM series 

The result is as follows:

Row id  arr                                 sparklines   
1   1   [3453564,5343333,2876345,3465234]   ▃▇▂▃     
2   2   [5743231,3276438,1645738,2453657]   ▇▅▂▃     
3   3   [1,2,3,4,5,6,7,8,9,0]               ▂▃▃▅▅▆▆▇▇▂   
4   4   [3245876,2342879,5876324,7342564]   ▃▂▆▇       

Adding Mosha's version (taken from his comment below):

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, MIN FLOAT64, MAX FLOAT64) AS ( 
  IF(
    MIN = MAX, 
    REPEAT(CODE_POINTS_TO_STRING([9603]), ARRAY_LENGTH(v)), 
    (
    SELECT STRING_AGG(CODE_POINTS_TO_STRING([9601 + CAST(ROUND(y) AS INT64)]), '') 
    FROM ( 
      SELECT SAFE_DIVIDE(e-min, MAX - MIN) * 7 y 
      FROM UNNEST(v) e)
    )
  )
); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT MIN(a) FROM UNNEST(v) a), (SELECT MAX(a) FROM UNNEST(v) a)) 
);
WITH numbers AS (
  SELECT 1 id, [3453564., 5343333., 2876345., 3465234.] arr UNION ALL
  SELECT 2, [5743231., 3276438., 1645738., 2453657.] UNION ALL
  SELECT 3, [1.,2,3,4,5,6,7,8,9,0] UNION ALL
  SELECT 4, [3245876., 2342879, 5876324, 7342564]  
)  
SELECT 
  id, TO_JSON_STRING(arr) arr, vbar(arr) sparklines 
FROM numbers  

Applied to the same dummy data as the version above, it produces the following:

Row id  arr                                 sparklines   
1   1   [3453564,5343333,2876345,3465234]   ▃█▁▃     
2   2   [5743231,3276438,1645738,2453657]   █▄▁▂     
3   3   [1,2,3,4,5,6,7,8,9,0]               ▂▃▃▄▅▆▆▇█▁   
4   4   [3245876,2342879,5876324,7342564]   ▂▁▆█      
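
To see how the scaling in Mosha's version maps each value to one of the eight block characters (code points 9601 through 9608), the sketch below prints the intermediate level for every element of the third sample array. The CTE names and the level column are hypothetical helpers for illustration, and the sketch assumes the array has at least two distinct values (otherwise SAFE_DIVIDE returns NULL).

```sql
#standardSQL
WITH input AS (
  SELECT [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0] AS arr
),
bounds AS (
  SELECT
    arr,
    (SELECT MIN(x) FROM UNNEST(arr) x) AS mn,
    (SELECT MAX(x) FROM UNNEST(arr) x) AS mx
  FROM input
)
SELECT e, level, CODE_POINTS_TO_STRING([9601 + level]) AS bar
FROM (
  -- level is an integer from 0 (lowest bar) to 7 (full block).
  SELECT pos, e, CAST(ROUND(SAFE_DIVIDE(e - mn, mx - mn) * 7) AS INT64) AS level
  FROM bounds, UNNEST(arr) AS e WITH OFFSET AS pos
)
ORDER BY pos
```

Reading the bar column from top to bottom gives the same characters as row 3 of the result above.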

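Answer 3: (score: 2)

Getting even crazier here. Completely useless, but fun to play with.

This applies all the different options presented in this post to image processing and drawing (using the profile pictures of the people who posted them), plus a few new ones.

The first and second results (for Felipe's picture) were made with Felipe's Color Gradient approach, using different scaling options.

The third result uses Felipe's Shaded Gradient approach.

The fourth result uses Mikhail's (mine) / Mosha's Spark-line approach.

Finally, the fifth and sixth results respectively use ASCII character sets representing ASCII Shades of Gray:
    Short set: " .:-=+*#%@"
    Full (long) set: " $@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,"^``'."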

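The code is trivial and literally the same as in the respective answers. The only difference is that the data used in the exercise above is image pixel data, obtained simply with the HTML canvas getImageData() method. Obviously that happens outside of BigQuery, in just a simple HTML page.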
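
As a rough sketch of the idea for the short ASCII set, the query below maps brightness values to characters. It assumes the pixel data has already been reduced to one brightness value per pixel in the range 0 to 255 and stored as one array per image row. The pixels table, its columns, and the ascii_shade function are hypothetical names, and whether dark pixels should map to ' ' or '@' depends on the background the text is displayed on.

```sql
#standardSQL
-- Map a brightness value (0..255) to one of the 10 characters in the short set.
CREATE TEMP FUNCTION ascii_shade(brightness INT64) AS (
  SUBSTR(' .:-=+*#%@', 1 + CAST(ROUND(brightness / 255 * 9) AS INT64), 1)
);

-- Hypothetical pixel data: one array of brightness values per image row.
WITH pixels AS (
  SELECT 0 AS y, [0, 64, 128, 192, 255] AS row_values UNION ALL
  SELECT 1, [255, 192, 128, 64, 0]
)
SELECT
  y,
  (SELECT STRING_AGG(ascii_shade(b), '' ORDER BY pos)
   FROM UNNEST(row_values) AS b WITH OFFSET AS pos) AS line
FROM pixels
ORDER BY y
```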

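The options for going crazy here and having fun with image transformation and processing are limitless! But it is probably useless outside of learning.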

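Answer 4: (score: 0)

Fitting a vertical bar chart into a single character is challenging, because only 8 different heights are available. Horizontal bar charts do not have this limitation: a horizontal chart can be scaled to any length. The example below uses a length of 30 and shows the number of births by day of the week as a horizontal bar chart. The data comes from a public dataset: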

create temp function hbar(value int64, max int64) as (
  repeat('█', cast(30 * value / max as int64))
);
select 
  ['sunday', 'monday', 'tuesday', 'wednesday',
   'thursday', 'friday', 'saturday'][ordinal(wday)] wday, bar from (
select wday, hbar(count(*), max(count(*)) over()) bar
from `bigquery-public-data.samples.natality`
where wday is not null
group by 1
order by 1 asc)

Result:

wday      bar
---------------------------------------------
sunday    ███████████████████
monday    ███████████████████████████
tuesday   ██████████████████████████████
wednesday ██████████████████████████████
thursday  █████████████████████████████
friday    █████████████████████████████
saturday  █████████████████████
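
The same idea works on any set of counts. The following is a minimal sketch with the bar width passed as a parameter and a guard for an empty or zero maximum. The width and max_value parameters, the counts CTE, and its values are assumptions for illustration, and values are assumed to be non-negative because REPEAT raises an error on a negative count.

```sql
#standardSQL
CREATE TEMP FUNCTION hbar(value INT64, max_value INT64, width INT64) AS (
  -- SAFE_DIVIDE returns NULL when max_value is 0, which IFNULL turns into an empty bar.
  REPEAT('█', CAST(ROUND(IFNULL(SAFE_DIVIDE(width * value, max_value), 0)) AS INT64))
);

-- Hypothetical counts to chart.
WITH counts AS (
  SELECT 'a' AS label, 10 AS n UNION ALL
  SELECT 'b', 25 UNION ALL
  SELECT 'c', 50
)
SELECT label, n, hbar(n, MAX(n) OVER (), 20) AS bar
FROM counts
ORDER BY label
```

With a width of 20, the three rows get bars of 4, 10, and 20 blocks.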