With this query:
sql("SELECT _location, count(1) FROM tablaTemporal group by _location order by 2 desc" )
I get this output:
+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|London, United Kingdom          |15      |
|United States                   |12      |
|Bangalore, India                |8       |
|Hyderabad, India                |7       |
|Paris, France                   |6       |
|San Francisco, CA, United States|6       |
|Mountain View, CA, United States|4       |
|Pune, India                     |4       |
|Bengaluru, Karnataka, India     |3       |
+--------------------------------+--------+
But the result I need is:
+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|United States                   |22      |
|India                           |22      |
|United Kingdom                  |15      |
|France                          |6       |
+--------------------------------+--------+
So I need something like this:
sql("SELECT SubstringOfLocationFromCharComma(_location), count(1) FROM tablaTemporal group by _location order by 2 desc" )
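How can I extract the last element from a comma-separated string?
Answer 0 (score: 2)
Since the country name is the last element after the final comma, you can also do the following: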
df.show(false)
+--------------------------------+
|a                               |
+--------------------------------+
|Mountain View, CA, United States|
|Pune, India                     |
|Bengaluru, Karnataka, India     |
+--------------------------------+
df.withColumn("a" , split($"a", ",") ).withColumn("a" , expr("a[ size(a) -1 ] ") ).show
+--------------+
|a             |
+--------------+
| United States|
| India        |
| India        |
+--------------+
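Then apply groupBy($"a").agg(sum($"count(1)").as("count")) to get the desired result. Note that splitting on "," leaves a leading space in the extracted value, as the output above shows, so wrap it in trim before grouping. Otherwise " United States" and "United States" end up in separate groups.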
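Putting the steps together, here is a minimal end-to-end sketch. It assumes the per-location counts are already in a DataFrame with columns `_location` and `cnt`; the column names, the object name `CountryCounts`, and the local SparkSession setup are illustrative and not from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, split, sum, trim}

object CountryCounts {
  // Collapses per-location counts into per-country counts.
  // Expects columns `_location` (string) and `cnt` (long); both names are assumptions.
  def countryCounts(counts: DataFrame): DataFrame =
    counts
      .withColumn("parts", split(col("_location"), ","))
      // Take the last array element and trim it, so " India" and "India" share a group.
      .withColumn("country", trim(expr("parts[size(parts) - 1]")))
      .groupBy("country")
      .agg(sum("cnt").as("count"))
      .orderBy(col("count").desc)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("country-counts")
      .getOrCreate()
    import spark.implicits._

    // Rows copied from the first output in the question.
    val counts = Seq(
      ("London, United Kingdom", 15L),
      ("United States", 12L),
      ("Bangalore, India", 8L),
      ("Hyderabad, India", 7L),
      ("Paris, France", 6L),
      ("San Francisco, CA, United States", 6L),
      ("Mountain View, CA, United States", 4L),
      ("Pune, India", 4L),
      ("Bengaluru, Karnataka, India", 3L)
    ).toDF("_location", "cnt")

    countryCounts(counts).show(false)
    spark.stop()
  }
}
```

Summing the existing counts, rather than counting rows again, matters here because the input is already aggregated per location.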
Answer 1 (score: 0)
You can use regexp_extract.
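As a rough sketch of how regexp_extract could slot into the original SQL query: the pattern '([^,]+)$' captures everything after the last comma (or the whole string when there is no comma), and trim removes the leading space. The view name tablaTemporal comes from the question; the sample rows, the object name `CountryCountsSql`, and the aliases `country` and `cnt` are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CountryCountsSql {
  // Runs the aggregation against a temp view named tablaTemporal
  // that has a string column `_location`.
  def countryCounts(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT trim(regexp_extract(_location, '([^,]+)$', 1)) AS country,
        |       count(1) AS cnt
        |FROM tablaTemporal
        |GROUP BY trim(regexp_extract(_location, '([^,]+)$', 1))
        |ORDER BY cnt DESC""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("country-counts-sql")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, one per record, standing in for the real table.
    Seq(
      "Pune, India",
      "Bengaluru, Karnataka, India",
      "United States",
      "San Francisco, CA, United States",
      "Paris, France"
    ).toDF("_location").createOrReplaceTempView("tablaTemporal")

    countryCounts(spark).show(false)
    spark.stop()
  }
}
```

The extraction expression is repeated in GROUP BY rather than referenced by its alias, so the query does not depend on Spark's alias-resolution settings.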