How to extract the last element from a comma-separated string?

Posted: 2018-02-07 19:12:01

Tags: scala apache-spark apache-spark-sql

Using this query:

sql("SELECT _location, count(1) FROM tablaTemporal group by _location order by 2 desc" )

I get this output:

+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|London, United Kingdom          |15      |
|United States                   |12      |
|Bangalore, India                |8       |
|Hyderabad, India                |7       |
|Paris, France                   |6       |
|San Francisco, CA, United States|6       |
|Mountain View, CA, United States|4       |
|Pune, India                     |4       |
|Bengaluru, Karnataka, India     |3       |
+--------------------------------+--------+

But the result I need is:

+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|United States                   |22      |
|India                           |22      | 
|United Kingdom                  |15      |
|France                          |6       |
+--------------------------------+--------+

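So I need a query like the following, where SubstringOfLocationFromCharComma stands for the function I am looking for:

sql("SELECT SubstringOfLocationFromCharComma(_location), count(1) FROM tablaTemporal GROUP BY SubstringOfLocationFromCharComma(_location) ORDER BY 2 DESC")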

How can I extract the last element of a comma-separated string?

2 answers:

Answer 0 (score: 2)

Since the country name is the element after the last comma, you can also do the following:

df.show(false)
+--------------------------------+
|a                               |
+--------------------------------+
|Mountain View, CA, United States|
|Pune, India                     |
|Bengaluru, Karnataka, India     |
+--------------------------------+

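Split on commas, take the last element, and trim it (without the trim, "United States" and " United States" would end up in different groups):

df.withColumn("a", split($"a", ",")).withColumn("a", expr("trim(a[size(a) - 1])")).show(false)
+-------------+
|a            |
+-------------+
|United States|
|India        |
|India        |
+-------------+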

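Then, on the DataFrame produced by your aggregation query (the one that has the count(1) column), apply groupBy($"a").agg(sum($"count(1)").as("count")) to get the desired result.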
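A minimal end-to-end sketch of this answer's approach, assuming Spark 2.x or later running locally. The sample rows are copied from the question's first output table; the object name CountryTotals, the helper byCountry, and the intermediate columns parts and country are illustrative, not part of the original answer. It also assumes the country is always the last comma-separated element, which holds for the sample rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, expr, split, sum}

object CountryTotals {
  // Sums per-location counts by country, where "country" is assumed to be
  // the last comma-separated element of _location.
  def byCountry(locationCounts: DataFrame): DataFrame =
    locationCounts
      // "Pune, India" -> ["Pune", " India"]; "United States" -> ["United States"]
      .withColumn("parts", split(col("_location"), ","))
      // Last element, trimmed so " United States" and "United States" match
      .withColumn("country", expr("trim(parts[size(parts) - 1])"))
      .groupBy("country")
      .agg(sum(col("count(1)")).as("count"))
      // Tie-break on country so rows with equal counts come out in a fixed order
      .orderBy(desc("count"), col("country"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("country-totals")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the result of the question's GROUP BY _location query
    val locationCounts = Seq(
      ("London, United Kingdom", 15L),
      ("United States", 12L),
      ("Bangalore, India", 8L),
      ("Hyderabad, India", 7L),
      ("Paris, France", 6L),
      ("San Francisco, CA, United States", 6L),
      ("Mountain View, CA, United States", 4L),
      ("Pune, India", 4L),
      ("Bengaluru, Karnataka, India", 3L)
    ).toDF("_location", "count(1)")

    byCountry(locationCounts).show(false)
    spark.stop()
  }
}
```

With these rows India and United States both total 22, so the secondary sort on country lists India first; drop it if any order among ties is acceptable.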

Answer 1 (score: 0)

You can use regexp_extract.
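This answer only names regexp_extract, so the pattern and query shape below are assumptions: a sketch that keeps everything in a single SQL statement, as the question asks, grouping the raw rows of tablaTemporal by the text after the last comma. The object name RegexpExtractCountry, the helper countryCounts, and the sample rows in main are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RegexpExtractCountry {
  // '([^,]+)$' captures the text after the last comma, or the whole string
  // when there is no comma; trim drops the space that follows the comma.
  // The subquery lets the outer query group by the extracted value.
  def countryCounts(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT country, count(1)
        |FROM (
        |  SELECT trim(regexp_extract(_location, '([^,]+)$', 1)) AS country
        |  FROM tablaTemporal
        |) t
        |GROUP BY country
        |ORDER BY 2 DESC, 1""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("regexp-extract-country")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical raw rows standing in for the question's tablaTemporal
    Seq("Pune, India", "Bangalore, India", "United States", "San Francisco, CA, United States")
      .toDF("_location")
      .createOrReplaceTempView("tablaTemporal")

    countryCounts(spark).show(false)
    spark.stop()
  }
}
```

Because regexp_extract returns an empty string when the pattern does not match, a NULL or empty _location would show up as a group with an empty country name rather than being dropped.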