I have this table in AWS Athena:
+-----------------------------------------------------------------------------+
| URL                                                                         |
+-----------------------------------------------------------------------------+
| stag.v1.abc.in/beauty/hair/go-abc-girl-a57-20200001?ref=home_feed_1         |
| stag.v1.abc.in/                                                             |
| stag.v1.abc.ph/eatdrink/cheap/76027/dairy-free-upsize-a1046-20190515?ref=ar |
| stag.v1.abc.in/beauty/hair/go-abc-girl-a57-20200003?ref=home_feed_1         |
+-----------------------------------------------------------------------------+
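I need to extract part of the string (the id) from this column: the text between two delimiters, after the last "-" and before the "?". The result should be: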
+------------------------+
| ID |
+------------------------+
| 20200001 |
| - |
| 20190515 |
| 20200003 |
+------------------------+
I tried SUBSTRING_INDEX(), but Athena does not support it. Can anyone help me? Thanks in advance.
Answer 0 (score: 1)
Combine url_extract_path with regexp_extract:
SELECT regexp_extract(url_extract_path(url), '([^-]*)$')
FROM "tableabc"
LIMIT 5;
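
One thing to watch: the pattern '([^-]*)$' matches the whole input when it contains no hyphen at all, so a row like stag.v1.abc.in/ would probably not come back as "-". Below is a minimal sketch of a variant that requires a hyphen in the last path segment and falls back to '-' otherwise. It is an assumption on my part that cutting the query string with split_part is acceptable for your data, and the inline VALUES rows are only stand-ins for the real table.

```sql
-- Sketch only: the inline VALUES rows stand in for the real table.
SELECT
  url,
  coalesce(
    -- Drop the query string, then capture what follows the last '-'
    -- in the final path segment; the result is NULL when that segment
    -- contains no hyphen.
    regexp_extract(split_part(url, '?', 1), '-([^-/]*)$', 1),
    '-'
  ) AS id
FROM (
  VALUES
    ('stag.v1.abc.in/beauty/hair/go-abc-girl-a57-20200001?ref=home_feed_1'),
    ('stag.v1.abc.in/')
) AS t (url);
```

To run it against the real data, replace the VALUES subquery with "tableabc". Excluding '/' in the character class keeps a hyphen in an earlier path segment or in the host name from being picked up when the last segment has none.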