I have a table containing a list of URLs:
url
http://03cubsml.baseball.cbssports.com/stats/stats-main?selectedplayer=2122997
http://08flb.baseball.cbssports.com/scoring/standard
http://100-poems.com/poems/life/index2.htm
http://10000lakesrbl.baseball.cbssports.com/stats/stats-main
http://1000pictures.com/view.htm?cscenic/sunset+fnoy-2011-07-21-211010+a1112212325323435434553545885949hh9
http://05command.wikidot.com/tech-hub-tag-list
http://10000lakesrbl.baseball.cbssports.com/players/playerpage/2504134
http://1001goroskop.ru/gadanie/?kniga-sudeb
http://04spfbl.baseball.cbssports.com/standings/overall
http://05command.wikidot.com
http://05command.wikidot.com/tech-hub-tag
http://05fbl.baseball.cbssports.com/stats/stats-main
http://100-poems.com/poems/life/0464004.htm
http://10000islands.proboards.com/board/129/tito-headquarters
http://10000islands.proboards.com/thread/11959/tip-islands-party?page=477
http://10000islands.proboards.com/thread/14172/illustrious-house-improving-wordiness?page=82
http://1000pictures.com/view.htm?cscenic/sunset+feilat05-040+a1112212325323435434553545885949hh9
http://1001-rimes.com/listeperson.php?letter=%E9&start=30
http://1001-rimes.com/listeperson.php?letter=ques&start=30
http://1001goroskop.ru/?god
I currently use the following code to split each URL into a list of the words it contains:
Create table url_keyword
(url string,
keywords Array<String>);
-- Hive's INSERT OVERWRITE ... SELECT takes no AS keyword
Insert Overwrite table url_keyword
Select url,split(lcase (parse_url (url,'PATH')),"[=/_%:|^$#@!&,?*_~+.`<>(){}' \-\;\" \\ \\[\\]{[0 -9]+ }]") AS keywords from url_table;
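The output I get has the url and the keywords produced by the split (an array, displayed space-separated). Now I want the number of words generated for each URL. Whenever I try to run
regexp_replace(keywords,' ',',')
to convert it to a comma-separated array, so that I can use the length function to get the word count, I get this error: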
Wrong arguments '','': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (array, string, string). Possible choices: _FUNC_(string, string, string)
How can I get the word count in this case?
My keywords output looks like:
stats stats main
scoring standard
poems life index htm
stats stats main
view htm
tech hub tag list
players playerpage
gadanie
standings overall
tech hub tag
stats stats main
poems life htm
board tito headquarters
thread tip islands party
thread illustrious house improving wordiness
view htm
listeperson php
listeperson php
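The error message says that regexp_replace only accepts a string, while keywords is an Array<String>. One possible way around this, offered as a sketch rather than a tested solution, is to skip the string conversion and count the array elements with Hive's built-in size() function. The column alias word_count, the lateral view alias k, and its column word are names chosen here for illustration. Note that split can produce empty tokens (for example from the leading "/" of the path), which size() would count; the second query is a variant that assumes you want those excluded.

```sql
-- Count array elements directly; empty tokens are included in the count.
SELECT url, size(keywords) AS word_count
FROM url_keyword;

-- Variant: count only non-empty tokens.
-- LATERAL VIEW OUTER keeps URLs whose keywords array is NULL or empty.
SELECT url,
       sum(CASE WHEN word IS NOT NULL AND word <> '' THEN 1 ELSE 0 END) AS word_count
FROM url_keyword
LATERAL VIEW OUTER explode(keywords) k AS word
GROUP BY url;
```

If empty tokens do not matter for your data, the first query is the simpler choice since it needs no grouping.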