Regex for importing data from a website into Google Sheets

Date: 2019-03-10 17:19:03

Tags: regex google-sheets google-sheets-importxml

The goal is to extract the title and the tags from a webpage.

I'm using IMPORTDATA and would like to get all the results in one row, like this:

[webpage] [title] [1st tag] [2nd tag] [3rd tag] [4th tag] ... [last tag]

I'm stuck halfway through my process in Google Sheets:

  • 1st tab: I've done the extraction from the big raw data.

  • 2nd tab: =query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href') or (Col1 CONTAINS 'meta property' AND Col1 CONTAINS 'og:title')") extracts the text I need, but only for the first row. With REGEXEXTRACT only the title gets extracted; the tags still don't, because they are scattered across several columns...

    =REGEXEXTRACT(query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href')"),"\>(.+)\
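IMPORTDATA imports the data at a URL in .csv or .tsv format, so a comma anywhere on a line of page source starts a new column; that is one plausible way the text ends up scattered across columns. Below is a minimal Python sketch of that effect. It assumes Python's csv.reader as a stand-in for IMPORTDATA (Sheets' exact quoting rules may differ) and uses a made-up two-line sample; parse_like_importdata and extract_from_col1 are hypothetical helpers, not Sheets functions.

```python
import csv
import io
import re
from typing import List, Optional

# Made-up page-source lines; the real page's markup is not known.
SAMPLE_SOURCE = (
    '<meta property="og:title" content="Some title, with a comma" />\n'
    '<a class="btn btn-secondary" href="/tag/one">one</a>\n'
)


def parse_like_importdata(text: str) -> List[List[str]]:
    """Split text into rows and comma-separated cells (stand-in for IMPORTDATA)."""
    return list(csv.reader(io.StringIO(text)))


def extract_from_col1(rows: List[List[str]], pattern: str) -> List[Optional[str]]:
    """Apply a regex to the first cell of each row only, like REGEXEXTRACT on Col1."""
    results: List[Optional[str]] = []
    for row in rows:
        match = re.search(pattern, row[0]) if row else None
        results.append(match.group(1) if match else None)
    return results


rows = parse_like_importdata(SAMPLE_SOURCE)
print(rows[0])  # the title line is split at its comma into two cells
print(extract_from_col1(rows, r'content="([^"]*)'))  # only "Some title" is left in Col1
print(extract_from_col1(rows, r">(.+)<"))  # the tag text, found on the second row
```

Because the cells after the first comma never reach the regex, anything that has to match across them needs the row to be rejoined first.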

I don't know how to go on from here :( Any help would be appreciated!

1 answer:

Answer 0 (score: 0)

=ARRAYFORMULA({REGEXREPLACE(TEXTJOIN(", ",1,
 QUERY(ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),1000,15),
 "where Col1 contains '<meta property=og:title content='")),
 "<meta property=og:title content=| />",""),
 TRANSPOSE(REGEXEXTRACT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),8000,3)),,50000)),
 "where Col1 contains '<a class=btn btn-secondary'"),"\>(.*)+\<"))})

demo spreadsheet
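The answer's formula can be read as a small pipeline: strip double quotes, rebuild the og:title line and remove its markup, rejoin the tag rows, pull out the text between `>` and `<`, and lay everything out in one row. The Python sketch below mirrors those steps on a made-up page source. Assumptions: csv.reader stands in for IMPORTDATA, the nested TRANSPOSE/QUERY(…,,50000) is taken to join each row's first three cells with spaces, and a simpler `>(.*)<` is used instead of `\>(.*)+\<`. import_cells, title_and_tags and SAMPLE_SOURCE are hypothetical names.

```python
import csv
import io
import re
from typing import List

# Markers taken from the answer's formula (after double quotes are removed).
TITLE_PREFIX = "<meta property=og:title content="
TAG_MARKER = "<a class=btn btn-secondary"

# Made-up page source; the real page's markup is not known.
SAMPLE_SOURCE = "\n".join([
    "<html><head>",
    '<meta property="og:title" content="Example title" />',
    "</head><body>",
    '<a class="btn btn-secondary" href="/tags/regex">regex</a>',
    '<a class="btn btn-secondary" href="/tags/sheets">sheets</a>',
    "</body></html>",
])


def import_cells(text: str) -> List[List[str]]:
    # Stand-in for SUBSTITUTE(IMPORTDATA(A2),"""",""): parse as CSV, then
    # remove every double quote from every cell.
    return [[cell.replace('"', "") for cell in row]
            for row in csv.reader(io.StringIO(text))]


def title_and_tags(text: str) -> List[str]:
    rows = import_cells(text)

    # Title: keep rows whose first cell contains the og:title prefix,
    # TEXTJOIN their non-empty cells with ", ", then strip the tag markup.
    title_cells = [c for r in rows if r and TITLE_PREFIX in r[0] for c in r if c]
    title = re.sub("<meta property=og:title content=| />", "", ", ".join(title_cells))

    # Tags: rejoin each row's first 3 cells with spaces (assumed effect of the
    # TRANSPOSE/QUERY(..., , 50000) trick), keep rows with the button markup,
    # and take the text between the first ">" and the last "<".
    tags = []
    for r in rows:
        line = " ".join(r[:3])
        if TAG_MARKER in line:
            match = re.search(r">(.*)<", line)
            if match:
                tags.append(match.group(1))

    # One output row: the title followed by every tag, like {title, TRANSPOSE(tags)}.
    return [title] + tags


print(title_and_tags(SAMPLE_SOURCE))
```

For this sample the sketch returns ['Example title', 'regex', 'sheets']; in the sheet, the equivalent array spills across adjacent cells of one row.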