Question

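Context:
Until now I have been using regexp in SQL to extract variable URLs. I find it very slow and would like to optimize it using the substr and instr functions. This matters to me because I am new to SQL and it would help me get more familiar with these functions.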

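Database: my database is built from posts extracted from a social platform. The text column is called "titre". It contains variable URLs in different formats: www, http, https. I want to create a table or a view (I have no fixed preference) containing these URLs and the associated id_post.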

My work: I noticed that a URL always ends with a blank, for example: "toto wants to share this www.example.com in your post". Here is what I have done so far:

--- length of the string starting from https
select LENGTH(substr(titre, INSTR(titre,'https:'))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- length of the string starting from the blank
select LENGTH(substr(titre, INSTR(titre,' ', 171))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- difference between the two, which gives the length of the url
select LENGTH(substr(titre, INSTR(titre,'https:'))) - LENGTH(substr(titre, INSTR(titre,' ', 171))) as longueur_url from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- url
select substr(titre, 171, 54) from post_categorised_pages where id_post = '280853248721200_697941320345722';

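Question: how can I make this work automatically, with the right positions and lengths, across the whole table "post_categorised_pages"? Could I introduce a CASE WHEN statement to account for http or https versus www? How should I go about it?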

Thanks a lot!!!

Answer 1

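Maybe you need a column name in place of the 'HTTP', 'HTTPS' or 'WWW' string literals. In that case you would probably need a definition table that lists all possible sources. This table has 2 columns (ID and Source_name).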

Then, in your post_categorised_pages table, also store the source of the message (the ID value). In the query, join with that definition table by ID and, instead of

select substr(titre,
              INSTR(titre, 'https:'),
              LENGTH(substr(titre, INSTR(titre, 'https:')))
                - LENGTH(substr(titre, INSTR(titre, ' ', INSTR(titre, 'https:')))))
from post_categorised_pages
where id_post = '280853248721200_697941320345722';

use

select substr(titre,
              INSTR(titre, "definition table".source_name),
              LENGTH(substr(titre, INSTR(titre, "definition table".source_name)))
                - LENGTH(substr(titre, INSTR(titre, ' ', INSTR(titre, "definition table".source_name)))))
from post_categorised_pages
where id_post = '280853248721200_697941320345722';
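
To make the idea concrete, here is a minimal sketch of such a query over the whole table. It is only an illustration under assumptions: url_prefix is a hypothetical inline stand-in for the definition table, its three rows are guesses at the prefixes you care about, and the join is done on INSTR(...) > 0 rather than on a stored source ID, so no extra column is needed in post_categorised_pages.

```sql
-- Sketch only: url_prefix and its rows are hypothetical stand-ins for the definition table.
with url_prefix as (
  select 1 as id, 'https:' as source_name from dual union all
  select 2, 'http:' from dual union all
  select 3, 'www.' from dual
)
select p.id_post,
       d.source_name,
       substr(p.titre,
              instr(p.titre, d.source_name),
              -- appending a blank lets a URL at the very end of titre still find a terminator
              instr(p.titre || ' ', ' ', instr(p.titre, d.source_name))
                - instr(p.titre, d.source_name)) as url
from post_categorised_pages p
join url_prefix d
  on instr(p.titre, d.source_name) > 0;  -- keep only posts that contain the prefix
```

Note that a URL such as https://www.example.com matches both 'https:' and 'www.', so it comes back as two rows. Only the first occurrence of each prefix in a post is extracted.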

Answer 2

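OK, here is the solution I found (it has a flaw, see the end of the post). I use two views to finally extract my string. The first view is created with a CONNECT BY query: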

--- create intermediate view with the position of each occurrence of the targeted pattern
create or replace view Start_Position_Index as
--- unquoted name, so that "from post" below resolves to this subquery
with post as
(select id, text from "your_table" where id = 'xyz')
select id, instr(text,'#', 1, level) as position, text
from post
--- one row per '#' found in text
connect by level <= regexp_count(text, '#');

Then

--- create working view with the full references, the blank position and the string_length for each pattern match
--- the view name must be quoted because it starts with an underscore
create or replace view "_#_index" as
select id, position as hashtag_pos, INSTR(text,' ', position) as blank_position, INSTR(text,' ', position) - position as string_length, text
from Start_Position_Index;

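Finally, you will be able to retrieve the hashtags (in this case) you are looking for in the string. Now, the flaws:
- If the pattern you are looking for is at the end of the string, it returns null, because there is no blank after it (since it is at the end of the string).
- It is not well optimized, because I am using views here rather than tables. I think using tables would be faster.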
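
One possible way around the end-of-string flaw, shown here only as a sketch: append a blank to the text before searching for the terminator, so that INSTR always finds one. The query reads directly from the Start_Position_Index view above. The position > 0 filter is my own addition, to skip the row that comes back when the text contains no '#' at all.

```sql
-- Reads from the Start_Position_Index view defined above.
select id,
       position as hashtag_pos,
       -- text || ' ' guarantees there is a blank after the last word
       substr(text, position, instr(text || ' ', ' ', position) - position) as hashtag
from Start_Position_Index
where position > 0;  -- skip rows where no '#' was found
```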

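But I am pretty sure there is a lot that could be done to optimize this code... any ideas? The challenge is how to repeatedly extract a specific pattern from a string without using expensive regex and without PL/SQL. What do you think?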

Answer 3

How about using Oracle Full Text search?

This will index all the words in the column and give you the hashtags or URLs, since both are written as a single word with no blanks in between.
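
A rough sketch of what this could look like, assuming Oracle Text is installed and you are allowed to create a CONTEXT index. The index name titre_ctx_idx is hypothetical. As far as I know, the default lexer treats punctuation such as ':' , '/' and '#' as token breaks, so this finds the posts that contain a given token; extracting the full URL may still need substr and instr.

```sql
-- Hypothetical index name; requires Oracle Text (CTXSYS) to be available.
create index titre_ctx_idx on post_categorised_pages (titre)
  indextype is ctxsys.context;

-- Find the posts whose titre contains the token "https".
select id_post
from post_categorised_pages
where contains(titre, 'https') > 0;
```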

sql substr: extracting variable URLs
