告诉我SQL Server全文搜索器很疯狂,不是我

时间:2010-06-03 13:29:30

标签: sql-server sql-server-2000 full-text-search full-text-indexing

我有一些客户拥有用户正在搜索的特定地址:

  

123通用方式

数据库中有5行匹配:

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

我运行FT查询以查找这些行。我将向您展示每个步骤,因为我在搜索中添加了更多条件:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"123*"')

ResidentialAddress1
=========================
123 MAPLE STREET
12345 TEST
123 MINE STREET
123 GENERIC WAY
123 FAKE STREET
...

(30 row(s) affected)

好的,到目前为止一直很好,现在添加“ generic ”这个词:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

(5 row(s) affected)

优异。现在我将添加用户想要确保存在的最终关键字:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"way*"')


ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

咦?没有行?如果我只查询“方式*”怎么办:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

起初我认为这可能是因为*,并且要求根way后面有更多字符。但事实并非如此:

  • 搜索“123 *”匹配“123”
  • 搜索“generic *”匹配“generic”
  • 网上书籍说,星号匹配零个,一个或多个字符

如果我为s& g删除*该怎么办?

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way"')

Server: Msg 7619, Level 16, State 1, Line 1
A clause of the query contained only ignored words. 

所以有人可能会认为你只是单独或作为root用户,甚至不允许搜索 way。但这也不是真的:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"way*"')

AccountNumber FirstName Lastname
------------- --------- --------
33589         JOHN      WAYNE                    

总而言之,用户正在搜索包含所有单词的行:

  

123通用方式

我正确地将其转换为WHERE条款:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"123*"')
AND CONTAINS(Patrons.*, '"generic*"')
AND CONTAINS(Patrons.*, '"way*"')

不返回任何行。告诉我这不会起作用,这不是我的错,SQL Server很疯狂。

注意:我已经清空了FT索引并重建了它。

更新一个

SELECT Lastname, ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.*, '"gen*"')

Lastname                  ResidentialAddress1            
------------------------- ------------------------------ 
SAVE                      123 GENERIC WAY
Genders                   
SAVE                      123 GENERIC WAY
Patron                    123 GENERIC WAY
SAVE                      123 GENERIC WAY
SAVE                      234 GENERIC WAY
SAVE                      123 GENERIC WAY

(7 row(s) affected)

更新两个

假装用户输入:

  

123 generic wa

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"wa*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

真正的问题是,用户输入的内容非常有效,他们希望看到任何人都期望看到的内容。


更新三

有人要求这一切,这不是我的错!:

CREATE TABLE [dbo].[Patrons] (
    [PatronGUID]  uniqueidentifier ROWGUIDCOL  NOT NULL ,
    [AccountNumber] [bigint] NULL ,
    [FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MiddleInitial] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Lastname] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EyeColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [HairColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Gender] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Birthday] [datetime] NULL ,
    [Height] [int] NULL ,
    [Weight] [int] NULL ,
    [FacialHair] [tinyint] NULL ,
    [Nationality] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IdentifyingMarks] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseExpires] [datetime] NULL ,
    [DriversLicenseDateVerified] [datetime] NULL ,
    [PassportNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportExpires] [datetime] NULL ,
    [PassportDateVerified] [datetime] NULL ,
    [OtherIdentificationNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationExpires] [datetime] NULL ,
    [OtherIdentificationDateVerified] [datetime] NULL ,
    [ResidentialAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialPhoneNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [CountryOfResidence] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessPhone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PositionWithFirm] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EmployerTelephone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MemberCardType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PlayerStatusCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus1] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IsVIPExchangeRate] [tinyint] NULL ,
    [ChangedUserGUID_Depricated] [uniqueidentifier] NULL ,
    [ChangedUser] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ChangedDate] [datetime] NULL ,
    [ChangedWorkstation] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PendingUpdates_Depricated] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Patrons] ADD 
    CONSTRAINT [DF_Patrons_PatronGUID] DEFAULT (newid()) FOR [PatronGUID],
    CONSTRAINT [PK_Patrons] PRIMARY KEY  NONCLUSTERED 
    (
        [PatronGUID]
    ) WITH  FILLFACTOR = 90  ON [PRIMARY] 
GO

if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1 
exec sp_fulltext_database N'enable' 

GO

if not exists (select * from dbo.sysfulltextcatalogs where name = N'TheFullTextCatalog')
exec sp_fulltext_catalog N'TheFullTextCatalog', N'create' 

GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'create', N'TheFullTextCatalog', N'PK_Patrons'
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'FirstName', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'MiddleInitial', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'Lastname', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'EyeColor', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'IdentifyingMarks', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress1', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress2', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress3', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCity', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialZipCode', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialRegion', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCountry', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialPhoneNumber', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'CountryOfResidence', N'add', 1033  
GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'activate'  
GO

以下是那个不相信我的人的截图:

该查询应该有效但不能: alt text http://i49.tinypic.com/dbo8w9.png

有效的查询,但无效: alt text http://i49.tinypic.com/30mptm9.png

对证明内容有效但无用的查询: alt text http://i49.tinypic.com/2q04nmc.png


更新四

查询无法写为

CONTAINS(Patrons.*, 'words...')

由于FT指数中没有逻辑或物理覆盖的项目。例如用户查询 为:

  

6/4/2010 ian boyd 619

提供四个关键字:

  • 2010年6月4日
  • 伊恩
  • 博伊德
  • 619

这意味着他们希望所有条件都成立,伪代码为:

WHERE 6/4/2010 is in the row
AND ian is in the row
AND boyd is in the row
AND 619 is in the row

将其转换为部分查询:

WHERE --Keyword 1: 6/4/2010
(
   ((ChangedDate >= '20100604') AND (ChangedDate < '20100605'))
   OR 
   ((LastTransactionDate >= '20100604') AND (LastTransactionDate < '20100605'))
   OR 
   (CONTAINS(Patrons.*, '"6/4/2010*"')
)
AND --Keyword 2: ian
(
    CONTAINS(Patrons.*, '"ian*"')
)
AND --Keyword 3: boyd
(
    CONTAINS(Patrons.*, '"boyd*"')
)
AND --Keyword 4: 619
(
    (AccountNumber IN (SELECT CAST(619 AS bigint)))
    OR
    (CONTAINS(Patrons.*, '"619*"'))
)

其中一位回答者正在查看原始问题中提出的简化示例;不是现实世界。要说有多个AND条款是不正确,那就是nieve。

5 个答案:

答案 0 :(得分:6)

该消息告诉您“way”是一个停用词,这意味着它被忽略而没有编入索引。这就是为什么你可以找到“wayne”而不是“way”的原因。

所以,不,这不是疯了,你也不是。只是一个简单的误解。

答案 1 :(得分:4)

您可能在创建FT索引时使用了系统停止列表。 way这个词碰巧在那里。您可以使用此查询查看它:

SELECT *
FROM sys.fulltext_system_stopwords
WHERE stopword = 'way'
AND language_id = 1033

您可以关闭停止列表或创建自定义列表,但更好的解决方案是正确编写查询;不要使用多个WHERE CONTAINS子句,将它们合并为一个。否则,SQL Server可能无法有效地使用FT索引。

您的查询应该是这样的:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*" AND "generic*" AND "way*"')

如果你这样做,停止词就会被忽略;如果你没有包含术语way*,它仍然会返回所有相同的结果。


修改:刚刚注意到您标记了此sql-server-2000,因此第一个查询可能无效。在SQL 2000中,它们是“干扰词”,我相信配置是全局的,您没有单独的停止列表。不过,如果你写一个WHERE CONTAINS子句而不是几个子句,你仍会得到结果。

要编辑SQL Server 2000中的干扰词,您必须在SQL Server FTDATA配置文件夹中编辑特定于语言的文件。更多详情请点击此处:SQL Server Full Text Search Noise Words and Thesaurus Configurations

答案 2 :(得分:3)

解决方案1:

您想尝试Transform Noise Word选项(SQL 2008)

关闭此功能,应停止删除字词。

例如:

sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'transform noise words', 1
RECONFIGURE
GO

修改1:

希望旧版本的MS SQL可能有类似内容吗?

答案 3 :(得分:2)

答案 4 :(得分:0)

也许它需要三个以上的字母字符。尝试另外三个字母的单词,例如gen*