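I am running an Apache Pig script and I want to be able to look for a complete string of words in a field. For example, I would like to look for "The following updates are downloaded and ready for installation".
The problem I am having is that matches with a regex only seems to let me search for single words. So I end up looking for "following" & "updates" & "downloaded" & "ready" & "installation", and it doesn't matter how far apart they are or in what order they appear. I only added more words to try to narrow down what I was looking for, but I would like to know whether it is possible to match the whole string of consecutive words.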
Here is an example of my current filter.
downloadFilter = FILTER windowsLog BY ($16 matches '^(?=.*?(following))(?=.*?(updates))(?=.*?(downloaded))(?=.*?(ready))(?=.*?(installation)).*$');
Here is a sample record I am trying to match.
3/7/2014 19:15:54:141 972 13c0 Report REPORT EVENT: {EF338545-61FB-434A-ACB6-F9D17A986677} 2014-03-07 19:15:49:141-0600 1 188 102 {00000000-0000-0000-0000-000000000000} 0 0 AutomaticUpdates Success Content Install Installation Ready: The following updates are downloaded and ready for installation. This computer is currently scheduled to install these updates on Saturday, March 08, 2014 at 3:00 AM: - Update for Microsoft .NET Framework 3.5.1 on Windows 7 and Windows Server 2008 R2 SP1 for x64-based Systems (KB2836943) - Security Update for Microsoft .NET Framework 3.5.1 on Windows 7 and Windows Server 2008 R2 SP1 for x64-based Systems (KB2863240) - Update for User-Mode Driver Framework version 1.11 for Windows 7 for x64-based Systems (KB2685813) - Update for Windows 7 for x64-based Systems (KB2791765)
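What I have in mind is something like the sketch below. It is untested against my data and rests on two assumptions: that matches has to match the entire field (Java regex semantics), which is why the phrase is wrapped in a leading and trailing .*, and that a space inside the pattern is treated as a literal space. The relation name phraseFilter is just a placeholder I made up; windowsLog and $16 are the same as in my current filter.

```pig
-- Sketch only: keep records whose field $16 contains the exact consecutive phrase.
-- Assumes matches requires a full-string match, hence the surrounding .*
-- phraseFilter is a hypothetical relation name.
phraseFilter = FILTER windowsLog BY ($16 matches '.*The following updates are downloaded and ready for installation.*');
```

If that assumption holds, this should hit the sample record above, since the phrase appears in it verbatim after "Installation Ready:".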