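Further edit to the original question: the problem stems from expecting the regular expression to behave identically or close to "grep" or some programming languages. Below is what I expected, and the fact that it did not happen is what prompted my question (using cygwin):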
echo "regex unusual operation will deport into a different" > out.txt
grep "will * dep" out.txt
"regex unusual operation will deport into a different"
Original question
kwic(immigCorpus, "deport", window = 3)
The output is:
[BNP, 157] The BNP will | deport | all foreigners convicted
[BNP, 1946] . 2. | Deport | all illegal immigrants
[BNP, 1952] immigrants We shall | deport | all illegal immigrants
[BNP, 2585] Criminals We shall | deport | all criminal entrants
Trying to learn the basics, I ran
kwic(immigCorpus, "will *depo", window = 3, valuetype = "regex")
expecting to get
[BNP, 157] The BNP will | deport | all foreigners convicted
but I got:
kwic object with 0 rows
A similar attempt, such as
kwic(immigCorpus, ".*will *depo.*", window = 3, valuetype = "regex")
gives the same result:
kwic object with 0 rows
Why? Is it the tokenization? If so, how should I write the regex?
PS: Thanks for this wonderful package.
Answer 0 (score: 0)
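The example in the ITAUR repository is based on older syntax. What you need is the phrase() wrapper; see ?phrase. You should also be careful about using * as if it were regular expression syntax, since it probably does not do what you want, and because a regular expression cannot start with "*". (This may help.) The default "glob" value type will probably do what you want.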
library("quanteda")
## Package version: 1.1.4
## Parallel computing: 2 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
kwic(data_char_ukimmig2010, phrase("will deport"))
## [BNP, 156:157] nation.- The BNP | will deport | all foreigners convicted of crimes
kwic(data_char_ukimmig2010, phrase("will .*deport.*"), valuetype = "regex")
## [BNP, 156:157] nation.- The BNP | will deport | all foreigners convicted of crimes
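To make the difference between the two value types concrete, here is a minimal base R sketch. It only illustrates how "*" is read under glob versus regex rules; it uses utils::glob2rx() and grepl() on a made-up word list, and it is not how quanteda matches patterns internally.

```r
# Illustration only: base R, no quanteda. The word list is invented.
words <- c("deport", "deported", "dep", "report")

# Glob: "*" is a wildcard for any run of characters.
# glob2rx() translates the glob into an anchored regular expression.
glob_rx <- utils::glob2rx("depo*")
glob_hits <- words[grepl(glob_rx, words)]
print(glob_hits)

# Regex: "*" quantifies the preceding character, so "depo*" means
# "dep" followed by zero or more "o", found anywhere in the string.
regex_hits <- words[grepl("depo*", words)]
print(regex_hits)
```

The glob keeps only words that start with "depo", while the regex also accepts "dep", which is usually not what someone typing "depo*" intends.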
Answer 1 (score: 0)
You are trying to match a phrase with your pattern. By default, the pattern argument is treated as a space-separated list of keywords, and each keyword is searched for separately. So, you can get your expected result using
> kwic(immigCorpus, phrase("will deport"), window = 3)
[BNP, 156:157] - The BNP | will deport | all foreigners convicted
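A rough way to see why the unwrapped pattern found nothing is to imitate token-level matching in base R. This is a simplified sketch under stated assumptions: it splits on single spaces with strsplit() instead of using quanteda's tokenizer, and it uses grepl() as a stand-in for the matcher.

```r
# Simplified illustration in base R; quanteda's tokenizer and matcher differ.
txt <- "The BNP will deport all foreigners convicted"

# grep-style: the regex sees the whole string, so the space can match.
whole_line <- grepl("will *depo", txt)
print(whole_line)

# Token-style: the text is split first, so no single token contains a space
# and a pattern that spans two words cannot match any one of them.
toks <- strsplit(txt, " ", fixed = TRUE)[[1]]
any_token <- any(grepl("will *depo", toks))
print(any_token)

# A pattern without whitespace can still match an individual token.
single <- toks[grepl("^depo", toks)]
print(single)
```

The first check succeeds and the second fails, which mirrors the difference between the grep call in the question and the 0-row kwic result.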
Setting valuetype = "regex" makes sense if you are using a regex. E.g., to get both "shall deport" and "will deport", use
> kwic(immigCorpus, phrase("(will|shall) deport"), window = 3, valuetype = "regex")
[BNP, 156:157] - The BNP | will deport | all foreigners convicted
[BNP, 1951:1952] illegal immigrants We | shall deport | all illegal immigrants
[BNP, 2584:2585] Foreign Criminals We | shall deport | all criminal entrants
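Conceptually, a multi-word pattern can be thought of as a sequence of per-token patterns that must match consecutive tokens. The sketch below shows that idea in base R. The helper match_phrase() is hypothetical and not part of quanteda, the sentence is invented, and the per-token patterns are anchored explicitly so the sketch makes no claim about quanteda's own anchoring rules.

```r
# Hypothetical helper, not part of quanteda: return the start positions where
# consecutive tokens match consecutive regex patterns, one pattern per token.
match_phrase <- function(toks, patterns) {
  n <- length(patterns)
  if (n == 0 || length(toks) < n) return(integer(0))
  starts <- seq_len(length(toks) - n + 1)
  hits <- vapply(starts, function(i) {
    # Compare pattern k with token i + k - 1 for every k.
    all(mapply(grepl, patterns, toks[i:(i + n - 1)]))
  }, logical(1))
  starts[hits]
}

# Invented example text, split on single spaces (not quanteda's tokenizer).
toks <- strsplit("We shall deport them and the BNP will deport others",
                 " ", fixed = TRUE)[[1]]

# The phrase is split on the space into one regex per token.
patterns <- strsplit("^(will|shall)$ ^deport$", " ", fixed = TRUE)[[1]]

positions <- match_phrase(toks, patterns)
print(positions)
```

Here the alternation lives entirely inside the first per-token pattern, which is why it works once the phrase is split on the space.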