I am extracting data from the EDICT dictionary file from wwwjdic. Sample:
相同器官 [そうどうきかん] /(n) homologous organ/
相同染色体 [そうどうせんしょくたい] /(n) homologous chromosome/
相同組換え [そうどうくみかえ] /(n) homologous recombination/
相同的組み換え [そうどうてきくみかえ] /(n) homologous recombination/
相同的組換 [そうどうてきくみかえ] /(n) homologous recombination/
相同的組換え [そうどうてきくみかえ] /(n) homologous recombination/
相入れない [あいいれない] /(iK) (exp,adj-i) in conflict/incompatible/out of harmony/running counter/mutually exclusive/clashing with/
相年 [あいどし] /(n,adj-no) the same age/
相伴 [しょうばん] /(n,vs) partaking/participating/taking part in/sharing (something with someone)/
相伴う [あいともなう] /(v5u) to accompany/
相判 [あいはん] /(n,vs) (1) official seal/verification seal/affixing a seal to an official document/(2) making a joint signature or seal/
相判 [あいばん] /(n) (1) medium-sized paper (approx. 15x21 cm, used for notebooks)/(2) medium-sized photo print (approx. 10x13 cm)/
相判 [あいばん] /(n,vs) (1) official
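These lines specify each entry's part of speech, e.g. /(n) for nouns and /(adj) for adjectives. I want to get every entry that is tagged with a part of speech from this array: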
["n", "n-adv", "n-pref", "n-suf", "n-t", "num", "pn", "adj-no", "adj-f", "adv-n", "vs"]
I am trying to split up lines like these:
File.open("EDICT.txt") do |file|
  file.each_line do |line|
    if line[regex] # regex: the pattern I am trying to write
      # ...
    end
  end
end
I am using a regular expression, but the furthest I have gotten is:
/\/[(](n|n-adv|n-pref|n-suf|n-t|num|pn|adj-no|adj-f|adv-n|vs|n,vs)[)]/
This is not robust. Also, sometimes there are tags like this:
/(adj-no,n-adv,n-t)
which the regex does not match. At the same time, it should not match these terms:
["adj-i", "adj-na", "adj-pn", "adj-t", "adj", "adv", "adv-to", "aux", "aux-v", "aux-adj", "conj",
"ctr", "exp", "int", "iv", "pref", "prt", "suf", "v1", "v2a-s", "v4h", "v4r", "v5", "v5argu",
"v5b", "v5g", "v5k", "v5k-s", "v5m", "v5n", "v5r", "v5r-i", "v5s", "v5t", "v5u", "v5u-s", "v5uru",
"v5z", "vz", "vi", "vk", "vn", "vs-c", "vs-i", "vs-s", "vt"]
What is a better, more robust way to check whether a line contains one of the wanted /(...) codes?
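For comparison with the regex attempt in the question, here is a hedged sketch of a single regular expression that accepts a wanted tag only when it is a whole comma-separated element of a /(...) group, plus a loop that applies it to a file. WANTED_TAGS, WANTED_TAG_GROUP, wanted_line? and wanted_entries are illustrative names, not from the thread, and the file is assumed to be UTF-8 unless another encoding is passed:

```ruby
# Tags from the question's "wanted" array; the constant names are hypothetical.
WANTED_TAGS = %w[n n-adv n-pref n-suf n-t num pn adj-no adj-f adv-n vs].freeze

# "/(" + any number of other tags each followed by "," + one wanted tag
# + an optional ",..." tail + ")". A wanted tag must start right after "("
# or "," and end right before "," or ")", so "n" cannot match inside
# "v5n", and "vs" cannot match the start of "vs-c".
WANTED_TAG_GROUP =
  %r{/\((?:[^),]+,)*(?:#{Regexp.union(WANTED_TAGS).source})(?:,[^)]*)?\)}

def wanted_line?(line)
  line.match?(WANTED_TAG_GROUP)
end

# Reads an EDICT-style file line by line and returns the matching lines.
# Assumes UTF-8; pass encoding: "EUC-JP:UTF-8" for an EUC-JP file.
def wanted_entries(path, encoding: "UTF-8")
  matches = []
  File.foreach(path, encoding: encoding) do |line|
    matches << line.chomp if wanted_line?(line)
  end
  matches
end
```

Building the alternation with Regexp.union keeps the tag list in one place and escapes the hyphens. The order of the alternatives does not affect the result: when a shorter tag such as n is followed by -adv, the engine backtracks into the next alternative.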
Answer 0 (score: -1)
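class String
  Nouns = %w[n n-adv n-pref n-suf n-t num pn adj-no adj-f adv-n vs]
  # True if the first "/(...)" group has a tag in Nouns; false if the line has no such group.
  def noun_entry?
    tags = self[%r{/\(([^)]+)\)}, 1]
    !tags.nil? && (tags.split(/,\s*/) & Nouns).any?
  end
end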
"相同器官 [そうどうきかん] /(n) homologous organ/".noun_entry?
# => true
"相判 [あいばん] /(n,vs) (1) official".noun_entry?
# => true
"ある単語 [あるたんご] /(adj-no,n-adv,n-t) .../".noun_entry?
# => true
"別の単語 [べつのたんご] /(ctr,exp,int) .../".noun_entry?
# => false
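As written, noun_entry? only inspects the first /(...) group. In the 相入れない sample line from the question, that group is (iK) and the part-of-speech tags sit in the following (exp,adj-i) group, so a noun entry laid out the same way, such as a hypothetical /(iK) (n) ..., would be reported as false. A hedged sketch, assuming the tags of interest always appear in the run of parenthesized groups right after the slash; leading_tags, noun_entry_any_group? and NOUN_TAGS are illustrative names, not part of the answer:

```ruby
NOUN_TAGS = %w[n n-adv n-pref n-suf n-t num pn adj-no adj-f adv-n vs].freeze

# Collects the tags from the run of "(...)" groups right after the leftmost
# "/" that starts one, e.g. "(iK) (exp,adj-i)" -> ["iK", "exp", "adj-i"].
# Sense numbers such as "(1)" are collected too, which is harmless
# because they never appear in NOUN_TAGS.
def leading_tags(line)
  groups = line[%r{/((?:\([^)]*\)\s*)+)}, 1]
  return [] if groups.nil? # no "/(" anywhere in the line

  groups.scan(/\(([^)]*)\)/).flatten.flat_map { |tags| tags.split(/,\s*/) }
end

def noun_entry_any_group?(line)
  (leading_tags(line) & NOUN_TAGS).any?
end
```

For example, leading_tags("相判 [あいばん] /(n,vs) (1) official") collects n, vs and the sense number 1, and the intersection still finds the noun tags.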
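[^)] matches any character other than ).
[^)]+ is a non-empty sequence of characters that contains no ).
([^)]+) captures such a sequence.
%r{/\(([^)]+)\)} is a regex for such a sequence enclosed in /( and ).
self[regex, 1] returns capture group 1 of the first match, i.e. the part matched by [^)]+, or nil if there is no match.
split(/,\s*/) splits that sequence into an array at each comma (optionally followed by whitespace).
& Nouns (Array#&) takes the intersection of that array with the array Nouns.
any? checks whether the intersection contains anything.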
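The same chain can be traced step by step on a single line. The entry below is hypothetical, chosen so that the intersection visibly drops the tags that are not in the list; NOUNS mirrors the answer's Nouns constant:

```ruby
NOUNS = %w[n n-adv n-pref n-suf n-t num pn adj-no adj-f adv-n vs].freeze

# Hypothetical entry with a mix of wanted and unwanted tags.
line = "例 [れい] /(adj-na,n,vs-s) example/"

tags   = line[%r{/\(([^)]+)\)}, 1] # capture group 1 => "adj-na,n,vs-s"
parts  = tags.split(/,\s*/)        # => ["adj-na", "n", "vs-s"]
common = parts & NOUNS             # only "n" is in NOUNS => ["n"]
found  = common.any?               # => true
puts "#{tags} -> #{parts.inspect} -> #{common.inspect} -> #{found}"
```

Array#& keeps the order of the left-hand array, so common lists the matching tags in the order they appear in the entry.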