Using Python 3, I have a list of around 14,500 unknown words and want to group them based on their features. I'm using re.compile to build 5 dictionaries holding the words that match each criterion, plus a final dictionary of words that don't match any criterion. However, some of the words that should have been grouped already are slipping through. Let me give an example.
Here are the re.compile statements I'm using:
import re
wordscaps = re.compile("^([A-Z]*)$")
lettersnumbers = re.compile("^([a-zA-Z][1-9])")
numbersonly = re.compile("^([^a-zA-Z][1-9]+)$")
titlecase = re.compile("^([A-Z][a-z]{1,})$")
longwords = re.compile("^([a-z]{15,})$")
The way I'm doing it is:
for line in testfile:
    if not line.strip():
        continue
    part = line.strip().split("\t")
    if part[1] in UNK_words:
        #print(part[1])
        unk_word_tags[part[1]] = {part[2]: 1}
        if wordscaps.match(part[1]):
            unk2dict[part[1]] = part[2]
        elif lettersnumbers.match(part[1]):
            unk3dict[part[1]] = part[2]
        elif numbersonly.match(part[1]):
            unk4dict[part[1]] = part[2]
        elif titlecase.match(part[1]):
            unk5dict[part[1]] = part[2]
        elif longwords.match(part[1]):
            unk6dict[part[1]] = part[2]
        else:
            unkdict[part[1]] = part[2]
but in my final unkdict I'm getting words like:
'23390','4400','HS2NF5','IS1112C','vA33delta','Cbf5p','Grin2c'
I'm just wondering if there's something wrong with how my re.compile statements are written.
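Answer 0 (score: 0)
You defined your regular expressions with the logic "what_I_dont_want, what_i_want". That doesn't work, because the pattern first expects one character matching what you don't want, and only then the characters you do want. For example, numbersonly requires one non-letter character followed by digits from 1 to 9, so any number containing a 0 after the first character fails. You only need to define what you want (I assume digits including 0).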
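As a rough sketch of that idea, the snippet below keeps the two original patterns for comparison and adds rewritten ones. The rewritten patterns rest on my guess at the intent: numbersonly means "digits only, 0 included", and lettersnumbers means "only letters and digits, with at least one of each". The classify helper and its labels are made up for illustration and are not part of the original code.

```python
import re

# Original patterns from the question, kept only for comparison.
old_numbersonly = re.compile("^([^a-zA-Z][1-9]+)$")
old_lettersnumbers = re.compile("^([a-zA-Z][1-9])")

# Assumed intent: describe only what is wanted.
# Digits only, 0 included.
numbersonly = re.compile(r"[0-9]+")
# Letters and digits only, with at least one letter and at least one digit.
lettersnumbers = re.compile(r"(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+")


def classify(word):
    """Return a hypothetical group label for a word (illustration only)."""
    if numbersonly.fullmatch(word):
        return "numbersonly"
    if lettersnumbers.fullmatch(word):
        return "lettersnumbers"
    return "other"


# The words that slipped through in the question.
leftovers = ['23390', '4400', 'HS2NF5', 'IS1112C', 'vA33delta', 'Cbf5p', 'Grin2c']
for word in leftovers:
    matched_before = bool(old_numbersonly.match(word) or old_lettersnumbers.match(word))
    print(word, matched_before, classify(word))
```

fullmatch is used here so the whole word has to fit the pattern, which makes the ^ and $ anchors unnecessary. If lettersnumbers should mean something narrower (say, a letter prefix followed by digits), that pattern would need adjusting.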