Using re.compile to classify unknown words in a text file based on their features

Time: 2018-12-19 11:40:46

Tags: python regex python-3.x

Using Python 3, I have a list of around 14,500 unknown words and want to group them based on their features. I'm using re.compile to build 5 dictionaries, one per criterion, holding the words that match it, plus a final dictionary of words that don't match any criterion. However, some words that should already have been grouped are slipping through into that final dictionary. Let me give an example:

Here are the re.compile statements I'm using:

import re

wordscaps = re.compile("^([A-Z]*)$")

lettersnumbers = re.compile("^([a-zA-Z][1-9])")

numbersonly=re.compile("^([^a-zA-Z][1-9]+)$")

titlecase = re.compile("^([A-Z][a-z]{1,})$")

longwords=re.compile("^([a-z]{15,})$")

The way I'm doing it is:

for line in testfile:
    if not line.strip():
        continue
    part=line.strip().split("\t")
    if part[1] in UNK_words:
        #print(part[1])
        unk_word_tags[part[1]]={part[2]:1}
        if wordscaps.match(part[1]):
            unk2dict[part[1]]=part[2]

        elif lettersnumbers.match(part[1]):
            unk3dict[part[1]]=part[2]  

        elif numbersonly.match(part[1]):
            unk4dict[part[1]]=part[2]

        elif titlecase.match(part[1]):
            unk5dict[part[1]]=part[2]

        elif longwords.match(part[1]):
            unk6dict[part[1]]=part[2]

        else:
            unkdict[part[1]]=part[2]

but in my final unkdict I'm getting words like:

'23390','4400','HS2NF5','IS1112C','vA33delta','Cbf5p','Grin2c'

I'm just wondering if there's something wrong with how my re.compile statements are written.

1 answer:

Answer 0 (score: 0)

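You defined your regexes with the logic what_I_dont_want followed by what_i_want. That won't work, because the pattern then expects one character matching what you don't want, followed by characters matching what you do want. For example, [^a-zA-Z][1-9]+ requires one non-letter character and then one or more digits from 1 to 9, so '4400' fails on the 0. You only need to define what you do want (I assume the digits should include 0).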
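As a rough sketch of that idea: the patterns below are my guess at what each group is meant to catch, not something the question states. In particular, I'm assuming "letters and numbers" means at least one letter and at least one digit in any order, and the `classify` helper and group names are hypothetical, added only for illustration.

```python
import re

# Two of the original patterns from the question, kept only to show why they miss.
old_lettersnumbers = re.compile(r"^([a-zA-Z][1-9])")
old_numbersonly = re.compile(r"^([^a-zA-Z][1-9]+)$")

# Assumed intent: describe only what each group should contain.
wordscaps = re.compile(r"^[A-Z]+$")    # '+' instead of '*', so "" no longer matches
# Assumption: at least one letter and one digit, anywhere in the word.
lettersnumbers = re.compile(r"^(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9]+$")
numbersonly = re.compile(r"^[0-9]+$")  # digits only, 0 included
titlecase = re.compile(r"^[A-Z][a-z]+$")
longwords = re.compile(r"^[a-z]{15,}$")

# Same checking order as the if/elif chain in the question.
PATTERNS = [
    ("caps", wordscaps),
    ("letters_numbers", lettersnumbers),
    ("numbers", numbersonly),
    ("title", titlecase),
    ("long", longwords),
]


def classify(word):
    """Return the name of the first group whose pattern matches, else 'other'."""
    for name, pattern in PATTERNS:
        if pattern.match(word):
            return name
    return "other"


if __name__ == "__main__":
    leftovers = ["23390", "4400", "HS2NF5", "IS1112C", "vA33delta", "Cbf5p", "Grin2c"]
    for word in leftovers:
        print(word, classify(word))
```

With these patterns, none of the seven leftover words from the question should fall through to "other". If the letter/digit group is supposed to be stricter (for instance, letters first and digits after), the lookahead pattern would need to change accordingly.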
