Set a unique abbreviation for each column in Python

Date: 2015-12-23 16:16:12

Tags: python csv unique abbreviation

I have data like this in a CSV file:

private final static String GIFTS_CSV = "gifts.csv";
public final static String PATH = "src/main/resources/static/";

public static Map<Integer, Gift> getGifts() throws IOException {
    String line;
    HashMap<Integer, Gift> gifts = new HashMap<>();

    BufferedReader br = new BufferedReader(new FileReader(PATH + GIFTS_CSV));
    br.readLine();
    while ((line = br.readLine()) != null) {
        String[] giftStr = line.split(CVS_SPLIT_BY);
        Gift gift = new Gift(Integer.parseInt(giftStr[0]), 
                         new Point(Double.parseDouble(giftStr[1]),
                Double.parseDouble(giftStr[2])), Double.parseDouble(giftStr[3]));
        gifts.put(gift.getId(), gift);
    }
    return gifts;
}

I want to set a unique abbreviation for each column. For example:

  • Annuity Calculator = annca
  • Annuities Calculator = annsca

Could you help me figure out the best way to do this in Python?

Thanks

1 Answer:

Answer (score: 3):

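Your question isn't fully specified, but it seemed interesting, so I took a stab at it. I wrote a function that takes a list of phrases and returns a dictionary with the abbreviations as keys. It first takes the first two letters of each word and concatenates them into a candidate abbreviation. If that abbreviation is already taken, it lengthens the word prefixes one letter at a time, working through the words from left to right, until it finds an abbreviation that is unused. I then tested it on your sample data. You will almost certainly want to modify it, but it should give you some ideas: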

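def makeAbbreviations(headers):
    """Map each header to a unique abbreviation; returns {abbreviation: header}."""
    abbreviations = {}
    for header in headers:
        header = header.lower()
        words = header.split()
        n = max((len(w) for w in words), default=0)  # longest word length
        i = 2
        # Candidate: the first two letters of every word, concatenated.
        starts = [w[:i] for w in words]
        abbrev = ''.join(starts)

        # On a collision, lengthen the word prefixes one word at a time
        # (left to right) until the abbreviation is unused.
        while abbrev in abbreviations and i <= n:
            i += 1
            for j, w in enumerate(words):
                starts[j] = w[:i]
                abbrev = ''.join(starts)
                if abbrev not in abbreviations:
                    break

        # Fallback: if the words run out of letters (e.g. duplicate headers),
        # append a counter instead of silently overwriting an earlier entry.
        base, k = abbrev, 2
        while abbrev in abbreviations:
            abbrev = f"{base}{k}"
            k += 1
        abbreviations[abbrev] = header
    return abbreviations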

myHeaders = ['Ad Group', 'Annuity Calculator', 'Tax Deferred Annuity',
             'Annuity Tables', 'annuities calculator', 'annuity formula',
             'Annuities Explained', 'Deferred Annuies Calculator',
             'Current Annuity Rates', 'Forbes.com', 'Annuity Definition',
             'fixed income', 'Immediate fixed Annuities',
             'Deferred Variable Annuities', '401k Rollover',
             'Deferred Annuity Rates', 'Deferred Annuities',
             'Immediate Annuities Definition', 'Immediate Variable Annuities',
             'Variable Annuity', 'Aig Annuities', 'Retirement Income', 'retirment system',
             'Online Financial Planner', 'Certified Financial Planner']

d = makeAbbreviations(myHeaders)
for k, v in d.items():
    print(k, v, sep=" = ")

Output:

imande = immediate annuities definition
adgr = ad group
fiin = fixed income
40ro = 401k rollover
resy = retirment system
vaan = variable annuity
devaan = deferred variable annuities
rein = retirement income
imvaan = immediate variable annuities
fo = forbes.com
imfian = immediate fixed annuities
dean = deferred annuities
anca = annuity calculator
cuanra = current annuity rates
annca = annuities calculator
onfipl = online financial planner
aian = aig annuities
ande = annuity definition
anfo = annuity formula
cefipl = certified financial planner
tadean = tax deferred annuity
deanca = deferred annuies calculator
anex = annuities explained
anta = annuity tables
deanra = deferred annuity rates
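To apply the same idea to the header row of an actual CSV file, one option is the standard-library `csv` module. The sketch below is an assumption-laden illustration, not part of the answer above: `abbreviate_headers` and `abbreviate_csv` are hypothetical helpers, the sample CSV text is made up, and `io.StringIO` stands in for a real file. `abbreviate_headers` re-implements the same prefix-lengthening rule but returns a list in column order, so duplicate column names each get their own abbreviation.

```python
import csv
import io


def abbreviate_headers(headers):
    """Return one unique abbreviation per header, in column order (hypothetical helper).

    Same rule as makeAbbreviations: start with the first two letters of each
    word, lengthen the word prefixes one word at a time until the result is
    unused, and append a counter as a last resort.
    """
    used = set()
    result = []
    for header in headers:
        words = header.lower().split()
        longest = max((len(w) for w in words), default=0)
        i = 2
        starts = [w[:i] for w in words]
        abbrev = ''.join(starts)
        while abbrev in used and i <= longest:
            i += 1
            for j, w in enumerate(words):
                starts[j] = w[:i]
                abbrev = ''.join(starts)
                if abbrev not in used:
                    break
        base, k = abbrev, 2
        while abbrev in used:  # words exhausted: fall back to a numeric suffix
            abbrev = f"{base}{k}"
            k += 1
        used.add(abbrev)
        result.append(abbrev)
    return result


def abbreviate_csv(text):
    """Replace the header row of CSV text with abbreviations; data rows are unchanged."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    rows[0] = abbreviate_headers(rows[0])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = "Annuity Calculator,annuities calculator,Ad Group\n1,2,3\n"
print(abbreviate_csv(sample))
# anca,annca,adgr
# 1,2,3
```

Returning a list rather than a dict is a deliberate choice here: a dict keyed by header would collapse repeated column names, whereas the list keeps exactly one abbreviation per column.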