Household mail merge (code golf)

Time: 2010-11-07 04:33:43

Tags: code-golf

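I wrote some mail merge code the other day, and although it works, I was turned off by the code. I'd like to see what it would look like in other languages.

As input, the routine takes a list of contacts: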

Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Erica,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Marge,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
Ted,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Raoul,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6

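It then merges rows with the same address and surname into a single record (assume the rows are unsorted). The code should also be flexible enough that the fields can be supplied in any order, so it needs to take the field indexes as parameters. For a household of two, it joins the two first-name fields with " and ". For a household of three or more, the first name is set to "The" and the last name is set to the surname followed by " Family".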

Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
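
The naming rule described above can be sketched in isolation. This is a minimal Python 3 illustration, not any poster's implementation; the function name `merge_names` is hypothetical, and it assumes the household has at least one member.

```python
def merge_names(first_names, surname):
    """Return the (first, last) pair for one household.

    first_names: first names of everyone sharing a surname and address
    (assumed non-empty).
    """
    if len(first_names) == 1:
        # A single person is left unchanged.
        return first_names[0], surname
    if len(first_names) == 2:
        # Two people: join the first names with " and ".
        return " and ".join(first_names), surname
    # Three or more: address the household as "The <surname> Family".
    return "The", surname + " Family"


print(merge_names(["Jim"], "Smith"))
print(merge_names(["Erica", "Abraham"], "Johnson"))
print(merge_names(["Marge", "Ted", "Raoul"], "Simpson"))
```

Grouping the rows by surname and address is a separate step and is left out here on purpose.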

My C# implementation is:

var source = File.ReadAllLines(@"sample.csv").Select(l => l.Split(','));
var merged = HouseholdMerge(source, 0, 1, new[] {1, 2, 3, 4, 5});

public static IEnumerable<string[]> HouseholdMerge(IEnumerable<string[]> data, int fnIndex, int lnIndex, int[] groupIndexes)
{            
    Func<string[], string> groupby = fields => String.Join("", fields.Where((f, i) => groupIndexes.Contains(i)));

    var groups = data.OrderBy(groupby).GroupBy(groupby);

    foreach (var group in groups)
    {
        string[] result = group.First().ToArray();

        if (group.Count() == 2)
        {
            result[fnIndex] += " and " + group.ElementAt(1)[fnIndex];
        }
        else if (group.Count() > 2)
        {
            result[fnIndex] = "The";
            result[lnIndex] += " Family";
        }

        yield return result;
    }            
}

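I don't like how I did the groupby delegate. I wish C# had some way to convert a string expression into a delegate, e.g. Func<string[], string> groupby = f => "f[2] + f[3] + f[4] + f[5] + f[1];" I have a feeling something like this could be done in Lisp or Python. I look forward to seeing nicer implementations in other languages.

Edit: Where is the community wiki checkbox? Could a mod please fix this?

6 answers:

Answer 0 (score: 3)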
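
As a rough illustration of the Python side of that hunch, here is a minimal Python 3 sketch. It assumes rows are lists of strings; the helper name `make_key` and the three sample rows are chosen for illustration only. `operator.itemgetter` builds the key function straight from a list of indexes, and `eval` can turn a string expression into a function, although that is only reasonable for strings you trust.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    "Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004".split(","),
    "Erica,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004".split(","),
    "Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004".split(","),
]


def make_key(indexes):
    """Build a key function returning a tuple of the fields at `indexes`.

    Note: with a single index, itemgetter returns the bare field, not a tuple.
    """
    return itemgetter(*indexes)


key = make_key([1, 2, 3, 4, 5])

# groupby only merges adjacent items, so the rows are sorted by the same key first.
households = [list(g) for _, g in groupby(sorted(rows, key=key), key=key)]
print([len(h) for h in households])

# A string expression turned into a callable (assumption: the string is trusted).
key_from_text = eval("lambda f: f[2] + f[3] + f[4] + f[5] + f[1]")
print(key_from_text(rows[0]))
```

A tuple key avoids the accidental collisions that plain string concatenation can produce when adjacent fields run together.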

Ruby - 181 155

The first-name and last-name indexes are set in the code as a and b. Input data comes from ARGF.

a,b=0,1
[*$<].map{|i|i.strip.split ?,}.group_by{|i|i.rotate(a).drop 1}.map{|i,j|k,l,m=j
k[a]+=' and '+l[a]if l
(k[a]='The';k[b]+=' Family')if m
puts k*','}
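
For readers who do not know Ruby, the grouping key in the golfed code is `i.rotate(a).drop 1`: rotating by `a` brings the first name to the front, and dropping one element removes it, so rows are grouped on every field except the first name. A minimal Python 3 sketch of that key, assuming `0 <= a < len(row)` (the function name `household_key` and the short sample row are hypothetical):

```python
def household_key(row, a):
    """Mimic Ruby's row.rotate(a).drop(1): every field except row[a],
    starting with the fields that follow it."""
    return tuple(row[a + 1:] + row[:a])


row = ["Jim", "Smith", "2681 Eagle Peak"]
print(household_key(row, 0))
print(household_key(row, 1))
```

The destructuring `k,l,m=j` then picks up the first three members of each group, with `l` and `m` being nil when the group is smaller, which is what drives the two `if` modifiers.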

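Answer 1 (score: 1)

Python 2.6.6 - 287 characters

This assumes you can hard-code the file name (named i). If you want to take input from the command line, that adds ~16 characters.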

from itertools import*
for z,g in groupby(sorted([l.split(',')for l in open('i').readlines()],key=lambda x:x[1:]), lambda x:x[2:]):
 l=list(g);r=len(l);k=','.join(z);o=l[0]
 if r>2:print'The,'+o[1],"Family,"+k,
 elif r>1:print o[0],"and",l[1][0]+","+o[1]+","+k,
 else:print','.join(o),

Output

Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004

I'm sure this can be improved, but it's late now.

Answer 2 (score: 1)

Python - 178 characters

import sys
d={}
for x in sys.stdin:F,c,A=x.partition(',');d[A]=d.get(A,[])+[F]
print"".join([" and ".join(v)+c+A,"The"+c+A.replace(c,' Family,',1)][2<len(v)]for A,v in d.items())

Output

Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004

Answer 3 (score: 1)

Python - not golfed

I'm not sure what the order of the rows should be if the indexes for the input file are not 0 and 1.

import csv
from collections import defaultdict

class HouseHold(list):
    def __init__(self, fn_idx, ln_idx):
        self.fn_idx = fn_idx
        self.ln_idx = ln_idx

    def append(self, item):
        self.item = item
        list.append(self, item[self.fn_idx])

    def get_value(self):
        fn_idx = self.fn_idx
        ln_idx = self.ln_idx
        item = self.item
        addr = [j for i,j in enumerate(item) if i not in (fn_idx, ln_idx)]
        if len(self) < 3:
            fn, ln = " and ".join(self), item[ln_idx]
        else:
            fn, ln = "The", item[ln_idx]+" Family"
        return [fn, ln] + addr

def source(fname):
    with open(fname) as in_file:
        for item in csv.reader(in_file):
            yield item

def household_merge(src, fn_idx, ln_idx, groupby):
    res = defaultdict(lambda:HouseHold(fn_idx, ln_idx))
    for item in src:
        key = tuple(item[x] for x in groupby)
        res[key].append(item)
    return res.values()

data =  household_merge(source("sample.csv"), 0, 1, [1,2,3,4,5,6,7])
with open("result.csv", "w") as out_file:
    csv.writer(out_file).writerows(item.get_value() for item in data)
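
If the uncertainty above is about where the merged name fields should land when the indexes are not 0 and 1, one option is to follow the C# version in the question and leave every field in its original column. The sketch below is a Python 3 illustration under that assumption; `household_merge_in_place` and the reordered sample rows are hypothetical.

```python
from collections import OrderedDict


def household_merge_in_place(rows, fn_idx, ln_idx, group_idxs):
    """Merge households while leaving every field in its original column."""
    groups = OrderedDict()  # key -> list of rows, in first-seen order
    for row in rows:
        key = tuple(row[i] for i in group_idxs)
        groups.setdefault(key, []).append(row)

    merged = []
    for members in groups.values():
        result = list(members[0])  # copy so the input rows stay untouched
        if len(members) == 2:
            result[fn_idx] += " and " + members[1][fn_idx]
        elif len(members) > 2:
            result[fn_idx] = "The"
            result[ln_idx] += " Family"
        merged.append(result)
    return merged


# Hypothetical input with the surname first and the first name last.
rows = [
    ["Simpson", "6388 Lake City Way", "Burnaby", "Marge"],
    ["Lyon", "52560 Free Street", "Toronto", "Larry"],
    ["Simpson", "6388 Lake City Way", "Burnaby", "Ted"],
    ["Simpson", "6388 Lake City Way", "Burnaby", "Raoul"],
]
for record in household_merge_in_place(rows, fn_idx=3, ln_idx=0, group_idxs=[0, 1, 2]):
    print(",".join(record))
```

With this choice the output has the same column layout as the input, so whatever consumes the file does not need to know which indexes were used.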

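Answer 4 (score: 1)

Haskell - 341 321

(Changed according to the comments.)

Unfortunately, Haskell has no standard split function, which makes this rather long.

Input on stdin, output on stdout.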

import List
import Data.Ord
main=interact$unlines.e.lines
s[]=[]
s(',':x)=s x
s l@(x:y)=let(h,i)=break(==k)l in h:(s i)
t[]=[]
t x=tail x
h=head
m=map
k=','
e l=m(t.(>>=(k:)))$(m c$groupBy g$sortBy(comparing t)$m s l)
c(x:[])=x
c(x:y:[])=(h x++" and "++h y):t x
c x="The":((h$t$h x)++" Family"):(t$t$h x)
g a b=t a==t b

Answer 5 (score: 0)

Lua, 434 bytes

x,y=1,2 s,p,r,a=string.gsub,pairs,io.read,{}for j,b,c,d,e,f,g,h,i in r('*a'):gmatch('('..('([^,]*),'):rep(7)..'([^,]*))\n')
do k=s(s(s(j,b,''),c,''),'[,%s]','')for l,m in p(a)do if not m.f and (m[y]:match(c) and m[9]==k) then z=1
if m.d then m[x]="The"m[y]=m[y]..' family'm.f=1 else m[x]=m[x].." and "..b m.d=1 end end end if not z then
a[#a+1]={b,c,d,e,f,g,h,i,k} end z=nil end for k,v in p(a)do v[9]=nil print(table.concat(v,','))end