Aggregating text fields from multiple lines (javascript / python)

Time: 2013-06-03 10:42:53

Tags: javascript python text lines

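First, I should say that I am relatively new to coding and have only a superficial knowledge of Python and Javascript.

I have a huge txt file containing people's names and team names, structured as follows: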

Name1, Surname1  Team1
                  Team2
                  Team3
Name2, Surname2  Team2
                  Team4
Name3, Surname3  Team1
                  Team5

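Ideally, I would like to search my data by Team# and get back the names of the people who belong to that team.

E.g., I need the members of Team1 and Team2. My new txt output should look like this: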

Team1, Name1, Surname1, Name3, Surname3
Team2, Name1, Surname1, Name2, Surname2

Thank you very much for your help.

1 answer:

Answer 0: (score: 0)

A Python version might look like this:

import io

fobj_in = io.StringIO("""Name1, Surname1  Team1
                  Team2
                  Team3
Name2, Surname2  Team2
                  Team4
Name3, Surname3  Team1
                  Team5""")

fobj_out = io.StringIO()

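from collections import defaultdict

# Map each team name to the list of people on it.
teams = defaultdict(list)

for line in fobj_in:
    items = line.split()
    if not items:
        # Skip blank lines.
        continue
    if len(items) == 3:
        # A full line, "Name, Surname  Team", starts a new person.
        name = items[:2]
        team = items[2]
    else:
        # A continuation line holds only a team for the most recent person.
        team = items[0]
    teams[team].append(name)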

for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    for name in teams[team_name][:-1]:
        fobj_out.write('{} {}, '.format(name[0], name[1]))
    name = teams[team_name][-1]
    fobj_out.write('{} {}\n'.format(name[0], name[1]))


fobj_out.seek(0)
print(fobj_out.read())

Output:

Team1, Name1, Surname1, Name3, Surname3
Team2, Name1, Surname1, Name2, Surname2
Team3, Name1, Surname1
Team4, Name2, Surname2
Team5, Name3, Surname3
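
The question only asks for Team1 and Team2, while the code above writes every team. Below is a minimal sketch of restricting the output to a chosen list of teams. The helper names `members_by_team` and `format_teams` are made up for this illustration, and the parsing assumes the same whitespace-separated layout as the sample data.

```python
import io
from collections import defaultdict

SAMPLE = """Name1, Surname1  Team1
                  Team2
                  Team3
Name2, Surname2  Team2
                  Team4
Name3, Surname3  Team1
                  Team5"""


def members_by_team(lines):
    """Map each team to the list of 'Name, Surname' strings on it."""
    teams = defaultdict(list)
    name = None
    for line in lines:
        items = line.split()
        if not items:
            continue  # skip blank lines
        if len(items) == 3:
            # Full line: remember the person for later continuation lines.
            name = ' '.join(items[:2])
        if name is not None:
            # The team is always the last item on the line.
            teams[items[-1]].append(name)
    return teams


def format_teams(teams, wanted):
    """Return one 'Team, Name, Surname, ...' line per requested team."""
    return [', '.join([team] + teams[team]) for team in wanted if team in teams]


teams = members_by_team(io.StringIO(SAMPLE))
for row in format_teams(teams, ['Team1', 'Team2']):
    print(row)
```

Teams that are requested but never appear in the input are silently skipped; whether that or an error is preferable depends on the use case.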

To read from and write to actual files, just do this:

fobj_in = open('in_file.txt')
fobj_out = open('out_file.txt', 'w')
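
With real files it is usually safer to open them in `with` blocks so they are closed even if parsing fails. The following is a sketch under that assumption, not part of the original answer: the function name `regroup` and the `utf-8` encoding are choices made for this example, and the demo writes to a temporary directory rather than to `in_file.txt` / `out_file.txt`.

```python
import os
import tempfile
from collections import defaultdict


def regroup(in_path, out_path):
    """Read name/team lines from in_path, write one line per team to out_path."""
    teams = defaultdict(list)
    name = None
    with open(in_path, encoding='utf-8') as fobj_in:
        for line in fobj_in:
            items = line.split()
            if not items:
                continue  # skip blank lines
            if len(items) == 3:
                name = ' '.join(items[:2])
            if name is not None:
                teams[items[-1]].append(name)
    with open(out_path, 'w', encoding='utf-8') as fobj_out:
        for team in sorted(teams):
            fobj_out.write(', '.join([team] + teams[team]) + '\n')


# Small demo using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'in_file.txt')
    out_path = os.path.join(tmp, 'out_file.txt')
    with open(in_path, 'w', encoding='utf-8') as f:
        f.write('Name1, Surname1  Team1\n                  Team2\n')
    regroup(in_path, out_path)
    with open(out_path, encoding='utf-8') as f:
        result = f.read()

print(result)
```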

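Edit

Note: the sample data does not appear to include a case in which several names end up on one line of the output.

With this input data, the code needs to change: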

from collections import defaultdict
teams = defaultdict(list)
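for line in fobj_in:
    if not line.strip():
        # Skip blank lines.
        continue
    # Fields are tab-separated; drop empty or whitespace-only fields.
    items = [entry.strip() for entry in line.split('\t') if entry.strip()]
    if len(items) == 2:
        # A full line: a name followed by a team.
        name = items[0]
        team = items[1]
    else:
        # A continuation line: only a team for the most recent name.
        team = items[0]
    teams[team].append(name)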
for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    for name in teams[team_name][:-1]:
        fobj_out.write('{}, '.format(name))
    name = teams[team_name][-1]
    fobj_out.write('{}\n'.format(name))
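
Since the linked input is not reproduced here, the following self-contained sketch runs the same tab-splitting idea on a tiny hypothetical sample. The sample layout (a name, a tab, then a title, with continuation lines that start with tabs) is an assumption inferred from the code above, and `group_by_last_field` is a name invented for this example.

```python
import io
from collections import defaultdict

# Hypothetical tab-separated input in the layout the code above implies.
SAMPLE = (
    "Boire, Roger\tLe pied tendre (1988)\n"
    "\t\t\tUn gars ben chanceux (1977)\n"
    "\n"
    "Boileau, Sonia\tLast Call Indian (2010)\n"
)


def group_by_last_field(lines):
    """Map the last field of each line to the names that precede it."""
    groups = defaultdict(list)
    name = None
    for line in lines:
        # Split on tabs and discard empty or whitespace-only fields.
        items = [entry.strip() for entry in line.split('\t') if entry.strip()]
        if not items:
            continue  # blank line
        if len(items) == 2:
            name = items[0]  # a full line introduces a new name
        if name is not None:
            groups[items[-1]].append(name)
    return groups


groups = group_by_last_field(io.StringIO(SAMPLE))
for title in sorted(groups):
    print(', '.join([title] + groups[title]))
```

Splitting on tabs instead of arbitrary whitespace is what keeps names and titles containing spaces intact.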

The resulting file contents look like this:

"Décore ta vie" (2003), Boilard, Naggy
"Mouki" (2010), Boileau, Sonia
A chacun sa place (2011), Boinem, Victor Emmanuel
Absence (2009) (V), Boillat, Patricia
C.A.L.L.E. (2005), Boillat, Patricia
Comment devenir un trou de cul et enfin plaire aux femmes (2004), Boire, Roger
Couleur de peau: Miel (2012), Boileau, Laurent
Hergé:Les aventures de Tintin (2004), Boillot, Olivier
Isola, là dove si parla la lingua di Bacco (2011)  (co-director), Boillat, Patricia
L'île (2011), Boillot, Olivier
La beauté fatale et féroce... (1996), Boire, Roger
Last Call Indian (2010), Boileau, Sonia
Le Temple Oublié (2005), Boillot, Olivier
Le pied tendre (1988), Boire, Roger
Legit (2006), Boinski, James W.
Nubes (2010), Boira, Francisco
Questions nationales (2009), Boire, Roger
Reconciling Rwanda (2007), Boiko, Patricia
Soviet Gymnasts (1955), Boikov, Vladimir
The Corporal's Diary (2008) (V)  (head director), Boiko, Patricia
Un gars ben chanceux (1977), Boire, Roger