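Reading a GEDCOM file and printing tags

Time: 2016-05-24 16:40:11

Tags: python

I wrote a Python program that prints every line of a GEDCOM file along with its level number and tag (a GEDCOM file is essentially a genealogy file).

Each line of a GEDCOM file has the following structure: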

<level-number> <tag> <arguments>
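As a rough illustration of this structure, a line can be split into its three parts with `str.split`. The function name `parse_gedcom_line` is hypothetical, and the sketch assumes that an identifier wrapped in `@` signs (as in `0 @I1@ INDI` from the sample lines) may sit between the level number and the tag; it is treated here as part of the arguments.

```python
def parse_gedcom_line(line):
    """Split one GEDCOM line into (level, tag, arguments).

    Hypothetical helper for illustration; assumes a well-formed,
    non-empty line with at least a level number and a tag.
    """
    parts = line.split()
    level = int(parts[0])  # also works for levels with more than one digit
    # Assumption: records such as "0 @I1@ INDI" carry an @...@ identifier
    # before the tag, so the tag is the third field in that case.
    if len(parts) > 2 and parts[1].startswith("@") and parts[1].endswith("@"):
        tag = parts[2]
        arguments = " ".join([parts[1]] + parts[3:])
    else:
        tag = parts[1]
        arguments = " ".join(parts[2:])
    return level, tag, arguments


print(parse_gedcom_line("1 DATE 18 MAY 2016"))  # (1, 'DATE', '18 MAY 2016')
print(parse_gedcom_line("0 @I1@ INDI"))         # (0, 'INDI', '@I1@')
```

Taking the tag by position rather than searching the whole line avoids matching a keyword that merely happens to appear among the arguments.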

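I do not want to print every tag, only the specific tags I have added to the key_words list; for all the others I want to print "invalid tag". The problem is that "invalid tag" is printed every time, even when a matching tag has been found and printed. Basically, the else branch is executed every time.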
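This symptom matches how Python treats an `else` attached to a `for` loop: that branch runs whenever the loop finishes without hitting `break`, regardless of whether any `if` inside the loop matched. A minimal sketch of the behaviour, using a hypothetical `find_tag` helper and a shortened keyword list:

```python
def find_tag(words, key_words):
    """Return the first keyword found in words, or None (hypothetical helper)."""
    for word in key_words:
        if word in words:
            found = word
            break          # leaving via break skips the else branch
    else:
        found = None       # reached only when the loop never hit break
    return found


key_words = ["INDI", "NAME", "SEX"]
print(find_tag("1 NAME Nico /Rosberg/".split(), key_words))  # NAME
print(find_tag("1 SOUR Family Echo".split(), key_words))     # None
```

Without the `break`, the `else` branch would run after every pass over the keyword list, matched or not.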

How can I fix this? And how can I handle the word 'INDI', since it is not being printed?

Here is my code:

   key_words = ['INDI','NAME','SEX','BIRT','DEAT','FAMC','FAMS','FAM','MARR','HUSB','WIFE','CHIL','DIV','DATE','HEAD','TRLR','NOTE']
   #opening file
   text_file = open('C:\Users\shree\Canopy\My-Family-18-May-2016-582.ged', 'r')

   print "Printing each line of gedcom file followed by level no and tag  line"

   for line in text_file:
       print "line is:-", line
       level_number = int(line[:1])
       print "Level number is",level_number   
       line = line.split()
       for word in key_words:
           if word in line:
              print "Tag is:-",word,"\n"
       else:
           print "invalid tag"

Sample lines

0 HEAD
1 SOUR Family Echo
2 WWW http://www.familyecho.com/
1 FILE My Family
1 DATE 18 MAY 2016
1 DEST ANSTFILE
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 SUBM @I1@
2 NAME Nico Rosberg
1 SUBN
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Nico /Rosberg/
2 GIVN Nico
2 SURN Rosberg
2 _MARNM Rosberg
1 SEX M
1 BIRT
2 DATE 21 MAR 1989
1 FAMC @F1@
0 @I2@ INDI
1 NAME Tom /Rosberg/
2 GIVN Tom
2 SURN Rosberg
2 _MARNM Rosberg
1 SEX M
1 BIRT
2 DATE 15 MAR 1958
1 FAMS @F1@
1 FAMC @F2@
0 @I3@ INDI
1 NAME Laisly /Vettle/
2 GIVN Laisly
2 SURN Vettle
2 _MARNM Rosberg
1 SEX F
1 BIRT
2 DATE 15 SEP 1958
1 FAMS @F1@
1 FAMC @F3@

1 answer:

Answer 0 (score: 0)

It looks like what you want is this:

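line_words = line.split()
# the tag is normally the second field, right after the level number
line_tag = line_words[1]
# records such as "0 @I1@ INDI" put an @...@ identifier before the tag
if line_tag.startswith('@') and len(line_words) > 2:
    line_tag = line_words[2]

# check whether the tag is one of the keywords
if line_tag in key_words:
    print("Tag is:- " + line_tag + "\n")
else:
    print("invalid tag")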
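For reference, here is a self-contained sketch of the whole loop in Python 3. It is only one possible arrangement, not the original program: the name `describe_lines` is hypothetical, the input is a plain list of strings instead of the `.ged` file, and blank lines are assumed to be skippable.

```python
KEY_WORDS = {"INDI", "NAME", "SEX", "BIRT", "DEAT", "FAMC", "FAMS", "FAM",
             "MARR", "HUSB", "WIFE", "CHIL", "DIV", "DATE", "HEAD", "TRLR",
             "NOTE"}


def describe_lines(lines):
    """Return a (level, tag) pair per line; tag is None for unlisted tags.

    Hypothetical helper; blank lines are skipped (an assumption).
    """
    results = []
    for line in lines:
        words = line.split()
        if not words:
            continue
        level = int(words[0])
        tag = words[1] if len(words) > 1 else ""
        # "0 @I1@ INDI": the @...@ identifier comes before the tag
        if tag.startswith("@") and len(words) > 2:
            tag = words[2]
        results.append((level, tag if tag in KEY_WORDS else None))
    return results


sample = ["0 HEAD", "1 SOUR Family Echo", "0 @I1@ INDI", "1 NAME Nico /Rosberg/"]
for level, tag in describe_lines(sample):
    label = ("Tag is:- " + tag) if tag is not None else "invalid tag"
    print("Level number is", level, "|", label)
```

Because each line is tested once against the keyword set, there is no inner loop and therefore no loop `else` that can fire unexpectedly.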