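Splitting a text file into two different parts

Time: 2016-06-02 04:09:39

Tags: python json python-2.7

I wrote a simple script that collects a list of titles from a JSON file and generates a text file containing that list.

The result looks like this: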

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography
Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

Question:

The problem I am facing now is how to split the text file into two parts:

The first part is a text file containing the following:

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography

The second is a text file containing the lines that start with Category:

Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

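I have absolutely no idea how to do this. Please advise.

Sorry for the confusing title. I did not know how to describe my problem.

Thanks.

Edit

For example, I extracted all the titles from this API (https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ABranches%20of%20geography&cmlimit=100):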

{  
   "batchcomplete":"",
   "query":{  
      "categorymembers":[  
         {  
            "pageid":5259784,
            "ns":0,
            "title":"Animal geography"
         },
         {  
            "pageid":8670379,
            "ns":0,
            "title":"Autobiogeography"
         },
         {  
            "pageid":4254743,
            "ns":0,
            "title":"Chorography"
         },
         {  
            "pageid":177512,
            "ns":0,
            "title":"Economic geography"
         },
         {  
            "pageid":7907104,
            "ns":0,
            "title":"Footloose industry"
         },
         {  
            "pageid":5155886,
            "ns":0,
            "title":"Geomorphometry"
         },
         {  
            "pageid":2596739,
            "ns":0,
            "title":"Health geography"
         },
         {  
            "pageid":13372,
            "ns":0,
            "title":"Human geography"
         },
         {  
            "pageid":1794929,
            "ns":0,
            "title":"Military geography"
         },
         {  
            "pageid":5886597,
            "ns":0,
            "title":"Philosophy of geography"
         },
         {  
            "pageid":23263,
            "ns":0,
            "title":"Physical geography"
         },
         {  
            "pageid":1845092,
            "ns":0,
            "title":"Political geography"
         },
         {  
            "pageid":711230,
            "ns":0,
            "title":"Regional geography"
         },
         {  
            "pageid":42099944,
            "ns":0,
            "title":"Satirical cartography"
         },
         {  
            "pageid":33566568,
            "ns":0,
            "title":"Settlement geography"
         },
         {  
            "pageid":9710174,
            "ns":0,
            "title":"Transport geography"
         },
         {  
            "pageid":24644075,
            "ns":0,
            "title":"Vernacular geography"
         },
         {  
            "pageid":5329197,
            "ns":0,
            "title":"Visual geography"
         },
         {  
            "pageid":716309,
            "ns":14,
            "title":"Category:Cartography"
         },
         {  
            "pageid":2021084,
            "ns":14,
            "title":"Category:Economic geography"
         },
         {  
            "pageid":2245786,
            "ns":14,
            "title":"Category:Geodemography"
         },
         {  
            "pageid":1111700,
            "ns":14,
            "title":"Category:Human geography"
         },
         {  
            "pageid":7774333,
            "ns":14,
            "title":"Category:Military geography"
         },
         {  
            "pageid":2153059,
            "ns":14,
            "title":"Category:Physical geography"
         },
         {  
            "pageid":1898464,
            "ns":14,
            "title":"Category:Political geography"
         },
         {  
            "pageid":6645804,
            "ns":14,
            "title":"Category:Regional geography"
         },
         {  
            "pageid":44706236,
            "ns":14,
            "title":"Category:Settlement geography"
         },
         {  
            "pageid":6517504,
            "ns":14,
            "title":"Category:Topography"
         },
         {  
            "pageid":1086902,
            "ns":14,
            "title":"Category:Toponymy"
         },
         {  
            "pageid":41335672,
            "ns":14,
            "title":"Category:Transportation geography"
         },
         {  
            "pageid":24727902,
            "ns":14,
            "title":"Category:Vernacular geography"
         }
      ]
   }
}
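
For reference, each entry in this response carries an "ns" field: the articles have ns 0, while the entries whose titles begin with Category: have ns 14, the MediaWiki namespace number for categories. One possible way to get the two lists is therefore to split on that field before any text file is written. The following is only a minimal sketch under that assumption: SAMPLE is an abbreviated stand-in for the response above, and split_titles is a hypothetical helper name, not part of the original script.

```python
import json

# Abbreviated stand-in for the API response shown above (four entries only).
SAMPLE = '''
{"batchcomplete": "",
 "query": {"categorymembers": [
   {"pageid": 5259784, "ns": 0, "title": "Animal geography"},
   {"pageid": 4254743, "ns": 0, "title": "Chorography"},
   {"pageid": 716309, "ns": 14, "title": "Category:Cartography"},
   {"pageid": 1086902, "ns": 14, "title": "Category:Toponymy"}
 ]}}
'''

CATEGORY_NS = 14  # MediaWiki namespace number for "Category:" pages


def split_titles(response_text):
    """Return (article_titles, category_titles) from a categorymembers response."""
    members = json.loads(response_text)["query"]["categorymembers"]
    articles = [m["title"] for m in members if m["ns"] != CATEGORY_NS]
    categories = [m["title"] for m in members if m["ns"] == CATEGORY_NS]
    return articles, categories


articles, categories = split_titles(SAMPLE)
print(articles)
print(categories)
```

Each list could then be written to its own file with "\n".join(...), which avoids re-parsing the combined text file afterwards.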

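I would really appreciate it if you could point me in the right direction to solve this.

Thank you all for your help and guidance.

3 answers:

Answer 0 (score: 1)

To test whether a line in the file starts with "Category:", you can simply do the following: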

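# Read the input once and route each line to one of two output files.
# The output file names are placeholders; use whatever names you need.
with open("file.txt", "r") as f, \
        open("category.txt", "w") as cat_out, \
        open("data.txt", "w") as other_out:
    for line in f.read().splitlines():
        if line.startswith("Category:"):
            cat_out.write(line + "\n")    # lines that begin with "Category:"
        else:
            other_out.write(line + "\n")  # all other lines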

Answer 1 (score: 0)

You can try this:

with open('file.txt', 'r') as f:
    data = []
    category = []
    lines = f.readlines()
    for line in lines:
        if line.startswith('Category'):
            category.append(line)
        else:
            data.append(line)
cat_file = open('category.txt', 'w')
data_file = open('data.txt', 'w')
cat_file.write(''.join(category))
data_file.write(''.join(data))
cat_file.close()
data_file.close()

This reads file.txt line by line and tests whether each line starts with "Category". If it does, the line is added to the category array; if not, it is added to the data array.

Once the file has been processed, the program joins the lines in each array and writes them to category.txt and data.txt.

Hope it helps.

Answer 2 (score: 0)

Thanks to leekaiinthesky for telling me to use 'in'

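query = 'Category:'

# Open all three files in one with-statement so they are closed (and flushed) on exit.
with open('List.text', 'r') as f1, \
        open('WordWithCat.text', 'w') as f2, \
        open('WordwithoutCat.text', 'w') as f3:
    for line in f1.read().splitlines():
        if query in line:
            f2.write(line + '\n')
        else:
            f3.write(line + '\n')

It turned out not to be as complicated as I thought. Thank you all for your help and guidance.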
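
One caveat that may be worth keeping in mind: 'in' matches the query anywhere in a line, whereas the question asks for lines that start with Category:. For the list in the question both tests give the same split, but they can differ on other data. The sketch below compares the two on a small in-memory list; the helper names and the third title are made up for illustration and do not come from the question's data.

```python
def split_by_prefix(lines, prefix="Category:"):
    """Partition lines into (matching, others) by whether they start with prefix."""
    matching, others = [], []
    for line in lines:
        (matching if line.startswith(prefix) else others).append(line)
    return matching, others


def split_by_substring(lines, query="Category:"):
    """Partition lines into (matching, others) by whether query occurs anywhere."""
    matching, others = [], []
    for line in lines:
        (matching if query in line else others).append(line)
    return matching, others


titles = [
    "Human geography",
    "Category:Toponymy",
    "List of Category:Geography pages",  # hypothetical title, not in the question's data
]

prefix_result = split_by_prefix(titles)
substring_result = split_by_substring(titles)
print(prefix_result)
print(substring_result)
```

With the prefix test the hypothetical third title stays with the ordinary titles; with the substring test it lands in the Category: group.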