I'm using Python 2.7, and I have a txt file like this, which I open with Python:
TIME FLIGHT FROM AIRLINE AIRCRAFT STATUS
8:40 AM LH1334
Frankfurt (FRA)
Lufthansa A320 (D-AIPP)
Landed 8:40 AM
8:45 AM OK786
Prague (PRG)
Czech Airlines AT45 (OK-KFP)
Landed 8:32 AM
I want to export it to a CSV file in the correct format, with 6 columns (TIME, FLIGHT, FROM, AIRLINE, AIRCRAFT, STATUS), so that I get this:
TIME FLIGHT FROM AIRLINE AIRCRAFT STATUS
Jul 21 8:40 AM LH1334 Frankfurt (FRA) Lufthansa A320 (D-AIPP) Landed 8:40 AM
...
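This is a bit difficult for me because several fields contain more than one word, so I have no useful idea of how to get the data into this form.
My code: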
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]
f = open('proba.txt', 'r')
x = f.read()
filename=r'output.csv'
resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
maindatatable = to_2d(x, 6)
print maindatatable
output.writerows(x)
resultcsv.close()
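Part of the problem with the code above: `f.read()` returns one long string, so `to_2d(x, 6)` slices it into 6-character pieces rather than 6 fields. A minimal sketch (Python 3; the inline `sample` text is an assumption that there is a tab between TIME/FLIGHT and between AIRLINE/AIRCRAFT) that contrasts this with applying the same helper to a list of lines, using a chunk size of 4 because each flight spans four lines:

```python
def to_2d(l, n):
    """Split the sequence l into consecutive chunks of length n."""
    return [l[i:i + n] for i in range(0, len(l), n)]

# Assumed sample input: header line, then four lines per flight.
sample = ("TIME\tFLIGHT\tFROM\tAIRLINE\tAIRCRAFT\tSTATUS\n"
          "8:40 AM\tLH1334\nFrankfurt (FRA)\nLufthansa\tA320 (D-AIPP)\nLanded 8:40 AM\n"
          "8:45 AM\tOK786\nPrague (PRG)\nCzech Airlines\tAT45 (OK-KFP)\nLanded 8:32 AM\n")

# On the raw string, to_2d groups characters, not fields.
chars = to_2d(sample, 6)
print(chars[:3])

# On a list of lines (header dropped), each chunk of 4 lines is one flight.
records = to_2d(sample.splitlines()[1:], 4)
for rec in records:
    print(rec)
```

Each entry in `records` still has to be split into the six columns, which is what the answer below does line by line.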
Answer:
It looks like each record is split across 4 lines.
We can handle the first line,
8:40 AM LH1334
as follows:
import re
# raw string keeps the backslashes intact; \s+ accepts a tab or spaces
matches = re.match(r'(\d{1,2}:\d{2} [AP]M)\s+(\w+\d+)', line)
time = matches.group(1)
flight = matches.group(2)
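A quick check of the pattern on a space-separated line, a tab-separated line with a trailing newline, and a line that should not match. This is a sketch for Python 3; `parse_time_flight` is a hypothetical helper name, not part of the code above:

```python
import re

# [AP]M is stricter than [APM]{2}; \s+ matches either a tab or spaces.
TIME_FLIGHT = re.compile(r'(\d{1,2}:\d{2} [AP]M)\s+(\w+\d+)')

def parse_time_flight(line):
    """Return (time, flight) for a line like '8:40 AM LH1334', else None."""
    matches = TIME_FLIGHT.match(line)
    return matches.groups() if matches else None

for line in ["8:40 AM LH1334", "8:45 AM\tOK786\n", "Frankfurt (FRA)"]:
    print(repr(line), parse_time_flight(line))
```

Because `re.match` only anchors at the start of the string, the trailing newline does not have to be stripped for this pattern.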
Edit: that is overkill. The two fields are separated by a tab, so it is actually quite easy:
time, flight = line.split('\t')
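One caveat when `line` comes straight from the file: it keeps its trailing newline, so a plain split leaves `'\n'` attached to the last field. A small sketch (Python 3; `split_time_flight` is a hypothetical name) that strips the line ending first:

```python
def split_time_flight(line):
    """Split 'TIME<tab>FLIGHT' into (time, flight), ignoring the line ending."""
    # rstrip('\r\n') removes a trailing '\n' or '\r\n'; the unpacking
    # raises ValueError if the line does not contain exactly one tab.
    time_, flight = line.rstrip('\r\n').split('\t')
    return time_, flight

print("8:40 AM\tLH1334\n".split('\t'))         # newline stays on the flight
print(split_time_flight("8:40 AM\tLH1334\n"))
```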
The second line:
Frankfurt (FRA)
is simple:
from_ = line
The third line:
Lufthansa A320 (D-AIPP)
can be handled with:
airline, aircraft = line.split('\t')
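If the tabs were turned into spaces (as in the sample pasted in the question), the tab split no longer works. A hypothetical fallback, based only on the two sample rows: assume the aircraft is always the last two space-separated tokens (type plus registration in parentheses) and everything before it is the airline name. `split_airline_aircraft` is an illustrative name, and this rule will break if real data has, for example, an aircraft without a registration:

```python
def split_airline_aircraft(line):
    """Split 'Airline Type (REG)' into (airline, aircraft).

    Uses the tab if there is one; otherwise assumes (hypothetically) that
    the aircraft is the last two space-separated tokens, e.g. 'A320 (D-AIPP)'.
    """
    line = line.rstrip('\r\n')
    if '\t' in line:
        airline, aircraft = line.split('\t', 1)
        return airline.strip(), aircraft.strip()
    parts = line.rsplit(' ', 2)  # e.g. ['Czech Airlines', 'AT45', '(OK-KFP)']
    return parts[0], ' '.join(parts[1:])

print(split_airline_aircraft("Lufthansa\tA320 (D-AIPP)\n"))
print(split_airline_aircraft("Czech Airlines AT45 (OK-KFP)"))
```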
The fourth line:
Landed 8:40 AM
is also simple:
status = line
Putting it all together, you can process the file in chunks of four lines:
from itertools import islice
with open('my.txt') as f:
    header = f.readline()  # skip header
    while True:
        # read four lines and drop their trailing newlines
        lines = [l.rstrip('\r\n') for l in islice(f, 4)]
        if len(lines) < 4:
            break
        time, flight = lines[0].split('\t')
        from_ = lines[1]
        airline, aircraft = lines[2].split('\t')
        status = lines[3]
        # Output a row into your csv file here
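For the "output a row" step, a minimal end-to-end sketch using the standard `csv` module. It assumes Python 3 and the tab-separated layout described above; `convert` is a hypothetical helper that takes already-open file objects so it can be tried on an in-memory sample, and the `;` delimiter is taken from the question's code:

```python
import csv
import io
from itertools import islice

HEADER = ['TIME', 'FLIGHT', 'FROM', 'AIRLINE', 'AIRCRAFT', 'STATUS']

def convert(in_f, out_f, delimiter=';'):
    """Read 4-line flight records from in_f and write one CSV row per flight."""
    writer = csv.writer(out_f, delimiter=delimiter)
    writer.writerow(HEADER)
    in_f.readline()  # skip the input header line
    while True:
        lines = [l.rstrip('\r\n') for l in islice(in_f, 4)]
        if len(lines) < 4:
            break  # end of input (an incomplete last record is dropped)
        time_, flight = lines[0].split('\t')
        airline, aircraft = lines[2].split('\t')
        writer.writerow([time_, flight, lines[1], airline, aircraft, lines[3]])

# Assumed sample input with tabs between the fields on lines 1 and 3.
sample = ("TIME\tFLIGHT\tFROM\tAIRLINE\tAIRCRAFT\tSTATUS\n"
          "8:40 AM\tLH1334\nFrankfurt (FRA)\nLufthansa\tA320 (D-AIPP)\nLanded 8:40 AM\n"
          "8:45 AM\tOK786\nPrague (PRG)\nCzech Airlines\tAT45 (OK-KFP)\nLanded 8:32 AM\n")
out = io.StringIO()
convert(io.StringIO(sample), out)
print(out.getvalue())
```

With real files in Python 3 this would be called as `with open('my.txt') as f_in, open('output.csv', 'w', newline='') as f_out: convert(f_in, f_out)`. Under Python 2.7 the `csv` module expects the output file opened in binary mode (`'wb'`), as in the question's code.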