I am trying to write a Python script that takes information of the following kind:
http://ucolick.org/calendar/keckcal2009-20/keck2012.12dec
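and
http://ucolick.org/calendar/keckcal2009-20/keck2012.18dec (the full data is reproduced below). As you can see, the files are machine generated (and the two files contain slightly different data). In addition, several columns sometimes contain nothing but spaces.
What I ultimately want is a dictionary like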
astro_times_dict['DEC 01']['TWILIGHT ENDS']['12'] = '18:33'
astro_times_dict['DEC 01']['TWILIGHT ENDS']['18'] = '19:00'
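but I'm not sure of a clean way to do this that isn't by hand. I started with:
for line in open('keck2012.12dec.txt'):
    if len(line.split()) > 15:
        print(line, end='')
This prints only the data section, but it is not clear to me how to handle the data that is (sometimes) missing.
Here are the full contents of one of the linked files: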
KECK OBSERVATORY CALENDAR FOR 2012 -ASTRONOMICAL
(computed for altitude 4160.0 m)
ASTRONOMICAL(18 deg) TWILIGHT/DAWN MOON(midnight)
DATE(HST) SUN TWILIGHT ENDS MOON MOON DAWN BEGINS SUN SIDEREAL TIMES NIGHT (18 deg) Zenith
2012 SET RISE SET RISE TWI MID DAWN LENGTH DARK___ RA DEC Dist
12 18 18 12 18 18 h h % h m d m deg
SAT DEC 01 17 53 18 33 19 00 20 23 05 23 05 50 06 30 23 24 04 25 09 49 10.4 1.4 13 0735 1712 45
SUN DEC 02 17 53 18 33 19 00 21 15 05 24 05 51 06 31 23 28 04 29 09 54 10.4 2.2 21 0824 1416 56
MON DEC 03 17 53 18 33 19 00 22 06 05 24 05 51 06 32 23 32 04 33 09 57 10.4 3.1 29 0912 1041 68
TUE DEC 04 17 53 18 33 19 00 22 59 05 25 05 52 06 32 23 36 04 37 10 02 10.4 4.0 38 1000 0633 79
WED DEC 05 17 53 18 34 19 01 23 51 05 25 05 52 06 33 23 41 04 40 10 06 10.4 4.8 46 1045 0145 >90
THU DEC 06 17 53 18 34 19 01 00 46 05 26 05 53 06 33 23 45 04 44 10 11 10.4 5.8 55 1138-0237 >90
FRI DEC 07 17 54 18 34 19 01 01 42 05 27 05 54 06 34 23 49 04 48 10 16 10.4 6.7 64 1229-0719 >90
SAT DEC 08 17 54 18 34 19 01 02 42 14 02 05 27 05 54 06 35 23 52 04 52 10 20 10.4 7.7 73 1323-1146 >90
SUN DEC 09 17 54 18 35 19 02 03 44 14 49 05 28 05 55 06 35 23 57 04 56 10 25 10.4 8.7 83 1420-1541 >90
MON DEC 10 17 55 18 35 19 02 04 49 15 41 05 28 05 55 06 36 00 01 05 00 10 29 10.4 9.8 93 1520-1842 >90
TUE DEC 11 17 55 18 35 19 03 05 55 16 39 05 29 05 56 06 36 00 06 05 04 10 34 10.4 10.4 100 1624-2030 >90
WED DEC 12 17 55 18 36 19 03 06 59 17 42 05 29 05 56 06 37 00 10 05 08 10 38 10.4 10.4 100 1728-2052 >90
THU DEC 13 17 56 18 36 19 03 18 47 05 30 05 57 06 38 00 14 05 12 10 43 10.5 10.5 100 1831-1945 >90
FRI DEC 14 17 56 18 37 19 04 19 52 05 30 05 58 06 38 00 19 05 16 10 47 10.4 9.6 92 1933-1718 >90
SAT DEC 15 17 56 18 37 19 04 20 55 05 31 05 58 06 39 00 23 05 20 10 52 10.4 8.6 82 2031-1349 >90
SUN DEC 16 17 57 18 37 19 05 21 56 05 32 05 59 06 39 00 28 05 24 10 57 10.4 7.6 72 2125-0938 >90
MON DEC 17 17 57 18 38 19 05 22 52 05 32 05 59 06 40 00 32 05 28 11 01 10.4 6.7 63 2217-0506 >90
TUE DEC 18 17 58 18 38 19 05 23 47 05 33 06 00 06 40 00 36 05 32 11 06 10.5 5.8 55 2306-0028 >90
WED DEC 19 17 58 18 39 19 06 00 39 05 33 06 00 06 41 00 41 05 36 11 10 10.4 4.9 46 2355 0401 84
THU DEC 20 17 59 18 39 19 06 01 30 05 34 06 01 06 41 00 45 05 40 11 14 10.5 4.1 38 0041 0818 73
FRI DEC 21 17 59 18 40 19 07 02 21 05 34 06 01 06 42 00 50 05 44 11 18 10.4 3.2 30 0129 1209 61
SAT DEC 22 18 00 18 40 19 07 03 11 05 35 06 02 06 42 00 54 05 47 11 23 10.5 2.4 22 0216 1527 50
SUN DEC 23 18 00 18 41 19 08 14 22 04 02 05 35 06 02 06 43 00 59 05 51 11 27 10.5 1.6 14 0305 1804 39
MON DEC 24 18 01 18 41 19 08 15 05 04 52 05 36 06 03 06 43 01 03 05 55 11 32 10.5 0.7 7 0355 1954 28
TUE DEC 25 18 01 18 42 19 09 15 50 05 41 05 36 06 03 06 44 01 07 05 59 11 36 10.5 0.0 0 0447 2050 17
WED DEC 26 18 02 18 42 19 10 16 38 06 29 05 36 06 04 06 44 01 12 06 03 11 40 10.4 0.0 0 0538 2049 06
THU DEC 27 18 02 18 43 19 10 17 28 05 37 06 04 06 44 01 16 06 07 11 45 10.5 0.0 0 0630 1950 05
FRI DEC 28 18 03 18 44 19 11 18 19 05 37 06 04 06 45 01 21 06 11 11 49 10.4 0.0 0 0721 1757 17
SAT DEC 29 18 04 18 44 19 11 19 11 05 38 06 05 06 45 01 25 06 15 11 54 10.5 0.0 0 0811 1513 28
SUN DEC 30 18 04 18 45 19 12 20 03 05 38 06 05 06 46 01 30 06 19 11 58 10.4 0.8 8 0900 1147 40
MON DEC 31 18 05 18 45 19 12 20 55 05 38 06 06 06 46 01 34 06 23 12 02 10.4 1.7 16 0949 0746 51
ONE LINE REFERS TO EVENING DATE LAST QUARTER Dec 06 15:32 UT
AND FOLLOWING MORNING. NEW MOON Dec 13 08:41 UT
All dates and times are zone HST FIRST QUARTER Dec 20 05:17 UT
in upper table (except sid time). FULL MOON Dec 28 10:22 UT
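Answer 0 (score: 1)
Here is what I have so far (based on the comments to date).
for line in open(filename):
    if len(line.split()) > 15:
        # Shrink the wide gap left by an empty column, then split on the
        # column separator (the exact runs of spaces inside the two
        # literals are not preserved here).
        print(line.strip().replace(' ', ' ').split(' '))
which outputs: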
['SAT DEC 01', '17 53', '18 33', '19 00', '20 23', '', ' 05 23', '05 50', '06 30', '22 57', '04 25', '10 16', ' 11.3', '1.8', '16', '0735 1712', ' 45']
['SUN DEC 02', '17 53', '18 33', '19 00', '21 15', '', ' 05 24', '05 51', '06 31', '23 01', '04 29', '10 21', ' 11.3', '2.7', '23', '0824 1416', ' 56']
['MON DEC 03', '17 53', '18 33', '19 00', '22 06', '', ' 05 24', '05 51', '06 32', '23 05', '04 33', '10 25', ' 11.3', '3.6', '31', '0912 1041', ' 68']
['TUE DEC 04', '17 53', '18 33', '19 00', '22 59', '', ' 05 25', '05 52', '06 32', '23 09', '04 37', '10 29', ' 11.3', '4.4', '39', '1000 0633', ' 79']
['WED DEC 05', '17 53', '18 34', '19 01', '23 51', '', ' 05 25', '05 52', '06 33', '23 14', '04 40', '10 33', ' 11.3', '5.3', '46', '1045 0145', '>90']
['THU DEC 06', '17 53', '18 34', '19 01', '00 46', '', ' 05 26', '05 53', '06 33', '23 17', '04 44', '10 38', ' 11.3', '6.2', '54', '1138-0237', '>90']
['FRI DEC 07', '17 54', '18 34', '19 01', '01 42', '', ' 05 27', '05 54', '06 34', '23 21', '04 48', '10 43', ' 11.3', '7.1', '62', '1229-0719', '>90']
['SAT DEC 08', '17 54', '18 34', '19 01', '02 42', '14 02', '05 27', '05 54', '06 35', '23 25', '04 52', '10 47', ' 11.3', '8.1', '71', '1323-1146', '>90']
['SUN DEC 09', '17 54', '18 35', '19 02', '03 44', '14 49', '05 28', '05 55', '06 35', '23 30', '04 56', '10 52', ' 11.3', '9.1', '80', '1420-1541', '>90']
['MON DEC 10', '17 55', '18 35', '19 02', '04 49', '15 41', '05 28', '05 55', '06 36', '23 34', '05 00', '10 56', ' 11.3 10.2', '90', '1520-1842', '>90']
['TUE DEC 11', '17 55', '18 35', '19 03', '05 55', '16 39', '05 29', '05 56', '06 36', '23 38', '05 04', '11 01', ' 11.3 11.3', '99', '1624-2030', '>90']
['WED DEC 12', '17 55', '18 36', '19 03', '06 59', '17 42', '05 29', '05 56', '06 37', '23 43', '05 08', '11 05', ' 11.3 11.3 100', '1728-2052', '>90']
['THU DEC 13', '17 56', '18 36', '19 03', '', ' 18 47', '05 30', '05 57', '06 38', '23 47', '05 12', '11 10', ' 11.3 11.2', '98', '1831-1945', '>90']
['FRI DEC 14', '17 56', '18 37', '19 04', '', ' 19 52', '05 30', '05 58', '06 38', '23 52', '05 16', '11 15', ' 11.3 10.1', '88', '1933-1718', '>90']
['SAT DEC 15', '17 56', '18 37', '19 04', '', ' 20 55', '05 31', '05 58', '06 39', '23 56', '05 20', '11 19', ' 11.3', '9.1', '79', '2031-1349', '>90']
['SUN DEC 16', '17 57', '18 37', '19 05', '', ' 21 56', '05 32', '05 59', '06 39', '00 00', '05 24', '11 24', ' 11.4', '8.1', '70', '2125-0938', '>90']
['MON DEC 17', '17 57', '18 38', '19 05', '', ' 22 52', '05 32', '05 59', '06 40', '00 05', '05 28', '11 28', ' 11.4', '7.1', '62', '2217-0506', '>90']
['TUE DEC 18', '17 58', '18 38', '19 05', '', ' 23 47', '05 33', '06 00', '06 40', '00 09', '05 32', '11 33', ' 11.4', '6.2', '54', '2306-0028', '>90']
['WED DEC 19', '17 58', '18 39', '19 06', '', ' 00 39', '05 33', '06 00', '06 41', '00 14', '05 36', '11 37', ' 11.4', '5.4', '47', '2355 0401', ' 84']
['THU DEC 20', '17 59', '18 39', '19 06', '', ' 01 30', '05 34', '06 01', '06 41', '00 18', '05 40', '11 42', ' 11.4', '4.5', '39', '0041 0818', ' 73']
['FRI DEC 21', '17 59', '18 40', '19 07', '', ' 02 21', '05 34', '06 01', '06 42', '00 23', '05 44', '11 46', ' 11.4', '3.7', '32', '0129 1209', ' 61']
['SAT DEC 22', '18 00', '18 40', '19 07', '', ' 03 11', '05 35', '06 02', '06 42', '00 27', '05 47', '11 50', ' 11.4', '2.8', '25', '0216 1527', ' 50']
['SUN DEC 23', '18 00', '18 41', '19 08', '14 22', '04 02', '05 35', '06 02', '06 43', '00 32', '05 51', '11 54', ' 11.4', '2.0', '17', '0305 1804', ' 39']
['MON DEC 24', '18 01', '18 41', '19 08', '15 05', '04 52', '05 36', '06 03', '06 43', '00 35', '05 55', '11 59', ' 11.4', '1.2', '10', '0355 1954', ' 28']
['TUE DEC 25', '18 01', '18 42', '19 09', '15 50', '05 41', '05 36', '06 03', '06 44', '00 40', '05 59', '12 03', ' 11.3', '0.4', ' 3', '0447 2050', ' 17']
['WED DEC 26', '18 02', '18 42', '19 10', '16 38', '06 29', '05 36', '06 04', '06 44', '00 44', '06 03', '12 08', ' 11.4', '0.0', ' 0', '0538 2049', ' 06']
['THU DEC 27', '18 02', '18 43', '19 10', '17 28', '', ' 05 37', '06 04', '06 44', '00 49', '06 07', '12 12', ' 11.3', '0.0', ' 0', '0630 1950', ' 05']
['FRI DEC 28', '18 03', '18 44', '19 11', '18 19', '', ' 05 37', '06 04', '06 45', '00 54', '06 11', '12 16', ' 11.3', '0.0', ' 0', '0721 1757', ' 17']
['SAT DEC 29', '18 04', '18 44', '19 11', '19 11', '', ' 05 38', '06 05', '06 45', '00 58', '06 15', '12 21', ' 11.3', '0.4', ' 3', '0811 1513', ' 28']
['SUN DEC 30', '18 04', '18 45', '19 12', '20 03', '', ' 05 38', '06 05', '06 46', '01 03', '06 19', '12 25', ' 11.3', '1.3', '11', '0900 1147', ' 40']
['MON DEC 31', '18 05', '18 45', '19 12', '20 55', '', ' 05 38', '06 06', '06 46', '01 07', '06 23', '12 30', ' 11.3', '2.2', '19', '0949 0746', ' 51']
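I think this correctly identifies the columns with no data, and keeps the remaining columns "together" well enough that parsing is easy from here. I'll leave the question open for a while in case anyone sees a mistake or a better way to do this.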
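The same idea can be sketched with plain string slicing at fixed offsets, so that an empty column never shifts the ones after it. This is only a minimal sketch: the widths (5, 8, then eleven 7-character time fields) are an assumption borrowed from the widths used in the struct-based answer, and `WIDTHS` and `split_fixed` are illustrative names, not part of the original script. The sample row is built with known padding so the spacing is unambiguous.

```python
# Assumed layout: a 5-char weekday field, an 8-char date field, then
# eleven 7-char time fields (widths are an assumption, see above).
WIDTHS = [5, 8] + [7] * 11

def split_fixed(line, widths=WIDTHS):
    """Slice a line at fixed offsets; an all-blank field becomes None."""
    fields, pos = [], 0
    for w in widths:
        text = line[pos:pos + w].strip()
        fields.append(text or None)
        pos += w
    return fields

# Build a sample row with known spacing; the moonset cell (index 6) is blank.
cells = [' SAT', 'DEC 01', '17 53', '18 33', '19 00', '20 23', '',
         '05 23', '05 50', '06 30', '23 24', '04 25', '09 49']
sample = ''.join(c.ljust(w) for c, w in zip(cells, WIDTHS))

row = split_fixed(sample)
print(row[5], row[6])   # moonrise, moonset
```

Because every field is taken from its own offset, a blank cell simply yields `None` instead of disappearing from the list.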
Answer 1 (score: 0)
Here is a pyparsing solution:
data = """\
KECK OBSERVATORY CALENDAR FOR 2012 -ASTRONOMICAL
(computed for altitude 4160.0 m)
ASTRONOMICAL(18 deg) TWILIGHT/DAWN MOON(midnight)
DATE(HST) SUN TWILIGHT ENDS MOON MOON DAWN BEGINS SUN SIDEREAL TIMES NIGHT (18 deg) Zenith
2012 SET RISE SET RISE TWI MID DAWN LENGTH DARK___ RA DEC Dist
12 18 18 12 18 18 h h % h m d m deg
SAT DEC 01 17 53 18 33 19 00 20 23 05 23 05 50 06 30 23 24 04 25 09 49 10.4 1.4 13 0735 1712 45
SUN DEC 02 17 53 18 33 19 00 21 15 05 24 05 51 06 31 23 28 04 29 09 54 10.4 2.2 21 0824 1416 56
MON DEC 03 17 53 18 33 19 00 22 06 05 24 05 51 06 32 23 32 04 33 09 57 10.4 3.1 29 0912 1041 68
TUE DEC 04 17 53 18 33 19 00 22 59 05 25 05 52 06 32 23 36 04 37 10 02 10.4 4.0 38 1000 0633 79
WED DEC 05 17 53 18 34 19 01 23 51 05 25 05 52 06 33 23 41 04 40 10 06 10.4 4.8 46 1045 0145 >90
THU DEC 06 17 53 18 34 19 01 00 46 05 26 05 53 06 33 23 45 04 44 10 11 10.4 5.8 55 1138-0237 >90
FRI DEC 07 17 54 18 34 19 01 01 42 05 27 05 54 06 34 23 49 04 48 10 16 10.4 6.7 64 1229-0719 >90
SAT DEC 08 17 54 18 34 19 01 02 42 14 02 05 27 05 54 06 35 23 52 04 52 10 20 10.4 7.7 73 1323-1146 >90
SUN DEC 09 17 54 18 35 19 02 03 44 14 49 05 28 05 55 06 35 23 57 04 56 10 25 10.4 8.7 83 1420-1541 >90
MON DEC 10 17 55 18 35 19 02 04 49 15 41 05 28 05 55 06 36 00 01 05 00 10 29 10.4 9.8 93 1520-1842 >90
TUE DEC 11 17 55 18 35 19 03 05 55 16 39 05 29 05 56 06 36 00 06 05 04 10 34 10.4 10.4 100 1624-2030 >90
WED DEC 12 17 55 18 36 19 03 06 59 17 42 05 29 05 56 06 37 00 10 05 08 10 38 10.4 10.4 100 1728-2052 >90
THU DEC 13 17 56 18 36 19 03 18 47 05 30 05 57 06 38 00 14 05 12 10 43 10.5 10.5 100 1831-1945 >90
FRI DEC 14 17 56 18 37 19 04 19 52 05 30 05 58 06 38 00 19 05 16 10 47 10.4 9.6 92 1933-1718 >90
""".splitlines()
from pyparsing import oneOf, Word, nums, Regex, White, replaceWith, restOfLine

weekday = oneOf("SUN MON TUE WED THU FRI SAT")
month = oneOf("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
integer = Word(nums)
real = Regex(r'\d+\.\d*')
# a time is "HH MM"; convert it to "HH:MM" at parse time
time = Regex(r'\d\d \d\d').leaveWhitespace()
time.setParseAction(lambda t: ':'.join(t[0].split()))
spacer = White(' ', exact=2).suppress()
# an empty time cell is five blanks; report it as None
blankTime = White(' ', exact=5)
blankTime.setParseAction(replaceWith(None))
dataParser = (weekday("weekday") + month("month") + integer("mday") + spacer +
              (((time|blankTime) + spacer).leaveWhitespace()*11)('times') +
              real('twi_len') + real('dawn_len') + integer('twi_pct') +
              restOfLine)
fields = """sunset twilight ends moonrise moonset dawn
    begins sunrise astro_twi astro_mid astro_dawn""".split()

def labelTimes(tokens):
    """parse-time transform to add results names for each time field"""
    for fname, value in zip(fields, tokens.times):
        # assign results name for each field
        tokens[fname] = value
    # no longer need this name, delete it
    del tokens['times']

dataParser.setParseAction(labelTimes)

for line in data[6:]:
    print(line)
    vals = dataParser.parseString(line)
    # uncomment this line to see all field names
    # print(vals.dump())
    print(vals.moonrise, vals.moonset)
    print()
Prints:
SAT DEC 01 17 53 18 33 19 00 20 23 05 23 05 50 06 30 23 24 04 25 09 49 10.4 1.4 13 0735 1712 45
20:23 None
SUN DEC 02 17 53 18 33 19 00 21 15 05 24 05 51 06 31 23 28 04 29 09 54 10.4 2.2 21 0824 1416 56
21:15 None
MON DEC 03 17 53 18 33 19 00 22 06 05 24 05 51 06 32 23 32 04 33 09 57 10.4 3.1 29 0912 1041 68
22:06 None
TUE DEC 04 17 53 18 33 19 00 22 59 05 25 05 52 06 32 23 36 04 37 10 02 10.4 4.0 38 1000 0633 79
22:59 None
etc.
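Pyparsing returns a ParseResults data structure, which can be used as a simple list, as a dict with keys, or as an object with attributes (if names were given to any of the parser elements). In the sample code, I show how to use the field names to access the parsed values for moonrise and moonset. Uncomment the call to vals.dump() to see all the available field names and values for each line.
Pyparsing's default behavior is to implicitly skip whitespace when matching parser elements, so we have to call leaveWhitespace on selected parts of the parser to disable that. In your given dataset, it looks like the moonrise and moonset times are the only ones that may be blank, but this parser will detect any missing time and report it as None. (I'm not sure what the rightmost fields are, so those are left as an exercise for the OP.)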
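For comparison, the "time or five blanks" rule can also be sketched with only the standard re module rather than pyparsing. This is a swapped-in technique, not the parser above: `CELL` and `parse_times` are hypothetical names, and the cell width (5) and gap width (2) are assumptions mirroring the `blankTime` and `spacer` elements.

```python
import re

# One time cell: either "HH MM" or five blanks, followed by a two-space gap.
# Both widths are assumptions taken from the pyparsing grammar.
CELL = re.compile(r'(\d\d \d\d| {5}) {2}')

def parse_times(text, count=11):
    """Return `count` times as 'HH:MM' strings, or None for blank cells."""
    times, pos = [], 0
    for _ in range(count):
        m = CELL.match(text, pos)
        if m is None:
            raise ValueError('unexpected cell at offset %d' % pos)
        cell = m.group(1)
        times.append(None if cell.isspace() else cell.replace(' ', ':'))
        pos = m.end()
    return times

# Sample cells with explicit spacing; the fifth cell is blank.
cells = ['17 53', '18 33', '19 00', '20 23', '     ', '05 23']
text = ''.join(c + '  ' for c in cells)
print(parse_times(text, count=len(cells)))
```

Matching from an explicit offset plays the same role as leaveWhitespace: no whitespace is skipped silently, so a blank cell is consumed as a cell rather than as padding.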
Answer 2 (score: 0)
The struct module is very useful for parsing fixed-width data:
import struct
line = ' SAT DEC 01 17 53 18 33 19 00 20 23 05 23 05 50 06 30 23 24 04 25 09 49 10.4 1.4 13 0735 1712 45'
cols = [s.strip() for s in struct.unpack('5s8s' + 11 * '7s' + '5s5s4s11s5s', line)]
# ['SAT', 'DEC 01', '17 53', '18 33', '19 00', '20 23', '', '05 23', '05 50', '06 30', '23 24', '04 25', '09 49', '10.4', '1.4', '13', '0735 1712', '45']
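Under Python 3, struct.unpack expects bytes rather than str, and the buffer must be exactly as long as the format. A minimal sketch of that adjustment follows, together with one way the unpacked columns might be arranged into the nested dictionary asked for in the question. Only the first 13 fields are unpacked; `unpack_row`, `hhmm`, `to_entry` and the 'MOON RISE' / 'MOON SET' keys are illustrative names, not from the original answer, and the sample row is built with known padding.

```python
import struct

# Widths from the answer above: weekday, date, then eleven time fields.
FMT = '5s8s' + 11 * '7s'
SIZE = struct.calcsize(FMT)

def unpack_row(line):
    """Unpack the leading fixed-width fields of one data line."""
    # Encode to bytes, then cut or pad with spaces to the exact size.
    raw = line.encode('ascii')[:SIZE].ljust(SIZE)
    return [f.decode('ascii').strip() for f in struct.unpack(FMT, raw)]

def hhmm(cell):
    """Turn 'HH MM' into 'HH:MM'; an empty cell becomes None."""
    return cell.replace(' ', ':') if cell else None

def to_entry(cols):
    """Arrange a few columns into the nested layout from the question."""
    return {
        'TWILIGHT ENDS': {'12': hhmm(cols[3]), '18': hhmm(cols[4])},
        'MOON RISE': hhmm(cols[5]),
        'MOON SET': hhmm(cols[6]),
    }

# Sample row with known spacing; the moonset cell is blank.
widths = [5, 8] + [7] * 11
cells = [' SAT', 'DEC 01', '17 53', '18 33', '19 00', '20 23', '',
         '05 23', '05 50', '06 30', '23 24', '04 25', '09 49']
sample = ''.join(c.ljust(w) for c, w in zip(cells, widths))

cols = unpack_row(sample)
astro_times_dict = {cols[1]: to_entry(cols)}
print(astro_times_dict['DEC 01']['TWILIGHT ENDS']['12'])
```

Keying the outer dictionary on the date string keeps lookups in the form `astro_times_dict['DEC 01']['TWILIGHT ENDS']['18']`, as in the question.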