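Parsing a txt file whose columns are not cleanly separated

Time: 2017-11-25 15:42:26

Tags: python

How can I read a txt/data file in which many of the columns are not cleanly delimited within each line (see the file below)?

I only want to extract a few required parameters from each line of the file.

File contents: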

version=2
id  NumCompo    Species QuantumNumbers  Frequency   Eup Gup Aij FitFreq DeltaFitFreq    Vo  deltaVo FWHM_G  deltaFWHM_G FWHM_L  deltaFWHM_L Intensity   deltaIntensity  FitFlux deltaFitFlux    Freq.IntensityMax   V.IntensityMax  FWHM    IntensityMax    Flux1stMom  deltaFlux1stMom rms deltaV  Cal Size    TelescopePath   TelescopeName
None    None    None    None    MHz K   None    s-1 MHz MHz km/s    km/s    km/s    km/s    km/s    km/s    K   K   K.km/s  K.km/s  MHz km/s    km/s    K   K.km/s  K.km/s  mK  km/s    %   arcsec  None    None
44003   1   CH3CHO  (18 1 17 2 _ 17 1 16 2) 350362.8435 163.4598498101699   74  0.0014741376966675667   350355.5848769065   Infinity    6.210933891166498   Infinity    1.739817511288065   Infinity    0.0 0.0 2.8623661141900496  Infinity    5.301075803032265   0.0 350355.5    6.2835599041722 1.7199802504879502  2.848570585251  4.899485148752622   0.0 3.854414571567953E-5    0.8289084198960507  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (18 1 17 0 _ 17 1 16 0) 350445.7777 163.41850853869101  74  0.0014742735891251069   350437.70831029414  0.12591692133973328 6.903042719892636   0.10771941048880102 2.203766561226652   0.2947187307742186  0.0 0.0 3.482121868378891   0.34484851265307565 8.168010359688692   0.0 350437.5    7.08124391130517    2.11135392209597    3.597269296646  7.595108638308943   0.0 204.05560763773454  0.8287143946045267  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (13 2 12 1 _ 12 1 11 2) 350572.1804 93.02188980281947   54  9.699686188970169E-5    350566.94541642064  NaN 4.476706032550229   NaN 0.4589727274204179  NaN 0.0 0.0 23.273694629520087  NaN 11.372220042377085  0.0 350566.40625    4.9377752090695495  1.3912897271407283  1.418276190758  1.9732330944498893  0.0 425.46913502384274  0.8284143332887641  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (8 6 3 3 _ 9 5 4 3) 350808.1122 318.0348265703963   34  1.075967688918428E-5    350801.2794264813   Infinity    5.839129418307394   Infinity    0.565736741450577   Infinity    0.0 0.0 11.264418715889377  Infinity    6.784303066688616   0.0 350801.75   5.436988227894669   1.3717772790578981  2.775228977203  3.806996055110165   0.0 156.0928146251678   0.8412065022978162  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (8 6 2 3 _ 9 5 5 3) 350808.1275 318.03482730468073  34  1.0759677371497151E-5   350801.2795084328   Infinity    5.852134153594954   Infinity    0.5663191176333013  Infinity    0.0 0.0 11.228477212030814  Infinity    6.769618049955546   0.0 350801.75   5.450063014554457   1.3717772790578981  2.775228977203  3.806996055110165   0.0 156.8862242548636   0.8412065022978162  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m

2 answers:

Answer 0: (score: 0)

If the QuantumNumbers column did not itself contain spaces, you could simply split each line on whitespace.

To get there, you can use a regex to replace all spaces within brackets with commas.

So you can do something like this:

import re
with open('f.txt') as fh:
    s = re.sub(r' (?=[^\(\)]*\))', ',', fh.read().strip())
    rows = [r.split() for r in s.split('\n')]

This first evaluates s to:

version=2
id NumCompo Species QuantumNumbers Frequency Eup Gup Aij FitFreq DeltaFitFreq Vo deltaVo FWHM_G deltaFWHM_G FWHM_L deltaFWHM_L Intensity deltaIntensity FitFlux deltaFitFlux Freq.IntensityMax V.IntensityMax FWHM IntensityMax Flux1stMom deltaFlux1stMom rms deltaV Cal Size TelescopePath TelescopeName
None None None None MHz K None s-1 MHz MHz km/s km/s km/s km/s km/s km/s K K K.km/s K.km/s MHz km/s km/s K K.km/s K.km/s mK km/s % arcsec None None
44003 1 CH3CHO (18,1,17,2,_,17,1,16,2) 350362.8435 163.4598498101699 74 0.0014741376966675667 350355.5848769065 Infinity 6.210933891166498 Infinity 1.739817511288065 Infinity 0.0 0.0 2.8623661141900496 Infinity 5.301075803032265 0.0 350355.5 6.2835599041722 1.7199802504879502 2.848570585251 4.899485148752622 0.0 3.854414571567953E-5 0.8289084198960507 0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/ alma_400m
44003 1 CH3CHO (18,1,17,0,_,17,1,16,0) 350445.7777 163.41850853869101 74 0.0014742735891251069 350437.70831029414 0.12591692133973328 6.903042719892636 0.10771941048880102 2.203766561226652 0.2947187307742186 0.0 0.0 3.482121868378891 0.34484851265307565 8.168010359688692 0.0 350437.5 7.08124391130517 2.11135392209597 3.597269296646 7.595108638308943 0.0 204.05560763773454 0.8287143946045267 0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/ alma_400m
44003 1 CH3CHO (13,2,12,1,_,12,1,11,2) 350572.1804 93.02188980281947 54 9.699686188970169E-5 350566.94541642064 NaN 4.476706032550229 NaN 0.4589727274204179 NaN 0.0 0.0 23.273694629520087 NaN 11.372220042377085 0.0 350566.40625 4.9377752090695495 1.3912897271407283 1.418276190758 1.9732330944498893 0.0 425.46913502384274 0.8284143332887641 0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/ alma_400m
44003 1 CH3CHO (8,6,3,3,_,9,5,4,3) 350808.1122 318.0348265703963 34 1.075967688918428E-5 350801.2794264813 Infinity 5.839129418307394 Infinity 0.565736741450577 Infinity 0.0 0.0 11.264418715889377 Infinity 6.784303066688616 0.0 350801.75 5.436988227894669 1.3717772790578981 2.775228977203 3.806996055110165 0.0 156.0928146251678 0.8412065022978162 0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/ alma_400m
44003 1 CH3CHO (8,6,2,3,_,9,5,5,3) 350808.1275 318.03482730468073 34 1.0759677371497151E-5 350801.2795084328 Infinity 5.852134153594954 Infinity 0.5663191176333013 Infinity 0.0 0.0 11.228477212030814 Infinity 6.769618049955546 0.0 350801.75 5.450063014554457 1.3717772790578981 2.775228977203 3.806996055110165 0.0 156.8862242548636 0.8412065022978162 0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/ alma_400m

and then rows as a list of lists:

[['version=2'],
 ['id', 'NumCompo', 'Species', 'QuantumNumbers', 'Frequency', 'Eup', 'Gup', 'Aij', 'FitFreq', 'DeltaFitFreq', 'Vo', 'deltaVo', 'FWHM_G', 'deltaFWHM_G', 'FWHM_L', 'deltaFWHM_L', 'Intensity', 'deltaIntensity', 'FitFlux', 'deltaFitFlux', 'Freq.IntensityMax', 'V.IntensityMax', 'FWHM', 'IntensityMax', 'Flux1stMom', 'deltaFlux1stMom', 'rms', 'deltaV', 'Cal', 'Size', 'TelescopePath', 'TelescopeName'],
 ['None', 'None', 'None', 'None', 'MHz', 'K', 'None', 's-1', 'MHz', 'MHz', 'km/s', 'km/s', 'km/s', 'km/s', 'km/s', 'km/s', 'K', 'K', 'K.km/s', 'K.km/s', 'MHz', 'km/s', 'km/s', 'K', 'K.km/s', 'K.km/s', 'mK', 'km/s', '%', 'arcsec', 'None', 'None'],
 ['44003', '1', 'CH3CHO', '(18,1,17,2,_,17,1,16,2)', '350362.8435', '163.4598498101699', '74', '0.0014741376966675667', '350355.5848769065', 'Infinity', '6.210933891166498', 'Infinity', '1.739817511288065', 'Infinity', '0.0', '0.0', '2.8623661141900496', 'Infinity', '5.301075803032265', '0.0', '350355.5', '6.2835599041722', '1.7199802504879502', '2.848570585251', '4.899485148752622', '0.0', '3.854414571567953E-5', '0.8289084198960507', '0.0', '0.0', '/home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/', 'alma_400m'],
 ['44003', '1', 'CH3CHO', '(18,1,17,0,_,17,1,16,0)', '350445.7777', '163.41850853869101', '74', '0.0014742735891251069', '350437.70831029414', '0.12591692133973328', '6.903042719892636', '0.10771941048880102', '2.203766561226652', '0.2947187307742186', '0.0', '0.0', '3.482121868378891', '0.34484851265307565', '8.168010359688692', '0.0', '350437.5', '7.08124391130517', '2.11135392209597', '3.597269296646', '7.595108638308943', '0.0', '204.05560763773454', '0.8287143946045267', '0.0', '0.0', '/home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/', 'alma_400m'],
 ['44003', '1', 'CH3CHO', '(13,2,12,1,_,12,1,11,2)', '350572.1804', '93.02188980281947', '54', '9.699686188970169E-5', '350566.94541642064', 'NaN', '4.476706032550229', 'NaN', '0.4589727274204179', 'NaN', '0.0', '0.0', '23.273694629520087', 'NaN', '11.372220042377085', '0.0', '350566.40625', '4.9377752090695495', '1.3912897271407283', '1.418276190758', '1.9732330944498893', '0.0', '425.46913502384274', '0.8284143332887641', '0.0', '0.0', '/home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/', 'alma_400m'],
 ['44003', '1', 'CH3CHO', '(8,6,3,3,_,9,5,4,3)', '350808.1122', '318.0348265703963', '34', '1.075967688918428E-5', '350801.2794264813', 'Infinity', '5.839129418307394', 'Infinity', '0.565736741450577', 'Infinity', '0.0', '0.0', '11.264418715889377', 'Infinity', '6.784303066688616', '0.0', '350801.75', '5.436988227894669', '1.3717772790578981', '2.775228977203', '3.806996055110165', '0.0', '156.0928146251678', '0.8412065022978162', '0.0', '0.0', '/home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/', 'alma_400m'],
 ['44003', '1', 'CH3CHO', '(8,6,2,3,_,9,5,5,3)', '350808.1275', '318.03482730468073', '34', '1.0759677371497151E-5', '350801.2795084328', 'Infinity', '5.852134153594954', 'Infinity', '0.5663191176333013', 'Infinity', '0.0', '0.0', '11.228477212030814', 'Infinity', '6.769618049955546', '0.0', '350801.75', '5.450063014554457', '1.3717772790578981', '2.775228977203', '3.806996055110165', '0.0', '156.8862242548636', '0.8412065022978162', '0.0', '0.0', '/home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/', 'alma_400m']]

which can then be indexed like any list of lists, with the syntax rows[row][column].
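Since the question asks for only a few parameters per line, the header row can be used to look up column positions by name. The following is a minimal sketch of that idea, not part of the original answer: it assumes a shortened, hypothetical sample in the same layout as the file (version line, header line, units line, then records), and the `wanted` list is an illustrative choice of columns.

```python
import re

# Shortened, hypothetical sample in the same layout as the file above:
# a version line, a header line, a units line, then the records.
sample = """version=2
id Species QuantumNumbers Frequency Eup
None None None MHz K
44003 CH3CHO (18 1 17 2 _ 17 1 16 2) 350362.8435 163.4598498101699
44003 CH3CHO (13 2 12 1 _ 12 1 11 2) 350572.1804 93.02188980281947"""

# Replace every space that sits inside parentheses with a comma,
# then split each line on whitespace.
s = re.sub(r' (?=[^\(\)]*\))', ',', sample.strip())
rows = [r.split() for r in s.split('\n')]

header = rows[1]    # column names
records = rows[3:]  # skip the version, header and units lines

# Hypothetical selection of columns to keep.
wanted = ['Species', 'QuantumNumbers', 'Frequency']
idx = [header.index(name) for name in wanted]
selected = [[rec[i] for i in idx] for rec in records]

# Numeric columns are still strings at this point; convert them explicitly.
frequencies = [float(rec[header.index('Frequency')]) for rec in records]

print(selected)
print(frequencies)
```

Looking columns up by name rather than by a hard-coded position keeps the selection readable and survives a reordering of the columns, as long as the header line stays in the second row.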

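Answer 1: (score: 0)

It looks like you have three header lines followed by records. The records are separated by varying amounts of whitespace and do not themselves contain spaces, except for the QuantumNumbers field, which holds nine space-separated values inside parentheses. You can work around this by:

  • looking only at the lines from the fourth onward
  • splitting each line on whitespace
  • then re-joining the quantum numbers

Like this: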

import re

text = """version=2
id  NumCompo    Species QuantumNumbers  Frequency   Eup Gup Aij FitFreq DeltaFitFreq    Vo  deltaVo FWHM_G  deltaFWHM_G FWHM_L  deltaFWHM_L Intensity   deltaIntensity  FitFlux deltaFitFlux    Freq.IntensityMax   V.IntensityMax  FWHM    IntensityMax    Flux1stMom  deltaFlux1stMom rms deltaV  Cal Size    TelescopePath   TelescopeName
None    None    None    None    MHz K   None    s-1 MHz MHz km/s    km/s    km/s    km/s    km/s    km/s    K   K   K.km/s  K.km/s  MHz km/s    km/s    K   K.km/s  K.km/s  mK  km/s    %   arcsec  None    None
44003   1   CH3CHO  (18 1 17 2 _ 17 1 16 2) 350362.8435 163.4598498101699   74  0.0014741376966675667   350355.5848769065   Infinity    6.210933891166498   Infinity    1.739817511288065   Infinity    0.0 0.0 2.8623661141900496  Infinity    5.301075803032265   0.0 350355.5    6.2835599041722 1.7199802504879502  2.848570585251  4.899485148752622   0.0 3.854414571567953E-5    0.8289084198960507  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (18 1 17 0 _ 17 1 16 0) 350445.7777 163.41850853869101  74  0.0014742735891251069   350437.70831029414  0.12591692133973328 6.903042719892636   0.10771941048880102 2.203766561226652   0.2947187307742186  0.0 0.0 3.482121868378891   0.34484851265307565 8.168010359688692   0.0 350437.5    7.08124391130517    2.11135392209597    3.597269296646  7.595108638308943   0.0 204.05560763773454  0.8287143946045267  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (13 2 12 1 _ 12 1 11 2) 350572.1804 93.02188980281947   54  9.699686188970169E-5    350566.94541642064  NaN 4.476706032550229   NaN 0.4589727274204179  NaN 0.0 0.0 23.273694629520087  NaN 11.372220042377085  0.0 350566.40625    4.9377752090695495  1.3912897271407283  1.418276190758  1.9732330944498893  0.0 425.46913502384274  0.8284143332887641  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (8 6 3 3 _ 9 5 4 3) 350808.1122 318.0348265703963   34  1.075967688918428E-5    350801.2794264813   Infinity    5.839129418307394   Infinity    0.565736741450577   Infinity    0.0 0.0 11.264418715889377  Infinity    6.784303066688616   0.0 350801.75   5.436988227894669   1.3717772790578981  2.775228977203  3.806996055110165   0.0 156.0928146251678   0.8412065022978162  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m
44003   1   CH3CHO  (8 6 2 3 _ 9 5 5 3) 350808.1275 318.03482730468073  34  1.0759677371497151E-5   350801.2795084328   Infinity    5.852134153594954   Infinity    0.5663191176333013  Infinity    0.0 0.0 11.228477212030814  Infinity    6.769618049955546   0.0 350801.75   5.450063014554457   1.3717772790578981  2.775228977203  3.806996055110165   0.0 156.8862242548636   0.8412065022978162  0.0 0.0 /home/dipen/Downloads/cassis3.9-160426-build6032/delivery/telescope/    alma_400m"""

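# Split the text into lines; the first three are the version, header and units lines.
lines = text.split("\n")

fields = []
WHITESPACE = re.compile(r"\s+")

for line in lines[3:]:
    # Split the record on any run of whitespace.
    current = WHITESPACE.split(line)
    # Tokens 3..11 are the nine pieces of QuantumNumbers; re-join them into one field.
    current[3:12] = [" ".join(current[3:12])]
    fields.append(current)

print(fields)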
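The slice `current[3:12]` relies on QuantumNumbers always holding exactly nine tokens. A variation that does not depend on that count is to tokenize each line with a pattern that keeps a parenthesized group whole. This is a sketch under stated assumptions rather than the answer's own method: the sample is a shortened, hypothetical cut of the file, and the names `TOKEN` and `records` are introduced here for illustration.

```python
import math
import re

# Shortened, hypothetical sample in the same layout as the file above.
sample = """version=2
id Species QuantumNumbers Frequency DeltaFitFreq
None None None MHz MHz
44003 CH3CHO (18 1 17 2 _ 17 1 16 2) 350362.8435 Infinity
44003 CH3CHO (13 2 12 1 _ 12 1 11 2) 350572.1804 NaN"""

# A token is either a whole parenthesized group or a run of non-whitespace.
TOKEN = re.compile(r"\([^)]*\)|\S+")

lines = sample.split("\n")
header = lines[1].split()  # the column names contain no spaces

# Build one dict per record, keyed by column name; skip blank lines.
records = [
    dict(zip(header, TOKEN.findall(line)))
    for line in lines[3:]
    if line.strip()
]

for rec in records:
    # float() accepts the strings "Infinity" and "NaN" directly.
    print(rec["Species"], rec["QuantumNumbers"],
          float(rec["Frequency"]), float(rec["DeltaFitFreq"]))
```

With dicts keyed by column name, the wanted parameters can be picked with `rec["Frequency"]` and so on, and the `Infinity` and `NaN` entries convert to `inf` and `nan` floats that `math.isinf` and `math.isnan` can detect.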