Parsing a complex text file into one row of field names and a second row of values

Time: 2013-03-31 18:51:51

Tags: linux parsing

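I am trying to parse a text file. Its first part consists of lines, each starting with "#", that contain mostly text and a single number. The second part consists of lines with multiple numbers, where each line relates to a single structure. Since I need to combine these output files across several hundred cases, it would be very helpful if I could process each file into a single row of data. I have been having trouble doing this with a combination of bash / perl / awk. Can anyone suggest how I could do this? (Sample file below.)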

Thank you for your consideration.

Best wishes,

-S

# Title Segmentation Statistics
#
# generating_program mri_segstats
# cvs_version $Id: mri_segstats.c,v 1.75.2.9 2013/02/16 00:09:33 greve Exp $
# cmdline mri_segstats --seg mri/aseg.mgz --sum stats/aseg.stats --pv mri/norm.mgz --empty --brainmask mri/brainmask.mgz --brain-vol-from-seg --excludeid 0 --excl-ctxgmwm --supratent --subcortgray --in mri/norm.mgz --in-intensity-name norm --in-intensity-units MR --etiv --surf-wm-vol --surf-ctx-vol --totalgray --euler --ctab /mnt/glusterfs/salsoman/freesurfer/ASegStatsLUT.txt --subject WCA_0162_T1_FS
# sysname  Linux
# hostname barley15.stanford.edu
# machine  x86_64
# user     salsoman
# anatomy_type volume
#
# SUBJECTS_DIR /mnt/glusterfs/salsoman/output/FS
# subjectname WCA_0162_T1_FS
# Measure BrainSeg, BrainSegVol, Brain Segmentation Volume, 1089921.000000, mm^3
# Measure BrainSegNotVent, BrainSegVolNotVent, Brain Segmentation Volume Without Ventricles, 993734.000000, mm^3
# Measure BrainSegNotVentSurf, BrainSegVolNotVentSurf, Brain Segmentation Volume Without Ventricles from Surf, 993214.631437, mm^3
# Measure lhCortex, lhCortexVol, Left hemisphere cortical gray matter volume, 240339.518738, mm^3
# Measure rhCortex, rhCortexVol, Right hemisphere cortical gray matter volume, 236468.599276, mm^3
# Measure Cortex, CortexVol, Total cortical gray matter volume, 476808.118013, mm^3
# Measure lhCorticalWhiteMatter, lhCorticalWhiteMatterVol, Left hemisphere cortical white matter volume, 191135.667925, mm^3
# Measure rhCorticalWhiteMatter, rhCorticalWhiteMatterVol, Right hemisphere cortical white matter volume, 180013.845498, mm^3
# Measure CorticalWhiteMatter, CorticalWhiteMatterVol, Total cortical white matter volume, 371149.513423, mm^3
# Measure SubCortGray, SubCortGrayVol, Subcortical gray matter volume, 52383.000000, mm^3
# Measure TotalGray, TotalGrayVol, Total gray matter volume, 604954.118013, mm^3
# Measure SupraTentorial, SupraTentorialVol, Supratentorial volume, 991108.631437, mm^3
# Measure SupraTentorialNotVent, SupraTentorialVolNotVent, Supratentorial volume, 902611.631437, mm^3
# Measure SupraTentorialNotVentVox, SupraTentorialVolNotVentVox, Supratentorial volume voxel count, 900542.000000, mm^3
# Measure Mask, MaskVol, Mask Volume, 1694747.000000, mm^3
# Measure BrainSegVol-to-eTIV, BrainSegVol-to-eTIV, Ratio of BrainSegVol to eTIV, 0.624390, unitless
# Measure MaskVol-to-eTIV, MaskVol-to-eTIV, Ratio of MaskVol to eTIV, 0.970881, unitless
# Measure lhSurfaceHoles, lhSurfaceHoles, Number of defect holes in lh surfaces prior to fixing, 239, unitless
# Measure rhSurfaceHoles, rhSurfaceHoles, Number of defect holes in rh surfaces prior to fixing, 227, unitless
# Measure SurfaceHoles, SurfaceHoles, Total number of defect holes in surfaces prior to fixing, 466, unitless
# Measure EstimatedTotalIntraCranialVol, eTIV, Estimated Total Intracranial Volume, 1745576.756023, mm^3
# SegVolFile mri/aseg.mgz
# SegVolFileTimeStamp  2013/03/27 19:34:08
# ColorTable /mnt/glusterfs/salsoman/freesurfer/ASegStatsLUT.txt
# ColorTableTimeStamp 2013/02/25 22:23:16
# InVolFile  mri/norm.mgz
# InVolFileTimeStamp  2013/03/27 14:00:28
# InVolFrame 0
# PVVolFile  mri/norm.mgz
# PVVolFileTimeStamp  2013/03/27 14:00:28
# Excluding Cortical Gray and White Matter
# ExcludeSegId 0 2 3 41 42
# VoxelVolume_mm3 1
# TableCol  1 ColHeader Index
# TableCol  1 FieldName Index
# TableCol  1 Units     NA
# TableCol  2 ColHeader SegId
# TableCol  2 FieldName Segmentation Id
# TableCol  2 Units     NA
# TableCol  3 ColHeader NVoxels
# TableCol  3 FieldName Number of Voxels
# TableCol  3 Units     unitless
# TableCol  4 ColHeader Volume_mm3
# TableCol  4 FieldName Volume
# TableCol  4 Units     mm^3
# TableCol  5 ColHeader StructName
# TableCol  5 FieldName Structure Name
# TableCol  5 Units     NA
# TableCol  6 ColHeader normMean
# TableCol  6 FieldName Intensity normMean
# TableCol  6 Units     MR
# TableCol  7 ColHeader normStdDev
# TableCol  7 FieldName Itensity normStdDev
# TableCol  7 Units     MR
# TableCol  8 ColHeader normMin
# TableCol  8 FieldName Intensity normMin
# TableCol  8 Units     MR
# TableCol  9 ColHeader normMax
# TableCol  9 FieldName Intensity normMax
# TableCol  9 Units     MR
# TableCol 10 ColHeader normRange
# TableCol 10 FieldName Intensity normRange
# TableCol 10 Units     MR
# NRows 45
# NTableCols 10
# ColHeaders  Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange
  1   4     41962    41962.4  Left-Lateral-Ventricle            22.0753    10.2057     3.0000    94.0000    91.0000
  2   5      2150     2149.7  Left-Inf-Lat-Vent                 37.5636    16.3886     5.0000    89.0000    84.0000
  3   7      8273     8273.3  Left-Cerebellum-White-Matter      88.0903    11.6908    21.0000   123.0000   102.0000
  4   8     35427    35427.4  Left-Cerebellum-Cortex            56.4255    12.5475     2.0000    92.0000    90.0000
  5  10      6087     6086.7  Left-Thalamus-Proper              92.2098    11.7928    50.0000   124.0000    74.0000
  6  11      5101     5100.7  Left-Caudate                      75.0335     9.9708    29.0000   100.0000    71.0000
  7  12      4773     4773.0  Left-Putamen                      75.7113     6.2195    48.0000    95.0000    47.0000
  8  13      1178     1177.6  Left-Pallidum                     86.3354     6.2568    59.0000   104.0000    45.0000
  9  14      2973     2973.1  3rd-Ventricle                     27.5508    11.3394     9.0000    77.0000    68.0000
 10  15      2403     2403.0  4th-Ventricle                     26.8237    11.9581     6.0000    79.0000    73.0000
 11  16     18347    18347.2  Brain-Stem                        82.1731    12.0144    15.0000   116.0000   101.0000
 12  17      3824     3824.2  Left-Hippocampus                  66.7333     8.6661    26.0000   100.0000    74.0000
 13  18      2087     2087.1  Left-Amygdala                     63.9856     7.2932    37.0000    91.0000    54.0000
 14  24      2094     2094.0  CSF                               36.2929    14.6972    12.0000    90.0000    78.0000
 15  26       340      340.0  Left-Accumbens-area               69.8967     8.7139    37.0000    87.0000    50.0000
 16  28      2969     2969.5  Left-VentralDC                    94.9737    13.6527    44.0000   122.0000    78.0000
 17  30        76       75.9  Left-vessel                       58.3205    11.6736    27.0000    80.0000    53.0000
 18  31      1103     1102.6  Left-choroid-plexus               51.7182    16.3692    12.0000   100.0000    88.0000
 19  43     38108    38108.2  Right-Lateral-Ventricle           20.2269    10.2570     0.0000    92.0000    92.0000
 20  44      2165     2165.0  Right-Inf-Lat-Vent                30.2048    13.6808     0.0000    80.0000    80.0000
 21  46      9715     9715.4  Right-Cerebellum-White-Matter     86.9395     8.3909    25.0000   115.0000    90.0000
 22  47     41688    41688.2  Right-Cerebellum-Cortex           57.5291    10.3208     9.0000    91.0000    82.0000
 23  49      4769     4769.3  Right-Thalamus-Proper             82.0576    12.2446    18.0000   106.0000    88.0000
 24  50      4587     4587.1  Right-Caudate                     69.9613    12.7863    11.0000   103.0000    92.0000
 25  51      4694     4694.4  Right-Putamen                     69.9372     7.9141    48.0000    91.0000    43.0000
 26  52      1407     1406.8  Right-Pallidum                    88.0501     5.7841    57.0000   105.0000    48.0000
 27  53      3160     3159.6  Right-Hippocampus                 63.3511     8.9283    17.0000    95.0000    78.0000
 28  54      1877     1877.4  Right-Amygdala                    57.3686     8.5163    20.0000    83.0000    63.0000
 29  58       376      376.0  Right-Accumbens-area              70.4901     9.9104    41.0000    96.0000    55.0000
 30  60      2973     2972.7  Right-VentralDC                   89.6143    14.1755    29.0000   120.0000    91.0000
 31  62       105      105.1  Right-vessel                      50.1458    12.1126    21.0000    78.0000    57.0000
 32  63      2843     2842.7  Right-choroid-plexus              46.3759    13.8319     6.0000   115.0000   109.0000
 33  72        68       67.9  5th-Ventricle                     42.4444    11.2861    26.0000    83.0000    57.0000
 34  77     25325    25325.0  WM-hypointensities                71.8650    16.2379     5.0000   112.0000   107.0000
 35  78         0        0.0  Left-WM-hypointensities            0.0000     0.0000     0.0000     0.0000     0.0000
 36  79         0        0.0  Right-WM-hypointensities           0.0000     0.0000     0.0000     0.0000     0.0000
 37  80       153      153.1  non-WM-hypointensities            50.4551    16.1478    18.0000    88.0000    70.0000
 38  81         0        0.0  Left-non-WM-hypointensities        0.0000     0.0000     0.0000     0.0000     0.0000
 39  82         0        0.0  Right-non-WM-hypointensities       0.0000     0.0000     0.0000     0.0000     0.0000
 40  85       350      349.6  Optic-Chiasm                      66.0833    15.7641    24.0000   102.0000    78.0000
 41 251       806      805.6  CC_Posterior                     119.2646    18.1322    57.0000   150.0000    93.0000
 42 252       252      251.7  CC_Mid_Posterior                 109.1685    16.3862    51.0000   150.0000    99.0000
 43 253       295      295.4  CC_Central                       113.3418    16.2739    77.0000   140.0000    63.0000
 44 254       294      293.7  CC_Mid_Anterior                  115.1645    17.9396    72.0000   149.0000    77.0000
 45 255       657      657.4  CC_Anterior                      124.1047    22.5045    54.0000   166.0000   112.0000

1 Answer:

Answer 0: (score: 0)

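Have you tried Talend Open Studio / Data Integration? TOS can automate this kind of complex transformation. The final executable of a data transformation job is a jar file, which you can easily call from a shell script. It takes a while to get started with TOS, but it is very powerful. The product is licensed under GPL v2 and has a fairly active community.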

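Of course you could write some wild awk / sed / perl code and get a result, but in your case, with such a complex transformation, it would probably become very hard to read and unmaintainable.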

HTH, Michael
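
For comparison, here is a minimal Python sketch of the script-based route, flattening one stats file into a row of field names and a row of values. It is not taken from the thread. It assumes the layout seen in the sample file: `# Measure` lines of the form `name, short name, description, value, units`, a `# ColHeaders` line, and whitespace-separated table rows. The function names `parse_stats` and `to_csv`, and the `<StructName>_<ColHeader>` column naming, are hypothetical choices made for illustration.

```python
import csv
import io

# Table columns that identify a row rather than carry a measurement.
ID_COLUMNS = ("Index", "SegId", "StructName")


def parse_stats(text):
    """Flatten one stats report into parallel lists of field names and values."""
    names, values = [], []
    columns = None  # filled in once the "# ColHeaders" line is seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("Measure "):
                # Assumed layout: name, short name, description, value, units
                parts = [p.strip() for p in body[len("Measure "):].split(",")]
                if len(parts) >= 5:
                    names.append(parts[1])    # short name, e.g. BrainSegVol
                    values.append(parts[-2])  # the value precedes the units
            elif body.startswith("subjectname "):
                names.append("subjectname")
                values.append(body.split(None, 1)[1])
            elif body.startswith("ColHeaders "):
                columns = body.split()[1:]
            continue
        # Anything else is treated as a table row: one structure per line.
        fields = line.split()
        if columns is None or len(fields) != len(columns):
            continue  # skip rows that do not match the declared header
        row = dict(zip(columns, fields))
        for col in columns:
            if col not in ID_COLUMNS:
                names.append(row["StructName"] + "_" + col)
                values.append(row[col])
    return names, values


def to_csv(names, values):
    """Render the two lists as a header line plus one data line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    writer.writerow(values)
    return buf.getvalue()


# A shortened excerpt of the sample file from the question.
SAMPLE = """\
# Title Segmentation Statistics
# subjectname WCA_0162_T1_FS
# Measure BrainSeg, BrainSegVol, Brain Segmentation Volume, 1089921.000000, mm^3
# ColHeaders  Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange
  1   4     41962    41962.4  Left-Lateral-Ventricle            22.0753    10.2057     3.0000    94.0000    91.0000
  2   5      2150     2149.7  Left-Inf-Lat-Vent                 37.5636    16.3886     5.0000    89.0000    84.0000
"""

names, values = parse_stats(SAMPLE)
print(to_csv(names, values))
```

To combine several hundred cases, one option is to call `parse_stats` on the contents of each file, write the names once, and then write one values row per file. That only lines up if every file lists the same measures and structures in the same order. If that cannot be guaranteed, `csv.DictWriter` keyed on the field names is likely the safer choice.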