I have this data set:
LP3I22- M5
01174c-qbFD.raw
L2P2 + p LPI Full ms [150.00-1500.00]
Scan #: 1
RT: 6.11
m/z Intensity Relative Resolution Charge Baseline
150.0119 67.3 0.00 152545.44 0.00 26.27
150.0153 59.3 0.00 269991.72 0.00 26.28
150.0156 66.1 0.00 288504.16 0.00 26.28
150.0161 67.2 0.00 172425.14 0.00 26.28
150.0330 78.9 0.00 167957.34 0.00 26.32
150.0485 75.0 0.00 208783.14 0.00 26.35
150.0603 166.2 0.00 220081.53 0.00 26.37
150.0624 75.8 0.00 189976.39 0.00 26.38
150.0866 70.1 0.00 233127.77 0.00 26.42
150.0991 54.8 0.00 193755.25 0.00 26.45
150.1136 62.9 0.00 184047.91 0.00 26.48
150.1348 85.4 0.00 206299.06 0.00 26.52
150.1410 68.7 0.00 225439.47 0.00 26.53
150.1428 73.1 0.00 205324.42 0.00 26.54
150.1498 61.2 0.00 199792.59 0.00 26.55
150.1572 56.8 0.00 160342.95 0.00 26.57
150.1583 71.4 0.00 187849.53 0.00 26.57
150.1746 84.7 0.00 211934.81 0.00 26.60
150.1777 81.2 0.00 251123.45 0.00 26.61
150.2106 65.7 0.00 198830.13 0.00 26.67
150.2144 53.7 0.00 190111.53 0.00 26.68
150.2781 74.0 0.00 187803.52 0.00 26.81
150.2807 90.7 0.00 174743.38 0.00 26.82
How can I extract the data rows using a regular expression? I am not interested in the first 7 lines.
Answer 0 (score: 6)
Assuming the text is in a string named data:
number_re = /\s*(\d+\.\d+)\s*/
data.scan(/^#{number_re.source * 6}$/)
This produces the following array:
[["150.0119", "67.3", "0.00", "152545.44", "0.00", "26.27"],
["150.0153", "59.3", "0.00", "269991.72", "0.00", "26.28"],
["150.0156", "66.1", "0.00", "288504.16", "0.00", "26.28"],
["150.0161", "67.2", "0.00", "172425.14", "0.00", "26.28"],
["150.0330", "78.9", "0.00", "167957.34", "0.00", "26.32"],
["150.0485", "75.0", "0.00", "208783.14", "0.00", "26.35"],
["150.0603", "166.2", "0.00", "220081.53", "0.00", "26.37"],
["150.0624", "75.8", "0.00", "189976.39", "0.00", "26.38"],
["150.0866", "70.1", "0.00", "233127.77", "0.00", "26.42"],
["150.0991", "54.8", "0.00", "193755.25", "0.00", "26.45"],
["150.1136", "62.9", "0.00", "184047.91", "0.00", "26.48"],
["150.1348", "85.4", "0.00", "206299.06", "0.00", "26.52"],
["150.1410", "68.7", "0.00", "225439.47", "0.00", "26.53"],
["150.1428", "73.1", "0.00", "205324.42", "0.00", "26.54"],
["150.1498", "61.2", "0.00", "199792.59", "0.00", "26.55"],
["150.1572", "56.8", "0.00", "160342.95", "0.00", "26.57"],
["150.1583", "71.4", "0.00", "187849.53", "0.00", "26.57"],
["150.1746", "84.7", "0.00", "211934.81", "0.00", "26.60"],
["150.1777", "81.2", "0.00", "251123.45", "0.00", "26.61"],
["150.2106", "65.7", "0.00", "198830.13", "0.00", "26.67"],
["150.2144", "53.7", "0.00", "190111.53", "0.00", "26.68"],
["150.2781", "74.0", "0.00", "187803.52", "0.00", "26.81"],
["150.2807", "90.7", "0.00", "174743.38", "0.00", "26.82"]]
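This works because `number_re.source` returns the pattern text as a string, and `* 6` repeats it six times, so the interpolated regex only matches lines holding exactly six decimal numbers. A minimal sketch of the same idea that also converts the captures to floats, assuming a shortened sample string (`SAMPLE`, `NUMBER_RE` and `extract_rows` are names made up for illustration):

```ruby
# Hypothetical sample: two header lines and two data rows from the question.
SAMPLE = <<~TEXT
  Scan #: 1
  RT: 6.11
  150.0119 67.3 0.00 152545.44 0.00 26.27
  150.0153 59.3 0.00 269991.72 0.00 26.28
TEXT

NUMBER_RE = /\s*(\d+\.\d+)\s*/

# Build a line-anchored pattern with six capture groups, scan the text,
# and convert each captured string to a Float.
def extract_rows(text)
  row_re = /^#{NUMBER_RE.source * 6}$/
  text.scan(row_re).map { |row| row.map(&:to_f) }
end

p extract_rows(SAMPLE)
```

Header lines are skipped without counting them, since none of them consists of six decimal numbers.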
Answer 1 (score: 3)
lines = IO.readlines('inputfile.txt')
# Match each run of digits and dots; this also works when a line has no leading whitespace.
data = lines[7..-1].collect{|x| x.scan(/[\d.]+/)}
For a simpler solution that does not involve a regular expression, replace the last line with:
data = lines[7..-1].collect{|x| x.split}
All of this assumes that the data set matches the one you listed and contains no unexpected or malformed values.
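If the file might contain malformed rows, one option is to validate each field rather than trust the layout. A sketch, assuming that a valid row has exactly six numeric fields (`parse_row` and `SAMPLE_LINES` are hypothetical names, not part of the snippets above). It relies on `Float()` raising `ArgumentError` for non-numeric text, whereas `to_f` would silently return 0.0:

```ruby
# Returns an array of six Floats, or nil when the line is not a data row.
def parse_row(line)
  fields = line.split
  return nil unless fields.length == 6

  fields.map { |f| Float(f) } # raises ArgumentError on non-numeric text
rescue ArgumentError
  nil
end

# Hypothetical input: a header line with six words, then one data row.
SAMPLE_LINES = [
  "m/z Intensity Relative Resolution Charge Baseline",
  "150.0119 67.3 0.00 152545.44 0.00 26.27"
].freeze

# Keep only the lines that parsed cleanly.
p SAMPLE_LINES.map { |l| parse_row(l) }.compact
```

Note that the column-header line also has six fields, so checking the field count alone would not be enough to reject it.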
Answer 2 (score: 1)
Use the pattern:
^\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*$
in multiline mode.
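In Ruby specifically, `^` and `$` already match at the start and end of every line, so no extra flag is needed (Ruby's `/m` flag only makes `.` match newlines). A sketch of the pattern applied with `String#scan`, assuming a shortened sample string; `ROW_RE`, `TEXT_SAMPLE` and `MATCHES` are illustrative names:

```ruby
# The pattern from this answer, written as a Ruby regex literal.
ROW_RE = /^\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*$/

# Hypothetical sample: one header line followed by two data rows.
TEXT_SAMPLE = "RT: 6.11\n" \
              "150.0603 166.2 0.00 220081.53 0.00 26.37\n" \
              "150.0624 75.8 0.00 189976.39 0.00 26.38\n"

# scan returns one array of six captured strings per matching line.
MATCHES = TEXT_SAMPLE.scan(ROW_RE)
p MATCHES
```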
Answer 3 (score: 1)
7.times{DATA.readline} # discard first 7 lines
# squeeze(' ') collapses only runs of spaces; a bare squeeze would also collapse repeated digits (e.g. "150.0119" to "150.019")
res = DATA.map{ |line| line.lstrip.squeeze(' ').split(' ').map{|el| el.to_f } }
__END__
LP3I22- M5
01174c-qbFD.raw
L2P2 + p LPI Full ms [150.00-1500.00]
Scan #: 1
RT: 6.11
m/z Intensity Relative Resolution Charge Baseline
150.0119 67.3 0.00 152545.44 0.00 26.27
150.0153 59.3 0.00 269991.72 0.00 26.28
150.0156 66.1 0.00 288504.16 0.00 26.28
150.0161 67.2 0.00 172425.14 0.00 26.28
150.0330 78.9 0.00 167957.34 0.00 26.32
150.0485 75.0 0.00 208783.14 0.00 26.35
150.0603 166.2 0.00 220081.53 0.00 26.37
The values in res are now floats:
[[150.0119, 67.3, 0.0, 152545.44, 0.0, 26.27], [150.0153, 59.3, 0.0, 269991.72, 0.0, 26.28],
[150.0156, 66.1, 0.0, 288504.16, 0.0, 26.28], [150.0161, 67.2, 0.0, 172425.14, 0.0, 26.28],
[150.033, 78.9, 0.0, 167957.34, 0.0, 26.32], [150.0485, 75.0, 0.0, 208783.14, 0.0, 26.35],
[150.0603, 166.2, 0.0, 220081.53, 0.0, 26.37]]
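Once the rows are floats, it can be convenient to label each value with its column. A small sketch, assuming the column names from the header line of the data set; `COLUMNS` and `label_row` are illustrative names, not part of the answer:

```ruby
# Column names taken from the header line of the data set.
COLUMNS = %w[m/z Intensity Relative Resolution Charge Baseline].freeze

# Pair each column name with the value at the same position in the row.
def label_row(row)
  COLUMNS.zip(row).to_h
end

labeled = label_row([150.0603, 166.2, 0.0, 220081.53, 0.0, 26.37])
puts labeled["Intensity"]
```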