Question

I have 100 .txt / .sed files, each containing many lines.

Sample input file:

Time: 10:34:51.49,15:21:39.24
Box Temperature (K): 32.82,8.88,-10.07
Silicon Temperature (K): 10.90,9.88
Voltage: 7.52,7.41
Dark Mode: AUTO,AUTO
Radiometric Calibration: RADIANCE
Units: W/m^2/sr/nm
GPS Time: n/a
Satellites: n/a
Channels: 1024

Expected output:

Time             15:21:39.24
Box Temp         32.82
                  8.88
                -10.07
Si Temp          10.90
                  9.88

I tried to write code that identifies the strings, builds lists of the values, arranges them into a DataFrame, and then writes them to a .csv file. Sample code:

testtxt = 'Temperature (K): 32.82,8.88,-10.07,32.66,8.94,-10.07'
exp = r'^Temperature (K):(\s*) ([0-9.]+)([0-9.]+), ([0-9.-]+) , (-[0-9-.]+),([0-9-.]+) , ([0-9-.]+),(-[0-9-.]+)'
regexp = re.compile(exp)
my_temp = regexp.search(txt)
print(my_temp.group(0))

Error:

AttributeError: 'NoneType' object has no attribute 'group'

Basically, it can't find a match!

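Clarification: I want an efficient way to extract only the time and temperature values and nothing else. It would be great to stop scanning a file once they have been found, because each file has more than 500 lines and I have many files.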

Answer 1

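My suggestion is to use string.startswith() to determine whether the line starts with "Box Temperature (K)", or whatever else you are looking for. Once you find it, take the rest of the string, parse it as CSV, and then validate each component. Trying to do everything with a single regular expression is more trouble than it's worth.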
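As a rough illustration of that idea, here is a minimal Python sketch for a single line. The helper name `parse_values` is hypothetical, and it assumes every value after the prefix should be a plain number (so it would not suit the "Time:" line, whose values are clock strings).

```python
import csv

def parse_values(line, prefix):
    """Return the numbers that follow `prefix` on `line`, or None if the line
    does not start with `prefix`. (Hypothetical helper, for illustration.)"""
    if not line.startswith(prefix):
        return None
    rest = line[len(prefix):].strip()
    # Parse the remainder as one CSV row; an empty remainder gives an empty list.
    fields = next(csv.reader([rest]))
    values = []
    for field in fields:
        try:
            values.append(float(field))
        except ValueError:
            # Validation step: reject anything that is not a number.
            raise ValueError(f"unexpected value {field!r} after {prefix!r}")
    return values

print(parse_values("Box Temperature (K): 32.82,8.88,-10.07", "Box Temperature (K):"))
# prints [32.82, 8.88, -10.07]
```

Because the prefix is compared as a literal string, the parentheses in "(K)" need no escaping, which is one of the things that is easy to get wrong in a regular expression.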

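If you want the code to stop once everything has been found, just keep a flag for each thing you are looking for, and exit as soon as all the flags are set. Something like: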

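found_time = found_box_temp = found_si_temp = False
with open("example.sed") as f:  # hypothetical file name
    for line in f:
        if line.startswith("Box Temperature (K):"):
            found_box_temp = True  # parse and output
        elif line.startswith("Time:"):
            found_time = True  # parse and output
        elif line.startswith("Silicon Temperature (K):"):
            found_si_temp = True  # parse and output
        if found_time and found_box_temp and found_si_temp:
            break  # everything found, stop reading this file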