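Extracting specific data (inside parentheses) from a column of a .txt file?

Time: 2019-05-09 22:53:42

Tags: python

I have the following content. I prefixed each line with its line number; the number is not part of the text, so it should be ignored.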

(line1)The following table of hex bolt head dimensions was adapted from ASME B18.2.1, Table 2, "Dimensions of Hex Bolts."

(line2)
(line3)Size Nominal (Major)
(line4)Diameter [in]            Width Across Flats          Head Height
(line5)        Nominal [in] Minimum [in]    Nominal [in]        Minimum [in]
(line6)1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150

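I am trying to extract data from certain columns, but I have trouble with column 2, which contains a float inside parentheses.

The txt file holds data arranged in columns and rows, and I am trying to organize it into lists. One of the columns has a float inside parentheses, like 7/16"(0.438). It is in column 2, and I need to store 0.438 in the list.

I also want to skip the first 5 lines, because they are strings; I only want to start reading from line 6.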

def Main():

    filename = 'BoltSizes.txt' #file name
    f1 = open(filename, 'r')  # open the file for reading
    data = f1.readlines()  # read the entire file as a list of strings
    f1.close()  # close    the file  ... very important

    #creating empty arrays
    Diameter = []
    Width_Max = []
    Width_Min = []
    Head_Height = []

    for line in data: #loop over all the lines
        cells = line.strip().split(",") #creates a list of words

        val = float(cells[1])
        Diameter.append(val)

        #Here I need to get only the float from the brackets 7/16"(0.438)
        val = float(cells[2])
        Width_Max.append(val)

        val = float(cells[3])
        Width_Min.append(val)

        val = float(cells[5])
        Head_Height.append(val)

Main()

I get this error:

line 16, in Main
    val = float(cells[1])
ValueError: could not convert string to float: ' Table 2'

1 Answer:

Answer 0: (score: 0)

Since data is a plain Python list, you can use list slicing to select the range of lines to parse. So, to skip the first 5 rows, pass data[5:] to the for loop.
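As a small illustration of the slicing step, here is a sketch that uses a made-up in-memory list in place of what readlines() returns. The header strings are placeholders (an assumption), and only the last entry mirrors the data row from the question.

```python
# Hypothetical stand-in for the list returned by f1.readlines():
# five header lines followed by one data row.
data = [
    "header line 1\n",
    "\n",
    "header line 3\n",
    "header line 4\n",
    "header line 5\n",
    '1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150\n',
]

# data[5:] is a new list that starts at index 5, i.e. at the sixth line.
rows = data[5:]

for line in rows:
    # split() with no argument splits on any run of whitespace
    # and ignores the trailing newline.
    cells = line.split()
    print(cells)
```

Slicing copies the selected part of the list, which is fine for a small file like this one.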

Fixing the second column is more involved. A straightforward way to extract the data from column 2 is to use re.search().
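To show the regular-expression step in isolation, here is a minimal sketch. The helper name extract_paren_float is hypothetical, and the pattern assumes the value in parentheses is a plain decimal number such as 0.438.

```python
import re

# A literal "(", then a captured run of digits with an optional
# decimal part, then a literal ")".
PAREN_FLOAT = re.compile(r'\((\d+(?:\.\d+)?)\)')


def extract_paren_float(cell):
    """Return the float inside parentheses in `cell`, or None if absent."""
    match = PAREN_FLOAT.search(cell)
    if match is None:
        return None
    # group(1) is the text between the parentheses; group(0) includes them.
    return float(match.group(1))


print(extract_paren_float('7/16"(0.438)'))
print(extract_paren_float('11/64"'))
```

Returning None for cells without parentheses lets the caller decide whether to skip the row or report it.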

So you can change your code to something like the following:

# we'll use a regular expression to extract the value for column 2
import re
# skip the first five rows
for line in data[5:]:
   # split on runs of whitespace; split() with no argument
   # also drops the trailing newline
   cells = line.split() # creates a list of words

   # parsing column 0, 1..
   ...
   # column 2 is critical
   tmp = re.search(r'\((.*?)\)', cells[2])
   # we have to check if re.search() returned something
   if tmp:
      # we're taking group 1; group 0 includes the parentheses.
      val = tmp.group(1)
      # one more check: val must parse as a float
      # (str.isnumeric() is False for "0.438" because of the decimal point)
      try:
         Width_Max.append(float(val))
      except ValueError:
         pass

   # continue your parsing
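The problem with this code is that it may break as soon as the format of the data changes, but since you posted only one data row, I cannot give more detailed help.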
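One way to make the parsing a little more tolerant is to skip any row that does not fit the expected layout instead of raising. The sketch below is only an illustration under assumptions: the function name parse_bolt_rows and its header_rows parameter are hypothetical, and the six-column layout is inferred from the single data row in the question.

```python
import re

# Captures whatever sits between the first pair of parentheses.
PAREN_TEXT = re.compile(r'\(([^)]*)\)')


def parse_bolt_rows(lines, header_rows=5):
    """Parse whitespace-separated rows into four lists of floats.

    Rows that do not match the assumed six-column layout are skipped.
    """
    diameter, width_max, width_min, head_height = [], [], [], []
    for line in lines[header_rows:]:
        cells = line.split()
        if len(cells) < 6:
            continue  # blank or short row
        match = PAREN_TEXT.search(cells[2])
        if match is None:
            continue  # column 2 has no parenthesized value
        try:
            values = (float(cells[1]), float(match.group(1)),
                      float(cells[3]), float(cells[5]))
        except ValueError:
            continue  # some field was not numeric
        diameter.append(values[0])
        width_max.append(values[1])
        width_min.append(values[2])
        head_height.append(values[3])
    return diameter, width_max, width_min, head_height


# Made-up sample: five placeholder header lines, the data row from the
# question, a blank line, and a malformed row.
sample = [
    "header 1\n", "\n", "header 3\n", "header 4\n", "header 5\n",
    '1/4" 0.2500          7/16"(0.438)    0.425       11/64"  0.150\n',
    "\n",
    "not a data row\n",
]
print(parse_bolt_rows(sample))
```

With the real file, the list passed in would be the result of readlines() on BoltSizes.txt. Silently skipping rows is a design choice that suits exploratory work; logging the skipped rows may be preferable if the data matters.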