Question

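I want to read a specific txt file, extract data from it, and store that data in a tuple. The problem is that I don't need all of the data in the file, only certain parts. The text file looks like this: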

HHSDMSDN1-pool                           1.02T   141G     39     22  2.62M   940K
  **c5t600507680C800000001CBd0   834G   118G**     32     16  2.19M   734K
  **c5t600507680C00352d0   216G  22.3G**      7      5   434K   206K


HHSDMSDN2-pool                           1.09T   308G     12      6   744K  83.8K
  **c5t600507680C800001CDd0   790G   162G**     10      1   617K  12.5K
  **c5t600507680C8000000037Dd0   203G  34.8G**      1      0   123K  10.2K
  **c5t600507680C800000387d0   126G   112G**      0      5  5.36K  80.5K

HHSDMSDN3-pool                           1.13T  33.4G     24     19  1.39M   623K
  **c5t600507680C80002E6000001CFd0   921G  30.8G**     18     11  1.10M   465K
  **c5t600507680C80002E600000203d0   235G  2.63G**      5      8   293K   158K

The bold text needs to go into the tuple. Ideally the first value would be a string, followed by two doubles/floats.

So the output would be:

((c5t600507680C800000001CBd0, 834, 118), (c5t600507680C00352d0, 216, 22.3), .....))

Any ideas?

Answer 1

You just need to iterate over the file line by line and keep track of what you have already seen.

New solution, modified as requested:

import pprint

data = """HHSDMSDN1-pool                           1.02T   141G     39     22  2.62M   940K
  c5t600507680C800000001CBd0   834G   118G     32     16  2.19M   734K
  c5t600507680C00352d0   216G  22.3G      7      5   434K   206K


HHSDMSDN2-pool                           1.09T   308G     12      6   744K  83.8K
  c5t600507680C800001CDd0   790G   162G     10      1   617K  12.5K
  c5t600507680C8000000037Dd0   203G  34.8G      1      0   123K  10.2K
  c5t600507680C800000387d0   126G   112G      0      5  5.36K  80.5K

HHSDMSDN3-pool                           1.13T  33.4G     24     19  1.39M   623K
  c5t600507680C80002E6000001CFd0   921G  30.8G     18     11  1.10M   465K
  c5t600507680C80002E600000203d0   235G  2.63G      5      8   293K   158K"""

# collect all records by key
d = {}

# current key "HHSDM..."
k = None

# current records
r = []

for line in data.splitlines():
    if line.startswith("  c"):
        # this is a record, append it to the current collection of records
        fields = line.split()
        r.append((fields[0], fields[1], fields[2]))
    elif line.startswith("H"):
        # this is a key, remember it, we will need it later
        k = line.split("-")[0]
    elif k:
        # this is an empty line and we have a key, store the records
        # and reset current records and current key
        d[k] = r
        r = []
        k = None

# store the last group of records; skip this if the input ended with
# a blank line, in which case the key has already been reset to None
if k:
    d[k] = r

pprint.pprint(d)

Output:

{'HHSDMSDN1': [('c5t600507680C800000001CBd0', '834G', '118G'),
               ('c5t600507680C00352d0', '216G', '22.3G')],
 'HHSDMSDN2': [('c5t600507680C800001CDd0', '790G', '162G'),
               ('c5t600507680C8000000037Dd0', '203G', '34.8G'),
               ('c5t600507680C800000387d0', '126G', '112G')],
 'HHSDMSDN3': [('c5t600507680C80002E6000001CFd0', '921G', '30.8G'),
               ('c5t600507680C80002E600000203d0', '235G', '2.63G')]}
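
The dictionary above keeps the sizes as strings and groups them by pool, whereas the question asks for one flat tuple of (string, float, float). A minimal sketch of that variant follows. It assumes that device lines are the indented ones and that the trailing unit letter can simply be dropped, as in the question's desired output. The function name `parse_devices` and the short `sample` list are made up for illustration, and values in different units (for example T and G) are not normalized to a common unit.

```python
def parse_devices(lines):
    """Return a tuple of (device_name, float, float) for each indented device line."""
    records = []
    for line in lines:
        # Pool headers and blank lines are not indented, so skip them.
        if not line.startswith(" "):
            continue
        fields = line.split()
        if len(fields) < 3:
            # Whitespace-only or truncated line: nothing to extract.
            continue
        name, first, second = fields[:3]
        # Assumption: strip the unit suffix (e.g. "834G" -> 834.0) without converting units.
        records.append((name, float(first.rstrip("KMGTP")), float(second.rstrip("KMGTP"))))
    return tuple(records)


# Hypothetical sample: the first pool from the question.
sample = [
    "HHSDMSDN1-pool                           1.02T   141G     39     22  2.62M   940K",
    "  c5t600507680C800000001CBd0   834G   118G     32     16  2.19M   734K",
    "  c5t600507680C00352d0   216G  22.3G      7      5   434K   206K",
    "",
]
result = parse_devices(sample)
print(result)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list, for example `with open(path) as f: result = parse_devices(f)`, where `path` stands for wherever the text file lives.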
