Open multiple text files and read until a specific character

Time: 2019-05-30 01:38:45

Tags: python pandas loops text

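I want to read several text files, but they are very long, so I would like to stop reading each one at the first '}'.

Edit: The code below runs, but it does not cut the text file off at the first '}'. I want to stop reading at the end of the second line of the example (the line that contains the first '}').

Edit 2: I added a readline call to the code.

Example: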

{"offset":"14758816658","bids":[["968899.79379","0.01000000","0.01000000","0","1093577338","29194","5","14758816598","1"],["968899.35295","0.02100000","0.02100000","0","1093577193","29194","5","14758816186","1"],
...["9999999.00000","0.01000000","0.01000000","0","568775590","75620","5","12301971393","1"]]}
{"offset":"14758825743","bids":[["968019.05000","0.09815250","0.09815250","0","1093580802","243454","5","14758825261","1"],["968019.00000","0.18740000","0.18740000","0","1093580826","221763","5","14758825331","1"],

Code:

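        with open(fileName, 'r') as fileName:
            line = fileName.readline()
            # This loop only scans the characters of the first line; breaking out
            # of it does not shorten what read_csv receives below.
            for x in line:
                if x == '}':
                    break
            data = pd.read_csv(fileName, lineterminator=']', low_memory=False, error_bad_lines=False, header=None)
            print(data)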

Edit: I ran the code below. The print output is correct, but pd.read_csv still loads the entire data set.

with open(fileName, 'r', encoding='utf-8') as fileName:
    print(re.findall(r'(\{[^\{\}]*})', fileName.readline())[0])
    data = pd.read_csv(fileName, lineterminator=']', low_memory=False, error_bad_lines=False, header=None)
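
A likely reason for this behaviour: the regular expression only inspects the string returned by readline(), while the file object handed to pd.read_csv is left as it was and keeps reading from its current position. The stdlib-only sketch below illustrates this with a made-up two-record sample (the data and the variable names are assumptions, not taken from the real files):

```python
import io
import re

# Hypothetical two-record sample standing in for the real file.
sample = (
    '{"offset":"1","bids":[["1.0","2.0"]]}\n'
    '{"offset":"2","bids":[["3.0","4.0"]]}\n'
)
handle = io.StringIO(sample)

first_line = handle.readline()
first_block = re.findall(r'(\{[^{}]*\})', first_line)[0]

# first_block is just a string. The handle is not truncated by the regex
# and still yields everything that follows the first line.
remainder = handle.read()

print(first_block)
print(remainder)
```

Passing the matched string (wrapped in io.StringIO) to the parser, instead of the file handle, avoids picking up the remainder.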

Edit 2: Solution

    import io
    import re
    import pandas as pd

    with open(fileName, 'r', encoding='utf-8') as fileName:
        # Wrap only the first {...} block in a file-like object for read_csv.
        d = io.StringIO(re.findall(r'(\{[^\{\}]*})', fileName.readline())[0])
        data = pd.read_csv(d, lineterminator=']', low_memory=False, error_bad_lines=False, header=None)
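
If the first record is not guaranteed to sit on a single line, or if many long files have to be processed, reading in small chunks until the first '}' is one possible alternative. The following is a minimal sketch, not the original poster's method: read_until, first_blocks and chunk_size are hypothetical names, and the stop marker is assumed to be a single character.

```python
import io


def read_until(handle, stop='}', chunk_size=4096):
    """Return text from the current position up to and including the first
    occurrence of `stop` (assumed to be a single character)."""
    parts = []
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:  # end of file reached without finding `stop`
            break
        idx = chunk.find(stop)
        if idx != -1:
            parts.append(chunk[:idx + 1])
            break
        parts.append(chunk)
    return ''.join(parts)


def first_blocks(paths):
    """Map each path to the text before and including its first '}'."""
    result = {}
    for path in paths:
        with open(path, 'r', encoding='utf-8') as fh:
            result[path] = read_until(fh)
    return result


# Small in-memory demonstration with a deliberately tiny chunk size.
demo = read_until(io.StringIO('{"a":[1,2]}{"b":[3]}'), chunk_size=4)
print(demo)
```

Each returned string can then be wrapped in io.StringIO and passed to pd.read_csv in the same way as in the solution above.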

1 Answer:

Answer 0: (score: 0)

Use a regular expression to match and extract the value. The snippet below returns the first {.*} block in the file.

import re
with open('string.txt') as s:
    print(re.findall(r'(\{[^\{\}]*})', s.read())[0])
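
The sample in the question looks like one JSON object per record. If that assumption holds, the standard library's json.JSONDecoder.raw_decode may be another option: it parses the first object in a string and reports the index where it ended, so the rest of the text is ignored. A sketch with made-up data (the text variable is hypothetical):

```python
import json

# Hypothetical sample: two JSON objects back to back, as in the question.
text = '{"offset":"1","bids":[["1.0","2.0"]]}\n{"offset":"2","bids":[]}'

# raw_decode parses one JSON value starting at index 0 and returns
# the parsed object together with the index just past its end.
record, end = json.JSONDecoder().raw_decode(text)
bids = record["bids"]

print(text[:end])
print(bids)
```

Unlike the regular expression, this would also cope with nested braces inside a record, but it fails with a json.JSONDecodeError if the text is not valid JSON.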