Extract a block of text from a document and write it to a new text file

Date: 2019-05-15 14:04:39

Tags: python regex

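I have a large text file. I want to read several lines from it and write those lines to another text file as a single line. For example, I want to start reading line by line at a given start word and stop at a lone parenthesis. So if my start word is "CAR", I want to read from there until I reach a closing parenthesis followed by a newline. The start and end tokens should be kept as well.

What is the best way to achieve this? I have tried pattern matching while avoiding regular expressions, but I don't think that is possible.

Code: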

array = []
f = open('text.txt','r') as infile
w = open(r'temp2.txt', 'w') as outfile
for line in f:
    data = f.read()
    x = re.findall(r'CAR(.*?)\)(?:\\n|$)',data,re.DOTALL)
    array.append(x)
    outfile.write(x)
return array

What the text might look like:

( CAR: *random info*
    *random info* - could be many lines of this
)

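2 answers:

Answer 0 (score: 1)

Regular expressions are perfectly suited to this kind of problem. They stop being usable only when the pattern involves recursion, for example extracting content from nested parentheses: ((text1)(text2)).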
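To illustrate the nested case mentioned above: Python's standard re module has no recursive patterns, so balanced parentheses are usually handled by counting depth instead. The following is only a minimal sketch; the function name top_level_groups is hypothetical and not part of the original answer.

```python
def top_level_groups(text):
    """Return the contents of each top-level parenthesised group in text."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # content begins right after the outermost '('
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                # the outermost group just closed; keep everything inside it
                groups.append(text[start:i])
    return groups


print(top_level_groups("((text1)(text2)) (x)"))
# → ['(text1)(text2)', 'x']
```

The question's input has only one level of parentheses per block, which is why a plain regular expression is enough there.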

You can use the following regular expression:

See explanation...

Here you can visualize your regular expression...

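Answer 1 (score: 1)

We can use a regular expression (pattern (CAR.*)\) with flags gms) to match the text you are interested in.

Then we just need to remove the newlines from the resulting matches and write them to a file.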

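import re

# Read the whole file at once so the pattern can span several lines.
with open("text.txt", 'r') as f:
    # re.DOTALL lets '.' match newlines. Note that '.*' is greedy, so the
    # match runs from the first "CAR" to the last ')' in the file.
    matches = re.findall(r"(CAR.*)\)", f.read(), re.DOTALL)

with open("output.txt", 'w') as f:
    for match in matches:
        # Join the lines of each match with spaces, one match per output line.
        f.write(" ".join(match.split('\n')))
        f.write('\n')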

The output file looks like this:

CAR: *random info* *random info* - could be many lines of this

Edit: updated the code to put a newline between matches in the output file.
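If the file contains several CAR blocks, or the closing parenthesis should be kept as the question asks, a non-greedy variant anchored to a ')' that sits alone on its line may be a better fit. This is a sketch under those assumptions, not the answerer's code; extract_blocks and the sample text are hypothetical.

```python
import re


def extract_blocks(text, start_word="CAR"):
    """Return each block from start_word through a ')' alone on its line."""
    pattern = re.compile(
        # '.*?' is non-greedy, so each block ends at the first lone ')'.
        rf"{re.escape(start_word)}.*?^[ \t]*\)[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    # Collapse every run of whitespace, newlines included, to one space.
    return [" ".join(m.group(0).split()) for m in pattern.finditer(text)]


sample = """( CAR: first
    more info (with parens)
)
noise
( CAR: second
)
"""

for block in extract_blocks(sample):
    print(block)
# → CAR: first more info (with parens) )
# → CAR: second )
```

Because the end of a block is tied to a parenthesis on its own line (re.MULTILINE makes ^ and $ match at line boundaries), parentheses inside the random info do not cut a block short.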