So I have a SQL file that contains the following:
createtab_stmt
CREATE EXTERNAL TABLE `table1`(
" `name_id` bigint, "
" `address_id` string, "
" `full_name` bigint, "
`insert_timestamp` timestamp)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
I'm trying to delete all the text after the line ending in "timestamp)". So the output should be everything before ROW FORMAT SERDE:
createtab_stmt
CREATE EXTERNAL TABLE `table1`(
" `name_id` bigint, "
" `address_id` string, "
" `full_name` bigint, "
`insert_timestamp` timestamp)
Here is my existing code:
import re
f = open("/home/dir2/ddl", 'rt', encoding='latin-1')
words=f.readlines()
with open("/home/dir1/sampl7.sql","w") as output:
    for i in words:
        output.write(i.replace('"', ''))
Any ideas or suggestions? I'm not sure whether a regular expression is the best option or whether there is a better approach. Thanks.
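For the requirement as stated (keep everything before ROW FORMAT SERDE), here is a hedged sketch comparing a plain string method with a regex. It assumes the DDL text is already in memory (the sample copies the question's input), and both function names are hypothetical. str.partition splits on the first occurrence of a literal substring, so a regex is only worth it if the marker's spacing can vary or if "ROW FORMAT" might also appear earlier, for example inside a column comment; the regex version below only matches it at the start of a line.

```python
import re

# Sample DDL copied from the question; the real file may contain more
# columns or clauses (assumption).
DDL = '''createtab_stmt
CREATE EXTERNAL TABLE `table1`(
" `name_id` bigint, "
" `address_id` string, "
" `full_name` bigint, "
`insert_timestamp` timestamp)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
'''


def before_row_format_plain(text: str) -> str:
    """Keep everything before the first literal 'ROW FORMAT' (no regex).

    If the marker is missing, partition returns the whole text as `head`.
    """
    head, _sep, _tail = text.partition("ROW FORMAT")
    return head.rstrip("\n")


def before_row_format_regex(text: str) -> str:
    """Same cut, but the marker must start a line and may use any spacing."""
    pattern = r"^[ \t]*ROW\s+FORMAT\b"
    head = re.split(pattern, text, maxsplit=1, flags=re.MULTILINE)[0]
    return head.rstrip("\n")


print(before_row_format_plain(DDL))
```

To reproduce the question's file handling, read the DDL with open(...).read(), pass it through either function, and chain .replace('"', '') if the double quotes should also be dropped, as the existing code does.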
Answer (score: 3)
My approach would be something like this:
with open("/home/dir2/ddl", 'rt', encoding='latin-1') as f:
    source = f.read()
with open("/home/dir1/sampl7.sql", "w") as output:
    output.write(source[:source.find(')') + 1].replace('"', ''))
.find() returns the index of the first ')' character, and we use it to slice the string from index 0 up to that index (the +1 includes the ')' itself).
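A hedged sketch of the same slicing wrapped in a function (the name cut_at_first_paren is hypothetical). It adds one guard the one-liner lacks: str.find returns -1 when there is no ')', and source[:-1 + 1] is source[:0], which would silently write an empty file. It also shows the approach's main assumption: it works on the question's input only because no column type contains parentheses, so a type such as decimal(10,2) ends the cut early.

```python
def cut_at_first_paren(source: str) -> str:
    """Return source up to and including the first ')', with double quotes removed."""
    end = source.find(')')  # index of the first ')', or -1 if there is none
    if end == -1:
        # Without this guard, source[:end + 1] would be source[:0] == ''.
        raise ValueError("no ')' found in the DDL text")
    return source[:end + 1].replace('"', '')


question_ddl = ('createtab_stmt\nCREATE EXTERNAL TABLE `table1`(\n'
                '" `name_id` bigint, "\n'
                '`insert_timestamp` timestamp)\nROW FORMAT SERDE\n')
print(cut_at_first_paren(question_ddl))

# A parenthesised column type stops the cut at the wrong place.
decimal_ddl = ('CREATE TABLE t(\n" `price` decimal(10,2), "\n'
               '`ts` timestamp)\nROW FORMAT SERDE\n')
print(cut_at_first_paren(decimal_ddl))
```

If the real tables can contain types like decimal(p,s), cutting at the ROW FORMAT line rather than at the first ')' avoids that problem.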