INPUT1:
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create external table db.emp1(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
INPUT2:
create table db.emp(id int,name string)
location 'hadfs:.../';;
create table db.emp1(id int,name string)
location 'hadfs:.../';
Required output:
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp(id int,name string)
location 'hadfs:.../';
These two statements are stored in file1.hql.
create external table db.emp1(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp1(id int,name string)
location 'hadfs:.../';
These two statements are stored in file2.hql, and so on.
filenames = ['in1.txt', 'in2.txt']
with open('result.txt', 'w') as outfile:
for fname in filenames:
with open(fname) as infile:
content = infile.read().replace('\n', '')
outfile.write(content)
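But I am not getting the correct output; with this code the output comes out in the wrong manner. Please give me a hint on how to achieve this.
I also tried this code: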
import re
import sys
f = open ('text.txt','r')
fout = open ('hql.txt','w')
text = f.read()
fout = "hql.txt"
fout = open('hql.txt','w')
for item in re.findall(r'CREATE[^;]*;',text):
    print >>fout, re.search(r'(?<=\.)\w+',item).group()+'.hql'
    print >>fout,(item)
f.close()
fout.close()
The output of this code is:
emp.hql
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
emp1.hql
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
It produces this kind of output for the input1 and input2 files. Now I need to concatenate
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp(id int,name string)
location 'hadfs:.../';
and store the result in a file named emp.hql, and likewise for the other tables.
Answer 0 (score: 2)
With this indentation everything works:
filenames = ['in1.txt', 'in2.txt']
with open('result.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            content = infile.read().replace('\n', '')
            outfile.write(content)
Also pay attention to the order in which the files are written. You can use a tuple, filenames = ('in1.txt', 'in2.txt'): in1.txt is written first, then in2.txt.
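Note that .replace('\n', '') removes every line break, so all statements end up on a single line of result.txt. If the line breaks should be kept, a minimal sketch could look like this. The helper name concatenate_files is hypothetical, and the file names are the ones from the question.

```python
def concatenate_files(filenames, out_path):
    """Write the contents of each file to out_path, in the order given."""
    with open(out_path, 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                content = infile.read()
            # Keep the original line breaks, and make sure each file ends
            # with one so the next file starts on a fresh line.
            if content and not content.endswith('\n'):
                content += '\n'
            outfile.write(content)


# Example call (assumes in1.txt and in2.txt exist in the working directory):
# concatenate_files(['in1.txt', 'in2.txt'], 'result.txt')
```

This only joins the files one after the other. It does not interleave the statements per table, which is what the question ultimately asks for.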
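Answer 1 (score: 0)
The aim here is to read 4 lines from input1 and 2 lines from input2, remove any empty lines, and write those lines in order to an output file.
For this, you can use the grouper() function, which reads each file in suitably sized chunks. zip() is then used to bring the two sets of chunks together at the same time.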
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    # Python 2 name; on Python 3 this is itertools.zip_longest
    return itertools.izip_longest(fillvalue=fillvalue, *args)

with open('in1.txt') as f_1, open('in2.txt') as f_2:
    for file_number, rows in enumerate(zip(grouper(f_1, 4, ''), grouper(f_2, 2, '')), start=1):
        # Remove carriage returns and empty lines from all rows
        rows = [[row.strip() for row in r if len(row.strip())] for r in rows]
        with open('file{}.hql'.format(file_number), 'w') as f_output:
            f_output.write('{}\n{}\n'.format('\n'.join(rows[0]), '\n'.join(rows[1])))
This gives you file1.hql, which looks as follows:
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp(id int,name string)
location 'hadfs:.../';;
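The code above uses itertools.izip_longest, which exists only on Python 2. On Python 3 the same function is called itertools.zip_longest. As a small sketch of how the chunking and the clean-up step behave, here is grouper() applied to an in-memory list instead of a file. The sample lines are made up for illustration.

```python
from itertools import zip_longest  # Python 3 name of izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


# Made-up sample: two "statements" separated by a blank line
lines = ['a1\n', 'a2\n', '\n', 'b1\n', 'b2\n']
chunks = list(grouper(lines, 3, ''))

# Drop blank lines and padding entries, as the list comprehension above does
cleaned = [[row.strip() for row in chunk if row.strip()] for chunk in chunks]
print(cleaned)  # [['a1', 'a2'], ['b1', 'b2']]
```

Because every chunk has a fixed size, this approach only works when each statement occupies the same number of lines in the input file.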
I suggest you add print statements to get a better understanding of what each row contains.
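If the statements do not always span a fixed number of lines, a possible alternative is to split on the statements themselves and group them by table name, reusing the two regular expressions from the question. The following is only a sketch for Python 3 under stated assumptions: every statement starts with "create" and ends at the first ';', and every table is written as database.table. The names group_by_table and write_hql_files are hypothetical.

```python
import re
from collections import OrderedDict

# Assumption: a statement starts with "create" and ends at the first ';'
STATEMENT = re.compile(r'create[^;]*;', re.IGNORECASE)
# Assumption: the table is written as <database>.<table>, e.g. db.emp
TABLE_NAME = re.compile(r'(?<=\.)\w+')


def group_by_table(texts):
    """Map each table name to its statements, in the order they are found."""
    grouped = OrderedDict()
    for text in texts:
        for statement in STATEMENT.findall(text):
            match = TABLE_NAME.search(statement)
            if match is None:
                continue  # no database.table name found, skip it
            grouped.setdefault(match.group(), []).append(statement.strip())
    return grouped


def write_hql_files(grouped):
    """Write one <table>.hql file per table into the working directory."""
    for table, statements in grouped.items():
        with open(table + '.hql', 'w') as out:
            out.write('\n'.join(statements) + '\n')


# Example call (assumes in1.txt and in2.txt exist in the working directory):
# texts = [open(name).read() for name in ('in1.txt', 'in2.txt')]
# write_hql_files(group_by_table(texts))
```

With the inputs from the question this would produce emp.hql and emp1.hql, each holding the external-table statement followed by the plain one. A stray extra ';' between statements is skipped, because the pattern only starts matching at "create".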