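The data in the second row should be moved next to the data in the first row, and this should be repeated for every 10 rows. In other words, if the dataset were a 20x10 matrix, it should become 2x100.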
Input:
1-A B C D E F G
2-H I J K L M N
.
.
.
10-O P Q R S T U
Output:
1-A B C D E F G H I J K L M N . . . . . . . . O P Q R S T U
Answer 0 (score: 0)
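I realize you tagged your question with Python, but here is a way to do it from the command line:
xargs -n10 -d'\n' < yourlistfile.txt
yourlistfile.txt is the name of the file to parse.
The command as written prints to the screen. You can redirect that output to a new file by appending > your_results.txt to the end of the command, for example:
xargs -n10 -d'\n' < yourlistfile.txt > reorganizedlistfile.txt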
See this post for some other ideas: How to merge every two lines into one from the command line?
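Note: apparently some versions of xargs don't like the -d option; on macOS, for example, I got an error saying the option is not supported. For a simple case where each line holds a single token, the delimiter argument isn't needed anyway. If the lines contain spaces, as in the question, keep in mind that without -d xargs splits on any whitespace, so it would group by token rather than by line.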
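If your xargs lacks -d, the same line grouping can be done in Python. This is only a minimal sketch: the function name join_every_n_lines and the sample data are made up for illustration, and it joins whole lines with single spaces, which is roughly what the xargs command above prints.

```python
import io
from itertools import islice

def join_every_n_lines(lines, n=10):
    """Yield one string per group of n input lines, joined by single spaces."""
    # Strip only the trailing newline so the tokens inside each line stay untouched.
    it = (line.rstrip("\n") for line in lines)
    while True:
        chunk = list(islice(it, n))  # take up to n lines from the stream
        if not chunk:
            return  # input exhausted
        yield " ".join(chunk)

# Demo on in-memory data; for a real file, pass open("yourlistfile.txt") instead.
sample = io.StringIO("A B\nC D\nE F\nG H\nI J\n")
merged = list(join_every_n_lines(sample, n=2))
for row in merged:
    print(row)
```

As with xargs, a leftover group shorter than n is still emitted as its own line.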
Answer 1 (score: 0)
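Assuming you want to read from a file, and that you may be fairly new to Python, here is some sample code for you to look at. I've tried to add enough comments and safety checks to give you an idea of how it works and how to extend it.
Note that with this Python version you still need to do something with the results; I would strongly consider @Marcs' concise answer if you can use it.
There are a lot of assumptions to consider here. How sure are you that every line has the same number of things? I added some logic to check this; I find that moderate paranoia helps.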
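If you want to inspect the data before merging anything, one option is to tally how many things each line holds. The following is just a sketch of that idea, not part of the program in this answer; row_width_counts is a hypothetical helper name and the sample data is invented.

```python
import io
from collections import Counter

def row_width_counts(lines):
    """Return a Counter mapping tokens-per-line to the number of lines with that width."""
    return Counter(len(line.split()) for line in lines)

# Demo on in-memory data; the last line is deliberately one token short.
sample = io.StringIO("a b c\nd e f\ng h\n")
widths = row_width_counts(sample)
if len(widths) > 1:
    print("inconsistent row widths:", dict(widths))
else:
    print("all rows have the same width")
```

Note that a blank line counts as a row of width 0, so stray empty lines show up in the tally as well.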
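Assuming you want to read from a file, here is a sample program for you to consider. Its output on the built-in test data (with nrows=3) comes first, followed by the code: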
line_cnt=1 #things=3, line="a b c"
line_cnt=2 #things=3, line="d e f"
line_cnt=3 #things=3, line="g h i"
gathered 3 into=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
line_cnt=4 #things=3, line="j k l"
line_cnt=5 #things=3, line="m n o"
line_cnt=6 #things=3, line="p q r"
gathered 3 into=['j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r']
line_cnt=7 #things=3, line="s t v"
line_cnt=8 #things=3, line="u w x"
line_cnt=9 #things=3, line="y z 0"
gathered 3 into=['s', 't', 'v', 'u', 'w', 'x', 'y', 'z', '0']
line_cnt=10 #things=3, line="1 2 3"
line_cnt=11 #things=3, line="4 5 6"
line_cnt=12 #things=3, line="7 8 9"
gathered 3 into=['1', '2', '3', '4', '5', '6', '7', '8', '9']
now have 4 rows
rows=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
['j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r']
['s', 't', 'v', 'u', 'w', 'x', 'y', 'z', '0']
['1', '2', '3', '4', '5', '6', '7', '8', '9']
Process finished with exit code 0
import io

def join_dataset(f, nrows=10):
    temp_row = []
    consolidated_rows = []
    expected_row_size = None
    line_cnt = 0
    for line in f:
        line_cnt += 1  # want one-based line numbering so not using enumerate
        line = line.strip()  # remove surrounding whitespace, including the trailing newline
        things = line.split()  # create list based on whitespace
        row_size = len(things)  # check how long this row's list is
        if expected_row_size is None:
            expected_row_size = row_size  # assume all same size as 1st row
        elif row_size != expected_row_size:
            raise ValueError('Expected {} things but found {} on line# {}'.format(expected_row_size, row_size, line_cnt))
        print('line_cnt={} #things={}, line="{}"'.format(line_cnt, len(things), line))
        # read about append vs extend here https://stackoverflow.com/q/252703/5590742
        temp_row.extend(things)
        # check with %, the mod operator, 1%3 = 1, 2%3 = 2, 3%3 = 0 (even division), 4%3 = 1, 5%3 = 2, etc.
        # We're counting lines from 1, so if we get zero we have that many lines
        if 0 == (line_cnt % nrows):
            print('gathered {} into={}'.format(nrows, temp_row))
            # or write gathered to another file
            consolidated_rows.append(temp_row)
            temp_row = []  # start a new list
    if temp_row:
        # at end of file, but make sure we include partial results
        # (if you expect perfect alignment this would
        # be another good place for an error check.)
        consolidated_rows.append(temp_row)
    return consolidated_rows

test_lines="""a b c
d e f
g h i
j k l
m n o
p q r
s t v
u w x
y z 0
1 2 3
4 5 6
7 8 9"""

# if your data is in a file use:
# with open('myfile.txt', 'r') as f:
with io.StringIO(test_lines) as f:
    rows = join_dataset(f, nrows=3)
    # rows = join_dataset(f)  # this will default to nrows=10
print('now have {} rows'.format(len(rows)))
print('rows={}'.format('\n'.join([str(row) for row in rows])))
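As mentioned above, you still need to do something with the results. One possibility, sketched here under the assumption that you want one space-separated line per consolidated row, is to write them to a text stream. The helper write_rows, the sample rows, and the file name in the comment are hypothetical.

```python
import io

def write_rows(rows, out):
    """Write each consolidated row as one space-separated line to a text stream."""
    for row in rows:
        out.write(" ".join(row) + "\n")

# Hypothetical data shaped like the list of lists returned by join_dataset.
example_rows = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]

# For a real file use: with open("your_results.txt", "w") as out:
with io.StringIO() as out:
    write_rows(example_rows, out)
    written = out.getvalue()
print(written, end="")
```

Passing the stream in as an argument keeps the helper usable with a real file, sys.stdout, or an in-memory buffer for testing.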