Question

I have a file with n lines that I want to load in Python. The format is:

06:38:34 16.09.2017,  739648.4118,6077976.8575, 54.791616, 12.727939
06:38:35 16.09.2017,  739647.0628,6077975.6925, 54.791606, 12.727917

I would like it to look like this:

06 38 34 16 09 2017 739648.4118 6077976.8575  54.791616  12.727939
06 38 35 16 09 2017 739647.0628 6077975.6925  54.791606  12.727917

so that it becomes an array of size (n, 10). I tried

f = open('filename')
x = f.read()
f.close()

Then x is a string of size 1, with all the data in a single element. I know there is a command called split, but I can't get it to work. Any help?

Answer 1

This should achieve what you want, using pandas:

import pandas as pd

# skipinitialspace=True drops the blanks that follow the commas in the file
df = pd.read_csv('<your file>', header=None, skipinitialspace=True,
                 names=['DateTime', 'Num1', 'Num2', 'Num3', 'Num4'])
# Give the format explicitly (hour:minute:second day.month.year) so day and month are never swapped
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%H:%M:%S %d.%m.%Y')

# Split the datetime object into separate columns, as in the desired output format
df['hour'] = df['DateTime'].dt.hour
df['minute'] = df['DateTime'].dt.minute
df['second'] = df['DateTime'].dt.second
df['day'] = df['DateTime'].dt.day
df['month'] = df['DateTime'].dt.month
df['year'] = df['DateTime'].dt.year

# Drop the original DateTime column
df.drop('DateTime', inplace=True, axis=1)

# Reorder the columns into the desired order
df = df[['hour', 'minute', 'second', 'day', 'month', 'year', 'Num1', 'Num2', 'Num3', 'Num4']]

# The (n, 10) array asked for in the question
arr = df.to_numpy()

# Export to a file with ' ' as the separator
df.to_csv('output file.txt', sep=' ', index=False, header=False)
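The first field of each line follows the strptime pattern %H:%M:%S %d.%m.%Y. Passing that pattern to pd.to_datetime through its format argument makes the day/month order explicit instead of leaving it to inference, which may read a date such as 05.09.2017 as 9 May. A small stdlib-only sketch that checks the pattern with datetime.strptime (TIMESTAMP_FORMAT is just a local name chosen here):

```python
from datetime import datetime

# Pattern of the first field, e.g. '06:38:34 16.09.2017' (hour:minute:second day.month.year)
TIMESTAMP_FORMAT = "%H:%M:%S %d.%m.%Y"

ts = datetime.strptime("06:38:34 16.09.2017", TIMESTAMP_FORMAT)
print(ts.hour, ts.minute, ts.second, ts.day, ts.month, ts.year)
# → 6 38 34 16 9 2017

# With an explicit format, '05.09.2017' is unambiguously 5 September
early = datetime.strptime("06:38:34 05.09.2017", TIMESTAMP_FORMAT)
print(early.day, early.month)
# → 5 9
```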

Answer 2

How about:

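with open('filename', 'r') as f:
    out = []
    # ':' and ',' become spaces; ',' -> '' would glue 739648.4118 to 6077976.8575
    text = f.read().replace(':', ' ').replace(',', ' ')
    for line in text.splitlines():
        fields = line.split()  # no argument: collapses runs of spaces
        if fields:  # skip blank lines, e.g. after a trailing newline
            fields[3:4] = fields[3].split('.')  # '16.09.2017' -> '16', '09', '2017'
            out.append(fields)
    print(out)

Replacing ',' with a space rather than deleting it keeps the two coordinates that are separated only by a comma apart, split() without an argument ignores the extra spaces, and empty lines are skipped, so each row comes out as 10 string fields.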
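Since the question asks for an (n, 10) array of numbers rather than strings, here is a minimal stdlib-only sketch that splits each line on its commas, parses the timestamp with datetime.strptime, and converts the remaining four fields to float. parse_line and load_rows are hypothetical helper names, and the sketch assumes every non-blank line has exactly the layout shown in the question.

```python
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%H:%M:%S %d.%m.%Y"  # e.g. '06:38:34 16.09.2017'


def parse_line(line):
    """Turn one input line into 10 numbers:
    hour, minute, second, day, month, year, then the four float columns."""
    stamp, *values = line.split(",")  # commas are the only field separators
    ts = datetime.strptime(stamp.strip(), TIMESTAMP_FORMAT)
    floats = [float(v) for v in values]  # float() ignores surrounding whitespace
    return [ts.hour, ts.minute, ts.second, ts.day, ts.month, ts.year] + floats


def load_rows(fh):
    """Parse every non-blank line of an open text file into a row of 10 numbers."""
    return [parse_line(line) for line in fh if line.strip()]


sample = io.StringIO(
    "06:38:34 16.09.2017,  739648.4118,6077976.8575, 54.791616, 12.727939\n"
    "06:38:35 16.09.2017,  739647.0628,6077975.6925, 54.791606, 12.727917\n"
)
rows = load_rows(sample)
print(len(rows), len(rows[0]))  # → 2 10
print(rows[1][:6])  # → [6, 38, 35, 16, 9, 2017]
```

With a real file this would be `with open('filename') as f: rows = load_rows(f)`. If NumPy is installed, `numpy.array(rows)` turns the list of rows into a float array of shape (n, 10).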

Answer 3

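I've always liked using a pipeline approach for file processing, so that if your input is large you can use concurrency. In any case, if you're using IPython you can easily check performance with %timeit, but this is what I would do: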

processed = ""

def replace_char(line, char, replacement):
    return line.replace(char, replacement)

with open('SOME_PATH') as fh:
    # Replace ',' with ' ' (not ''), otherwise '739648.4118,6077976.8575' is glued into one token
    processed += replace_char(replace_char(fh.read(), ":", " "), ",", " ")

print(processed)

# OUTPUT
# 06 38 34 16.09.2017   739648.4118 6077976.8575  54.791616  12.727939
# 06 38 35 16.09.2017   739647.0628 6077975.6925  54.791606  12.727917

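With this approach, if you want to change how the file is processed, you only need to change replace_char or write additional functions of your own. If you need concurrency, you can use the multiprocessing or asyncio packages.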
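To make the pipeline idea a little more concrete, here is a sketch built from generators, where each stage handles one line at a time instead of calling fh.read() on the whole file. The stage names (replace_in_lines, split_fields, pipeline) are hypothetical, and this is plain single-process Python; parallelizing the stages with multiprocessing or asyncio would be a separate step not shown here.

```python
import io


def replace_in_lines(lines, char, replacement):
    """Pipeline stage: lazily replace `char` in every line."""
    for line in lines:
        yield line.replace(char, replacement)


def split_fields(lines):
    """Pipeline stage: split each non-blank line on runs of whitespace."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields


def pipeline(fh):
    # Each stage pulls one line at a time from the previous one,
    # so the whole file is never held in memory at once.
    stage = replace_in_lines(fh, ":", " ")
    stage = replace_in_lines(stage, ",", " ")  # ' ' keeps 739648.4118 and 6077976.8575 apart
    return split_fields(stage)


sample = io.StringIO(
    "06:38:34 16.09.2017,  739648.4118,6077976.8575, 54.791616, 12.727939\n"
    "06:38:35 16.09.2017,  739647.0628,6077975.6925, 54.791606, 12.727917\n"
)
for fields in pipeline(sample):
    print(fields)
```

Adding or swapping a processing step means inserting another generator into pipeline(), which is the flexibility the answer describes.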
