I want to read a .txt file in Python (3.6.0) using Pandas. The first lines of the .txt file look like this:
Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000
I am using the following simple code:
import pandas
df = pandas.read_csv("TextFile.txt", sep=";", header=[10])
print(df)
and I get the following output in the terminal:
    Time  msec  Channel1  Channel2  Channel3  Channel4
0    NaN   NaN       NaN       NaN       NaN       NaN
1    NaN   NaN       NaN       NaN       NaN       NaN
2    NaN   NaN       NaN       NaN       NaN       NaN
..   ...   ...       ...       ...       ...       ...
599  NaN   NaN       NaN       NaN       NaN       NaN
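My first thought is that Pandas does not "like" the first two columns. Do you have any suggestions for how I can get Pandas to read the .txt file without changing anything in the file itself?
Thanks in advance.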
Answer 0 (score: 3)
You want to pass skiprows=11 and skipinitialspace=True to read_csv, along with sep=';', because some of your separators are followed by spaces:
In [83]:
import io
import pandas as pd
t="""Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True)
df
Out[83]:
       Time  msec  Channel1  Channel2  Channel3  Channel4
0  04:30:00     0   0.01526  10.67903  10.58366       0.0
1  04:30:01     0   0.17090  10.68666  10.58518       0.0
2  04:30:02     0   0.25177  10.68284  10.58442       0.0
You can see that the dtypes are now correct:
In [84]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null object
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: float64(4), int64(1), object(1)
memory usage: 224.0+ bytes
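A hand-counted skiprows value is easy to get off by one: as far as I know, an integer skiprows counts every physical line before the header row, blank ones included. As a minimal sketch, the header row can instead be located by its text. The helper name read_logger_text and the assumption that the header is the first line starting with "Time;" are mine, not part of the original question.

```python
import io
import pandas as pd

SAMPLE = """Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""


def read_logger_text(text, header_prefix="Time;"):
    """Hypothetical helper: parse logger text, finding the header row by prefix."""
    lines = text.splitlines()
    # Index of the first line that starts with the column-header prefix.
    # Assumption: "Time;" appears only on the header row ("Time of log start:"
    # does not match because of the semicolon).
    header_idx = next(
        i for i, line in enumerate(lines) if line.startswith(header_prefix)
    )
    # Skip everything before the header row; the header itself is kept.
    return pd.read_csv(
        io.StringIO(text), sep=";", skiprows=header_idx, skipinitialspace=True
    )


df = read_logger_text(SAMPLE)
print(df)
```

For a file on disk, the same helper could be fed the result of reading the file into a string first; the file itself stays unchanged.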
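You may also want to parse the times as datetimes. Note that the column holds only a time of day, so the date part in the result (2017-03-16 below) is filled in by the parser and is not the log's start date: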
In [86]:
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True, parse_dates=['Time'])
df
Out[86]:
                 Time  msec  Channel1  Channel2  Channel3  Channel4
0 2017-03-16 04:30:00     0   0.01526  10.67903  10.58366       0.0
1 2017-03-16 04:30:01     0   0.17090  10.68666  10.58518       0.0
2 2017-03-16 04:30:02     0   0.25177  10.68284  10.58442       0.0
In [87]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null datetime64[ns]
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 224.0 bytes
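If the timestamps should carry the actual log date rather than a date supplied by the parser, one option is to take the date from the "Date of log start" line and combine it with the Time and msec columns. This is only a sketch under my own assumptions: that the date is written as YYYY_MM_DD, that all samples fall on the start date (no rollover past midnight), and that msec is a millisecond offset within the second. The Timestamp column name is also mine.

```python
import io
import pandas as pd

SAMPLE = """Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""

lines = SAMPLE.splitlines()

# Pull the start date out of the preamble, e.g. "2016_10_09".
date_line = next(line for line in lines if line.startswith("Date of log start:"))
log_date = date_line.split(":", 1)[1].strip()

# Locate the header row by its text instead of counting lines by hand.
header_idx = next(i for i, line in enumerate(lines) if line.startswith("Time;"))

df = pd.read_csv(
    io.StringIO(SAMPLE), sep=";", skiprows=header_idx, skipinitialspace=True
)

# Build a full timestamp: log date + time of day + millisecond offset.
# Assumption: every row belongs to the start date.
df["Timestamp"] = (
    pd.to_datetime(log_date + " " + df["Time"], format="%Y_%m_%d %H:%M:%S")
    + pd.to_timedelta(df["msec"], unit="ms")
)
print(df[["Timestamp", "Channel1"]])
```

Passing an explicit format keeps the result independent of the day on which the script is run.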