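I'm putting my data into NASA's ICARTT format for archiving. This is a comma-delimited file with multiple header lines, and some of those header lines contain commas. Something like: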
46, 1001
lastname, firstname
location
instrument
field mission
1, 1
2011, 06, 21, 2012, 02, 29
0
Start_UTC, seconds, number_of_seconds_from_0000_UTC
14
1, 1
-999, -999
measurement name, units
measurement name, units
column1 label, column2 label, column3 label, column4 label, etc.
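I have to make a separate file for each day of data collection, so I'll end up creating about 30 files. When I create a csv file with pandas.DataFrame.to_csv, I can't (as far as I know) simply write the header lines to the file before writing the data, so I've had to trick it into doing what I want with: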
# assuming <df> is a pandas dataframe
df.to_csv('dst.ict',na_rep='-999',header=True,index=True,index_label=header_lines)
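where "header_lines" is the header string.
This gives me exactly what I want, except that "header_lines" is wrapped in double quotes. Is there any way to use to_csv to write text to the head of a csv file, or to remove the double quotes? I've already tried setting quotechar='' and doublequote=False in to_csv(), but the double quotes still appear.
What I'm doing right now (it works for the moment, but I'd like to move to something better) is just calling open('dst.ict', 'w') and printing line by line, which is slow.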
Answer 0 (score: 5)
Actually, you can write header lines before the data. pandas.DataFrame.to_csv takes a path_or_buf as its first argument, not just a pathname:
pandas.DataFrame.to_csv(path_or_buf, *args, **kwargs)
path_or_buf : string or file handle, default None
File path or object; if None is provided, the result is returned as a string.
Here's an example:
#!/usr/bin/env python3
import sys

import numpy as np
import pandas as pd

# Make an example data frame.
df = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

# Join with newlines but leave no trailing newline: to_csv() starts its
# own header row with an empty index label, so ",a,b,c,d,e" gets
# appended to the last line written here.
header = '\n'.join([
    '1001',
    'Daedalus, Stephen',
    'Dublin, Ireland',
    'Keys to the tower',
    'MINOS',
    '1, 1',
    '1904, 06, 16, 1922, 02, 02',
    'time_since_8am',  # Ends up being the header name for the index.
])

# I like to make sure the header lines are at least utf8-encoded.
with open(sys.argv[1], 'w', encoding='utf-8') as ict:
    # Write the header lines, including the index variable for
    # the last one if you're letting Pandas produce that for you.
    # (see above).
    ict.write(header)
    # Just write the data frame to the file object instead of
    # to a filename. Pandas will do the right thing and realize
    # it's already been opened.
    df.to_csv(ict)
The result is what you wanted: the header lines are written first, then .to_csv() is called and writes the data:
$ python example.py test && cat test
1001
Daedalus, Stephen
Dublin, Ireland
Keys to the tower
MINOS
1, 1
1904, 06, 16, 1922, 02, 02
time_since_8am,a,b,c,d,e
0,67,85,66,18,32
1,47,4,41,82,84
2,24,50,39,53,13
3,49,24,17,12,61
4,91,5,69,2,18
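For the one-file-per-day case in the question, the same idea can be wrapped in a small helper. This is only a sketch under some assumptions: write_with_header, the three header lines and the sample values are made up for illustration, and the index label is passed through to_csv's index_label argument instead of being written as the last header line.

```python
import io

import numpy as np
import pandas as pd


def write_with_header(df, buf, header_lines, index_label):
    """Write header_lines, then the CSV form of df, to an open text buffer."""
    for line in header_lines:
        buf.write(line + "\n")
    # na_rep fills missing values with the -999 flag used in the question.
    df.to_csv(buf, na_rep="-999", index_label=index_label)


# Hypothetical header lines; a real file would carry the full header.
header_lines = ["lastname, firstname", "location", "instrument"]

# Made-up data with one missing value to show the na_rep substitution.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]}, index=[0, 60])

buf = io.StringIO()
write_with_header(df, buf, header_lines, index_label="Start_UTC")
text = buf.getvalue()
print(text)
```

Passing a file opened with open(..., 'w') instead of the StringIO should work the same way, so the helper can be called once per daily file.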
Sorry if this comes too late to be useful. I work on archiving these files (using Python), so feel free to drop me a line if you have future questions.
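Answer 1 (score: 0)
Even though this is a few years late and ndt's answer is quite good, another possibility is to write the header first and then call to_csv() with mode='a' (append).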
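A minimal sketch of that append approach follows. The header text, the temporary file location and the sample data are hypothetical and only there to make the snippet self-contained:

```python
import os
import tempfile

import pandas as pd

# Hypothetical header text and data, for illustration only.
header = "lastname, firstname\nlocation\ninstrument\n"
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write into a temporary directory so nothing in the working dir is touched.
path = os.path.join(tempfile.mkdtemp(), "dst.ict")

# First write: create the file and put the header lines in it.
with open(path, "w") as fp:
    fp.write(header)

# Second write: pandas reopens the file in append mode and adds the CSV data.
df.to_csv(path, mode="a", index_label="Start_UTC")

with open(path) as fp:
    print(fp.read())
```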
Because of the two separate write operations, though, it may be somewhat less efficient...