Question

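I am working with sensor data from Decagon Em50g data loggers. Before I do anything with the data, the gaps need to be filled according to our study period (start and end date/time). The raw sensor data from the logger has 3 header rows that need to be preserved after gap filling (they contain important metadata that is pulled out in a second script, which tidies the data after the gaps are filled). I need to do this for many files, and I am starting by getting it to work on one.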

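My approach: I read in the sensor data two ways: once to store the first two header rows in 'headers_for_insert' (for insertion later), and a second time with the 3rd header row as the main header ('with_headers') for joining/gap filling. After this, I create a date_time series for the duration of our study trial and join it to 'with_headers' to get the gap-filled dataframe.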

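Then I basically need to stack the other 2 header rows on top of the gap-filled dataframe so that all 3 headers are preserved when exporting to csv.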

I have tried many things and would appreciate any guidance on how to come up with a solution.

See the images below for what I am trying to accomplish:

[Image: table and headers]

[Image: the final table I need]

# import necessary libraries
import pandas as pd
import glob

# read in data, one with headers and one without
without_headers = pd.read_excel('D1Y-05Feb2018-1331.xls',  header=None)
with_headers = pd.read_excel('D1Y-05Feb2018-1331.xls',  header=2)

# subset first two rows from 'without_headers' (these will be inserted as headers later)
headers_for_insert = without_headers.iloc[0:2,:]

# change date-time heading to 'date_time'
with_headers = with_headers.rename(columns = {'Measurement Time':'date_time'})
with_headers = with_headers.set_index('date_time')

# create variables for your start and end date/time
start='12/15/2017 00:00:00'
end='2/4/2018 12:00:00'

# create dataframe that has the date-time series for duration of study trial
date_range = pd.date_range(start, end, freq='H')
date_range_series = pd.Series(date_range)
date_range_df = pd.DataFrame(date_range_series)
date_range_df.columns = ['date_time']
date_range_df = date_range_df.set_index('date_time')

# Left hand join using created time-series as series to join on.
gap_filled = date_range_df.join(with_headers)
gap_filled = gap_filled.reset_index()

# stack 'headers_for_insert' on top of 'gap_filled' dataframe
# Here I want to put the 2 headers that I stored in 'headers_for_insert'
# on top of the header in 'gap_filled' so that the output csv will have a
# total of 3 headers.

Answer 1

Thanks to everyone who looked at this question.

Update

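I found a solution with the following code (the output opens in Excel and everything looks fine). However, when I try to read it back in as a dataframe, I get an error.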

# write 'headers_for_insert' to csv as a way to start building table.
headers_for_insert.to_csv('headers.csv', index=False, header=False)

# open 'headers.csv' and append the 'gap_filled' dataframe to the 2 headers
with open('headers.csv', 'a') as f:
    gap_filled.to_csv(f, index=False, header=True, encoding='utf-8')

# the above lines of code worked (here I'm reading in the csv to test)
solved = pd.read_csv('headers.csv')

However, when I try to read the csv in as a dataframe, I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 1: invalid start byte
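One possible explanation, offered as a guess rather than a confirmed diagnosis: byte 0xb3 is the character "³" in Latin-1/cp1252, so a unit label such as "m³/m³" in the headers may have been written in the platform's default encoding instead of UTF-8. When `to_csv` is given an already open text file handle, its `encoding` argument does not apply, so the encoding is whatever `open()` chose. The sketch below writes both parts through a single handle opened explicitly as UTF-8, then reads the file back with the third row as the header. The function names and the sample data are hypothetical stand-ins, not the real logger output.

```python
import os
import tempfile

import pandas as pd


def write_with_extra_headers(headers_for_insert, gap_filled, path):
    """Write the two metadata rows, then the gap-filled table, all as UTF-8."""
    # The encoding is set on open(), because to_csv cannot change the
    # encoding of a text handle that is already open.
    # newline="" leaves line endings to the csv writer.
    with open(path, "w", encoding="utf-8", newline="") as f:
        headers_for_insert.to_csv(f, index=False, header=False)
        gap_filled.to_csv(f, index=False, header=True)


def read_with_extra_headers(path):
    """Read the file back: two metadata rows, then the table itself."""
    meta = pd.read_csv(path, header=None, nrows=2, encoding="utf-8")
    # header=2 uses the third row as column names and skips the rows above it.
    data = pd.read_csv(path, header=2, encoding="utf-8")
    return meta, data


# Hypothetical stand-in data (assumed shape: 2 metadata rows, 2 columns).
headers_for_insert = pd.DataFrame([["D1Y", "Port 1"], ["", "m³/m³ VWC"]])
gap_filled = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2017-12-15 00:00", "2017-12-15 01:00", "2017-12-15 02:00"]
    ),
    "Port 1": [0.21, None, 0.23],  # None marks a filled gap
})

out_path = os.path.join(tempfile.mkdtemp(), "headers.csv")
write_with_extra_headers(headers_for_insert, gap_filled, out_path)
meta, data = read_with_extra_headers(out_path)
```

Reading with `header=2` also avoids treating the first metadata row as the column names, which is what a plain `pd.read_csv('headers.csv')` would do.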

Pandas: sensor data has 3 headers. Need to preserve them after gap filling
