Reading an Excel file in Python using pandas and multiple indices

Time: 2016-02-01 16:42:41

Tags: python excel pandas timestamp

I am a Python newbie, so please forgive this basic question. My .xlsx file looks like this:

Unnamed:1     A     Unnamed:2     B
2015-01-01    10    2015-01-01    10
2015-01-02    20    2015-01-01    20
2015-01-03    30    NaT           NaN

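When I read it in Python using pandas.read_excel(...), pandas automatically uses the first column as the time index.

Is there a one-liner that tells pandas that every second column is a time index belonging to the time series next to it?

The desired output looks like this: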

date          A     B
2015-01-01    10    10
2015-01-02    20    20
2015-01-03    30    NaN

2 Answers:

Answer 0: (score: 1)

To parse chunks of adjacent columns and align them on their respective datetime index, you can do the following:

Starting with df:

Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Unnamed: 0    3 non-null datetime64[ns]
A             3 non-null int64
Unnamed: 1    2 non-null datetime64[ns]
B             2 non-null float64
dtypes: datetime64[ns](2), float64(1), int64(1)

You can iterate over chunks of 2 columns and merge on the index as follows:

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Start from the first (date, value) pair, indexed by its date column
merged = df.loc[:, list(df)[:2]].set_index(list(df)[0])
# Outer-merge each remaining (date, value) pair on its own date index
for cols in chunks(list(df)[2:], 2):
    merged = merged.merge(df.loc[:, cols].set_index(cols[0]).dropna(), left_index=True, right_index=True, how='outer')

to get:

            A     B
2015-01-01  10    10
2015-01-01  10    20
2015-01-02  20    NaN
2015-01-03  30    NaN

Unfortunately, pd.concat does not work here because it cannot handle duplicate index entries; otherwise you could use a list comprehension.
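For comparison, when every date column holds unique dates (as in the desired output of the question, not the sample with the repeated 2015-01-01), the pd.concat route can be sketched roughly as follows. The in-memory frame is a hypothetical stand-in for the result of pandas.read_excel, and the names pairs, series and result are illustrative only.

```python
import pandas as pd

# Hypothetical sample frame: same layout as the question, but with unique dates
df = pd.DataFrame({
    "Unnamed: 0": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "A": [10, 20, 30],
    "Unnamed: 1": pd.to_datetime(["2015-01-01", "2015-01-02", None]),
    "B": [10.0, 20.0, None],
})

# Split the column names into adjacent (date column, value column) pairs
cols = list(df)
pairs = [cols[i:i + 2] for i in range(0, len(cols), 2)]

# One Series per pair, indexed by its own dates, with padding rows dropped
series = [
    df[[date_col, value_col]].dropna().set_index(date_col)[value_col]
    for date_col, value_col in pairs
]

# Align all series on the union of their dates
result = pd.concat(series, axis=1)
result.index.name = "date"
print(result)
```

If any series contains duplicate dates, the concat step is expected to fail because the indexes cannot be aligned unambiguously, which is the case the merge loop in this answer handles.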

Answer 1: (score: 0)

I import the data with xlrd and then use pandas to display it:
import xlrd
import pandas as pd

# Note: xlrd 2.0 and later read only .xls files
xls_name = "data.xls"  # placeholder path (not part of the original answer)
# encoding_override helps with old .xls files whose codepage record is missing or wrong
workbook = xlrd.open_workbook(xls_name, encoding_override="cp1252")
worksheet = workbook.sheet_by_index(0)

# The first row holds the column names
first_row = [worksheet.cell_value(0, col) for col in range(worksheet.ncols)]

# One dict per data row; data is assumed to start right after the header,
# so adjust the start row to match your sheet
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)

# Use three separate lists: a = b = c = [] would make all names share one list
first_column, second_column, third_column = [], [], []
for elm in data:
    first_column.append(elm[first_row[0]])
    second_column.append(elm[first_row[1]])
    third_column.append(elm[first_row[2]])

dict1 = {}
dict1[first_row[0]] = first_column
dict1[first_row[1]] = second_column
dict1[first_row[2]] = third_column
# Select columns by their real header names, otherwise they come back empty
res = pd.DataFrame(dict1, columns=first_row[:3])
print(res)
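
As a possible simplification, pandas can build the frame directly from the list of row dictionaries, so the per-column lists are not strictly needed. This is only a minimal sketch: the first_row and data values below are hypothetical stand-ins for what the xlrd loop produces.

```python
import pandas as pd

# Hypothetical stand-ins for the header row and row dicts read via xlrd
first_row = ["date", "A", "B"]
data = [
    {"date": "2015-01-01", "A": 10, "B": 10},
    {"date": "2015-01-02", "A": 20, "B": 20},
]

# A list of dicts maps straight to rows; columns fixes the column order
res = pd.DataFrame(data, columns=first_row)
print(res)
```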