Question

I have been searching for a while but still can't figure this out. I would be very grateful for any help.

I have an Excel file:

      ,   John,    James,    Joan,
      ,   Smith,   Smith,    Smith,
Index1,   234,     432,      324,
Index2,   2987,    234,      4354,

I want to read it into a DataFrame so that "John Smith, James Smith, Joan Smith" are my headers. I tried the following, but my headers are still "John, James, Joan":

xl = pd.ExcelFile(myfile, header=None)
row = df.apply(lambda x: str(x.iloc[0]) + str(x.iloc[1]))
df.append(row,ignore_index=True)

nrow = df.shape[0]
df = pd.concat([df.ix[nrow:], df.ix[2:nrow-1]])

Answer 1

Perhaps it is easier to do this manually?

>>> import itertools
>>> import pandas as pd

>>> xl = pd.ExcelFile(myfile)                  # assumes an .xls file opened through xlrd
>>> sh = xl.book.sheet_by_index(0)
>>> rows = (sh.row_values(i) for i in range(sh.nrows))
>>> hd = list(zip(*itertools.islice(rows, 2)))[1:]   # read first two rows, drop the index column
>>> df = pd.DataFrame(rows)                    # create DataFrame from remaining rows
>>> df = df.set_index(0)
>>> df.columns = [' '.join(x) for x in hd]     # rename columns
>>> df
        John Smith  James Smith  Joan Smith
0                                          
Index1         234          432         324
Index2        2987          234        4354
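
For readers without an Excel file at hand, the same transpose-and-join idea can be sketched on in-memory rows. The `sheet_rows` list below is a hypothetical stand-in for what `sh.row_values(i)` would yield for the sheet in the question; it is an assumption for illustration, not output from a real workbook.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the rows returned by sh.row_values(i)
sheet_rows = [
    ["", "John", "James", "Joan"],
    ["", "Smith", "Smith", "Smith"],
    ["Index1", 234, 432, 324],
    ["Index2", 2987, 234, 4354],
]

rows = iter(sheet_rows)
# Consume the first two rows and transpose them into (first, last) pairs,
# then drop the pair that belongs to the index column
hd = list(zip(*itertools.islice(rows, 2)))[1:]
# The iterator now yields only the remaining data rows
df = pd.DataFrame(list(rows)).set_index(0)
df.columns = [" ".join(pair) for pair in hd]
print(df)
```

Because `rows` is an iterator, `islice` uses up the two header rows, so the DataFrame is built from the data rows only.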

Answer 2

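You can keep the two levels separate if you want. That could be useful if, for example, you only want to filter columns by last name. Otherwise, the other solutions are surely better than this one.

Normally this works for me: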

In [103]: txt = '''John,James,Joan
     ...: Smith,Smith,Smith
     ...: 234,432,324
     ...: 2987,234,4354
     ...: '''

In [104]: x = pandas.read_csv(StringIO(txt), header=[0,1])
     ...: x.columns = pandas.MultiIndex.from_tuples(x.columns.tolist())
     ...: x
     ...:

But for some reason, this drops the first data row :/

In [105]: x
Out[105]: 
    John  James   Joan
   Smith  Smith  Smith
0   2987    234   4354

I'll check with the pandas mailing list to see whether this is a bug.
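
As a rough sketch of the "keep the levels separate" idea: assuming a recent pandas version, where I would expect `header=[0, 1]` to keep both data rows, the two header rows become a two-level column index that can be filtered by last name or flattened afterwards. The variable names `smiths` and `flat` are made up for illustration.

```python
from io import StringIO

import pandas as pd

txt = """John,James,Joan
Smith,Smith,Smith
234,432,324
2987,234,4354
"""

# Two header rows become a two-level MultiIndex on the columns
x = pd.read_csv(StringIO(txt), header=[0, 1])

# Filter on the second level (last name); here every column matches
smiths = x.loc[:, x.columns.get_level_values(1) == "Smith"]

# Flatten the two levels into single "First Last" labels
flat = x.copy()
flat.columns = [" ".join(col) for col in x.columns]
print(flat)
```

With the sample data every column has the last name "Smith", so the filter keeps all three columns; with mixed last names it would keep only the matching ones.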

Answer 3

I solved this by converting the Excel file to a csv file and using the following:

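import pandas as pd

df = pd.read_csv(myfile, header=None)
# Join the first two entries of each column into one header string
header = df.apply(lambda x: str(x.iloc[0]) + ' ' + str(x.iloc[1]))
df = df[2:]          # keep only the data rows
df.columns = header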

Here is the output:

Out[252]: 
  John Smith  James Smith  Joan Smith
2        234          432         324
3       3453         2342         563

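However, when I read the file through pd.ExcelFile (and parse the specific sheet I am interested in), I run into a problem similar to @Paul H's. It seems the Excel reader treats the first row as column names by default, and returns: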

   Smith 234    Smith 432   Smith 324
3       3453         2342         563
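
One possible way around this, sketched under assumptions: read the sheet with no header at all (for example `pd.read_excel(myfile, header=None)`, or `header=None` passed to `ExcelFile.parse`) so that both name rows arrive as ordinary data, then join them. The helper `combine_header_rows` below is hypothetical, and the `raw` frame is an in-memory stand-in for such a read, using the values from the question.

```python
import pandas as pd


def combine_header_rows(raw, n_header_rows=2, sep=" "):
    """Join the first n_header_rows entries of each column into one label."""
    header = [
        sep.join(str(v) for v in raw.iloc[:n_header_rows, j])
        for j in range(raw.shape[1])
    ]
    # Keep only the data rows and renumber them from 0
    body = raw.iloc[n_header_rows:].reset_index(drop=True)
    body.columns = header
    return body


# Stand-in for: raw = pd.read_excel(myfile, header=None)
raw = pd.DataFrame([
    ["John", "James", "Joan"],
    ["Smith", "Smith", "Smith"],
    [234, 432, 324],
    [2987, 234, 4354],
])

df = combine_header_rows(raw)
print(df)
```

Since the name rows and the numbers share columns, the data columns come back with object dtype; converting them afterwards (for instance with `pd.to_numeric`) is left out of the sketch.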

Combine the first two entries of each column into the header when reading an Excel file

3 answers: