How do I select a row in a DataFrame and make it the column names? (pandas)

Time: 2017-05-06 13:58:03

Tags: pandas dataframe

I want to make row 3 the column index.

2 answers:

Answer 0: (score: 2)

The quick and simple answer is:

df.T.set_index(3).T
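
To illustrate what the one-liner does, here is a minimal sketch on a small made-up frame (the sample values and the index labels 3, 4, 5 are assumptions, not the asker's data). Transposing turns row 3 into a column, `set_index` makes it the index, and transposing back turns that index into the column labels.

```python
import pandas as pd

# Hypothetical sample data: the row labelled 3 holds the intended headers.
df = pd.DataFrame(
    {
        "A": ["Groups", "x", "y"],
        "B": ["Quantity", "1", "2"],
        "C": ["Net Sales", "10", "20"],
    },
    index=[3, 4, 5],
)

# Row 3 becomes a column after the first transpose, then the index,
# and finally the column labels after the second transpose.
result = df.T.set_index(3).T
print(result)
```

The columns keep the name `3` after this; `rename_axis(None, axis=1)` can be chained on if that leftover name is unwanted.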

Answer 1: (score: 1)

I think you need to select the row with loc, assign it to the columns, and then drop that row from df:

df = pd.DataFrame({'A':['Groups'], 'B':['Quantity'], 'C':['Net Sales']}, index=[3])

df.columns = df.loc[3]
df = df.drop(3)

print (df)
Empty DataFrame
Columns: [Groups, Quantity, Net Sales]
Index: []
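
The sample above contains only the header row, so the result is empty. A small sketch with a couple of made-up data rows (the values are assumptions for illustration) may show the effect more clearly, including the cleanup that is usually wanted afterwards:

```python
import pandas as pd

# Hypothetical data: header row at label 3, followed by two data rows.
df = pd.DataFrame(
    {
        "A": ["Groups", "a", "b"],
        "B": ["Quantity", "4", "6"],
        "C": ["Net Sales", "1.5", "2.5"],
    },
    index=[3, 4, 5],
)

df.columns = df.loc[3]               # use row 3 as the column labels
df = df.drop(3)                      # remove the header row from the data
df = df.rename_axis(None, axis=1)    # clear the leftover columns name (3)

# The values are still strings, so numeric columns need converting.
df["Quantity"] = pd.to_numeric(df["Quantity"])
df["Net Sales"] = pd.to_numeric(df["Net Sales"])
print(df)
```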

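But it is better to avoid the problem in the first place. For example, if the DataFrame is created with read_csv, use the skiprows parameter. The main advantage is that read_csv then infers the correct dtypes for all columns: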

import pandas as pd
from io import StringIO

temp=u"""A,B,C
D,E,F
G,H,I
J,K,L
Groups Quantity,Net,Sales
4,6,4"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp))
print (df)
                 A    B      C
0                D    E      F
1                G    H      I
2                J    K      L
3  Groups Quantity  Net  Sales
4                4    6      4

df = pd.read_csv(StringIO(temp), skiprows=4)
print (df)
   Groups Quantity  Net  Sales
0                4    6      4
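
The dtype advantage can be checked directly. The following is a rough sketch on made-up CSV text (three junk lines before the real header; the content is an assumption for illustration) that compares the dtypes with and without `skiprows`:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV text: three unwanted lines, then the real header and data.
text = """A,B,C
D,E,F
G,H,I
Groups,Quantity,Net Sales
a,4,1.5
b,6,2.5"""

# Without skiprows the first line is taken as the header and the numbers
# share columns with text, so they cannot be parsed as numeric.
raw = pd.read_csv(StringIO(text))

# Skipping the first three lines makes the real header line the header.
clean = pd.read_csv(StringIO(text), skiprows=3)

print(raw.dtypes)
print(clean.dtypes)
```

In the second frame the Quantity column is parsed as integers and Net Sales as floats, so no later conversion is needed.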

Timings

In [319]: %timeit (df.T.set_index(3).T.reset_index(drop=True).astype(float).rename_axis(None, axis=1))
10 loops, best of 3: 43.1 ms per loop

In [320]: %timeit (jez(df))
10 loops, best of 3: 23.7 ms per loop

In [321]: %timeit (jez1(df))
100 loops, best of 3: 13.6 ms per loop

Code for the timings

In addition, a conversion to float was added to all solutions; if all the data are strings, it is not necessary.

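import numpy as np
import pandas as pd

np.random.seed(100)
#cast to object so the string header row can be stored next to the numbers
df = pd.DataFrame(np.random.random((100000,3)), columns=list('ABC')).astype(object)
df = df.drop([0,1,2])
df.loc[3] = ['Groups', 'Quantity', 'Net Sales']
print (df)

print (df.T.set_index(3).T.reset_index(drop=True).astype(float).rename_axis(None, axis=1))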

def jez(df):
    #note: this assigns to the columns of the DataFrame that was passed in
    df.columns = df.loc[3]
    return df.drop(3).reset_index(drop=True).astype(float).rename_axis(None, axis=1)

def jez1(df):
    arr = df.values
    #get position (number of row) with 3
    idx = df.index.get_loc(3)
    return pd.DataFrame(np.delete(arr, (idx), axis=0).astype(float), columns=arr[idx])
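
As a sanity check that the approaches are interchangeable, the sketch below runs the transpose variant and the NumPy variant on a tiny frame built the same way as the timing data (the size and the random generator seed are assumptions) and compares the results:

```python
import numpy as np
import pandas as pd

# Small frame built like the timing data: row 3 holds the header strings.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 3)), columns=list("ABC")).astype(object)
df = df.drop([0, 1, 2])
df.loc[3] = ["Groups", "Quantity", "Net Sales"]

# Transpose-based variant.
via_transpose = (
    df.T.set_index(3).T.reset_index(drop=True)
    .astype(float)
    .rename_axis(None, axis=1)
)

# NumPy-based variant: delete the header row from the array and
# reuse that row as the column labels.
arr = df.to_numpy()
idx = df.index.get_loc(3)
via_numpy = pd.DataFrame(np.delete(arr, idx, axis=0).astype(float), columns=arr[idx])

same_values = np.array_equal(via_transpose.to_numpy(), via_numpy.to_numpy())
same_columns = list(via_transpose.columns) == list(via_numpy.columns)
print(same_values, same_columns)
```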