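I'm not sure how best to explain my problem, but I need to modify a DataFrame by inserting nearly empty rows, because of a software format compatibility issue.
Here is an example. I need to change a DataFrame like this: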
import pandas as pd

df = pd.DataFrame({"line1": [200, 400, 800],
                   "line2": [400, 900, 700],
                   "line3": [800, 700, 966],
                   "name": ["bla", "bloo", "bloom"]})
print(df)
   line1  line2  line3   name
0    200    400    800    bla
1    400    900    700   bloo
2    800    700    966  bloom
into something like this:
  line_name line1 line2 line3
0        ID
1      name
2       bla   200   400   800
3      bloo   400   900   700
4     bloom   800   700   966
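Of course, the real DataFrame has many more rows and columns, so I'm looking for an approach that handles a variable number of columns, without manually adding the "blank" values under each line column one by one.
I tried a few groupby approaches, and I also tried building two DataFrames (one with only the line, ID, name structure, the other with only {{1}} and names) and then merging them, but without success.
Any ideas would be appreciated.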
Answer 0 (score: 1)
I'm not sure this is exactly what you want. Based on the example DataFrame given, you could try:
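import pandas as pd

df = pd.DataFrame({"line1": [200, 400, 800], "line2": [400, 900, 700], "line3": [800, 700, 966], "name": ["bla", "bloo", "bloom"]})
# Two header rows: empty strings everywhere except the last column ('name')
dftemp = pd.DataFrame(columns=df.columns)
dftemp.loc[0] = (len(df.columns) - 1) * [''] + ['ID']
dftemp.loc[1] = (len(df.columns) - 1) * [''] + ['name']
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
dfnew = pd.concat([dftemp, df], ignore_index=True)
dfnew = dfnew.rename(columns={'name': 'line_name'})
# Move the last column (line_name) to the front
cols = dfnew.columns.tolist()
cols = cols[-1:] + cols[:-1]
dfnew = dfnew[cols]
print(dfnew)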
Output:
  line_name line1 line2 line3
0        ID
1      name
2       bla   200   400   800
3      bloo   400   900   700
4     bloom   800   700   966
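Since the question asks for something that works with any number of columns, the same idea can be wrapped in a small helper. This is only a sketch: the function name prepend_label_rows and its parameters are made up for illustration, and it assumes the key column (here "name") exists in the frame and that empty strings are an acceptable "blank" for the target software.

```python
import pandas as pd


def prepend_label_rows(df, labels=("ID", "name"), key_col="name", new_key="line_name"):
    """Return a new frame with one nearly empty row per label placed on top.

    Hypothetical helper for this thread; not part of pandas.
    """
    # Move the key column to the front and rename it; other columns keep their order.
    other_cols = [c for c in df.columns if c != key_col]
    body = df[[key_col] + other_cols].rename(columns={key_col: new_key})

    # One header row per label: the label first, then an empty string per remaining column.
    header = pd.DataFrame(
        [[label] + [""] * len(other_cols) for label in labels],
        columns=body.columns,
    )

    # Cast to object so empty strings and numbers can share a column without upcasting.
    return pd.concat([header, body.astype(object)], ignore_index=True)


df = pd.DataFrame({"line1": [200, 400, 800],
                   "line2": [400, 900, 700],
                   "line3": [800, 700, 966],
                   "name": ["bla", "bloo", "bloom"]})
result = prepend_label_rows(df)
print(result)
```

Because the column list is derived from df.columns, nothing has to be edited when more line columns are added.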
Answer 1 (score: 1)
You can try a solution using Setting With Enlargement:
import pandas as pd
import numpy as np

df = pd.DataFrame({"line1": [200, 400, 800],
                   "line2": [400, 900, 700],
                   "line3": [800, 700, 966],
                   "name": ["bla", "bloo", "bloom"]})
print(df)
   line1  line2  line3   name
0    200    400    800    bla
1    400    900    700   bloo
2    800    700    966  bloom
# build one list per new row: NaN for every column except the last, which holds 'name' or 'ID'
# assigning to index labels that do not exist yet (-1, -2) appends the two rows
df.loc[-1] = [np.nan] * (df.shape[1] - 1) + ['name']
df.loc[-2] = [np.nan] * (df.shape[1] - 1) + ['ID']
print(df)
    line1  line2  line3   name
 0    200    400    800    bla
 1    400    900    700   bloo
 2    800    700    966  bloom
-1    NaN    NaN    NaN   name
-2    NaN    NaN    NaN     ID
# sort by index so rows -2 and -1 come first, reset the index, rename the column, replace NaN with empty strings
df = df.sort_index().reset_index(drop=True).rename(columns={'name':'line_name'}).fillna('')
# reorder columns
df = df[['line_name','line1','line2','line3']]
print(df)
  line_name line1 line2 line3
0        ID
1      name
2       bla   200   400   800
3      bloo   400   900   700
4     bloom   800   700   966
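The column reordering above lists every column by name, which does not scale to the variable number of columns the question mentions. A possible variation, offered as a sketch rather than a tested drop-in: it assumes "name" is the last column of the frame, and it casts to object first so empty strings can be written directly and the numbers are not upcast to float. The variable names out and n_blank are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"line1": [200, 400, 800],
                   "line2": [400, 900, 700],
                   "line3": [800, 700, 966],
                   "name": ["bla", "bloo", "bloom"]})

# Object dtype lets empty strings sit next to the numbers without a float upcast.
out = df.astype(object)

# Labels -1 and -2 do not exist yet, so .loc adds new rows (setting with enlargement).
n_blank = out.shape[1] - 1
out.loc[-1] = [""] * n_blank + ["name"]
out.loc[-2] = [""] * n_blank + ["ID"]

# Sorting the index puts -2 and -1 first; then renumber the rows from 0.
out = out.sort_index().reset_index(drop=True).rename(columns={"name": "line_name"})

# Assumption: the key column is last. Move it to the front without naming the others.
out = out[[out.columns[-1]] + list(out.columns[:-1])]
print(out)
```

If the key column is not last in the real data, the final step would need to pick it by name instead of by position.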
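Answer 2 (score: 0)
import pandas as pd
import numpy as np

df = pd.DataFrame({"line1": [200, 400, 800], "line2": [400, 900, 700], "line3": [800, 700, 966], "name": ["bla", "bloo", "bloom"]})
# append the two nearly empty rows under new index labels -1 and -2
df.loc[-1] = [np.nan for i in range(df.shape[1] - 1)] + ['name']
df.loc[-2] = [np.nan for i in range(df.shape[1] - 1)] + ['ID']
df = df.fillna('')
# sorting puts rows -2 and -1 on top; drop=True avoids keeping the old index as a column
df = df.sort_index()
df = df.reset_index(drop=True)
# put 'name' first and keep the result
df = df.loc[:, ['name', 'line1', 'line2', 'line3']]
print(df)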