在pandas数据帧中复制列

时间:2018-05-25 16:34:09

标签: python python-3.x python-2.7 pandas dataframe

我想在Python Dataframe中切片和复制列。我的数据框如下所示:

     1928  1928.1  1929  1929.1  1930  1930.1
 0    0     0       0     0       0     0
 1    1     3       3     2       2     2
 2    4     1       3     0       1     2

我想制作表格

     1928  1928.1  1929 1929.1 1930 1930.1
 0   0     0            
 1   1     3          
 2   4     1                    
 3   0     0
 4   3     2
 5   3     0
 6   0     0
 7   2     2
 8   1     2 

这基本上意味着我想要改变列中的值' 1929' 1929.1',' 1930',' 1930.1&# 39;根据“1928年”栏目#19;和' 1928.1'

同样,我把代码编写为

   [In]x=2
       y=2
       b=3
       c=x-1
       for a in range(0,2):
            df.iloc[b:(b+3),0:x]=df.iloc[0:3,x:(x+y)]
            x=x+2
            b=b+3
   [In] df
   [Out] 
     1928  1928.1  1929  1929.1  1930  1930.1
 0    0     0       0     0       0     0
 1    1     3       3     2       2     2
 2    4     1       3     0       1     2

列内不会进行复制。我该如何修改我的代码??

4 个答案:

答案 0 :(得分:1)

如果你有一个新的数据帧,只需连接列:

df1 = df[['1928','1928.1']]
df2 = df[['1929','1929.1']]
df2.columns = ['1928','1928.1']
df3 = df[['1930','1930.1']]
df3.columns = ['1928','1928.1']

df = pd.concat([df1,df2,df3])

我认为这是最易读,最简单的方法。您可以覆盖原始数据框并丢弃其他数据框。

答案 1 :(得分:1)

<强> 设置

cols = ['1929', '1929.1', '1930', '1930.1']
vals = df[cols].values.reshape(-1, 2)
使用 numpy.vstack

append

df[['1928', '1928.1']].append(
    pd.DataFrame(
        np.vstack([vals[::2], vals[1::2]]), columns = ['1928', '1928.1']
    )
)

   1928  1928.1
0     0       0
1     1       3
2     4       1
0     0       0
1     3       2
2     3       0
3     0       0
4     2       2
5     1       2

答案 2 :(得分:0)

一种方法是使用itertools.chain

from itertools import chain

cols = df.columns

res = pd.DataFrame({cols[0]: list(chain.from_iterable(df.iloc[:, ::2].T.values)),
                    cols[1]: list(chain.from_iterable(df.iloc[:, 1::2].T.values))})\
        .join(pd.DataFrame(columns=cols[2:]))

print(res)

   1928  1928.1 1929 1929.1 1930 1930.1
0     0       0  NaN    NaN  NaN    NaN
1     1       3  NaN    NaN  NaN    NaN
2     4       1  NaN    NaN  NaN    NaN
3     0       0  NaN    NaN  NaN    NaN
4     3       2  NaN    NaN  NaN    NaN
5     3       0  NaN    NaN  NaN    NaN
6     0       0  NaN    NaN  NaN    NaN
7     2       2  NaN    NaN  NaN    NaN
8     1       2  NaN    NaN  NaN    NaN

答案 3 :(得分:0)

按列名称的前四个字符分组

#def key(s):
#    return s[:4]
#gb = df.groupby(key, axis=1)
gb = df.groupby(by=df.columns.str[:4], axis=1)

n_cols = len(df.columns) // len(gb)
col_names = df.iloc[:,:n_cols].columns

对于每个组的DataFrame,重命名列并连接 - 这会生成一个只有两列的新DataFrame

dz = pd.concat(d.rename(columns=dict(item for item in zip(d.columns,col_names))) for g,d in gb)
dz.index = range(len(dz))
frames = []
for g,d in gb:
    d.columns = col_names
    frames.append(d)
dy = pd.concat(frames)
dy.index = range(len(dy))

将超过六列。
依赖具有相同列数的所有组 依赖于按标签排序的列。