I want to slice and copy columns in a Python (pandas) DataFrame. My data frame looks like this:
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 0 0 0 0
1 1 3 3 2 2 2
2 4 1 3 0 1 2
I want to produce this table:
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0
1 1 3
2 4 1
3 0 0
4 3 2
5 3 0
6 0 0
7 2 2
8 1 2
This basically means I want to move the values in columns '1929', '1929.1', '1930' and '1930.1' underneath columns '1928' and '1928.1'.
To do this, I wrote the following code:
[In]x=2
y=2
b=3
c=x-1
for a in range(0,2):
    df.iloc[b:(b+3),0:x]=df.iloc[0:3,x:(x+y)]
    x=x+2
    b=b+3
[In] df
[Out]
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 0 0 0 0
1 1 3 3 2 2 2
2 4 1 3 0 1 2
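The values are not copied into the columns. How should I modify my code?
Answer 0: (score: 1)
If you are fine with getting a new data frame, just concatenate the columns: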
df1 = df[['1928','1928.1']]
df2 = df[['1929','1929.1']]
df2.columns = ['1928','1928.1']
df3 = df[['1930','1930.1']]
df3.columns = ['1928','1928.1']
df = pd.concat([df1,df2,df3])
I think this is the most readable and simplest approach. You can overwrite the original data frame and discard the intermediate ones.
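The same idea can be written as a loop so that it scales past three column pairs. This is only a minimal sketch: it assumes the column labels are strings, that the columns come in adjacent blocks of equal width, and it rebuilds the sample frame from the question. The names `pair_width`, `target`, `pieces` and `stacked` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: labels are strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

pair_width = 2                            # columns per block (assumed)
target = list(df.columns[:pair_width])    # labels every block is renamed to

pieces = []
for start in range(0, df.shape[1], pair_width):
    # Take one block of adjacent columns and give it the target labels.
    block = df.iloc[:, start:start + pair_width].copy()
    block.columns = target
    pieces.append(block)

# ignore_index=True renumbers the rows 0..8 instead of repeating 0..2.
stacked = pd.concat(pieces, ignore_index=True)
print(stacked)
```

Passing `ignore_index=True` is a choice made here to match the row labels 0 to 8 shown in the question; without it the index repeats 0, 1, 2 for each block.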
Answer 1: (score: 1)
Setup
cols = ['1929', '1929.1', '1930', '1930.1']
vals = df[cols].values.reshape(-1, 2)
Using numpy.vstack and append:
df[['1928', '1928.1']].append(
    pd.DataFrame(
        np.vstack([vals[::2], vals[1::2]]), columns = ['1928', '1928.1']
    )
)
1928 1928.1
0 0 0
1 1 3
2 4 1
0 0 0
1 3 2
2 3 0
3 0 0
4 2 2
5 1 2
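Note that `DataFrame.append` was removed in pandas 2.0, so the call above only runs on older versions. A sketch of the same step using `pd.concat` follows. It assumes the sample data from the question; `lower` and `result` are illustrative names, and `ignore_index=True` is an addition that renumbers the rows 0 to 8 rather than reproducing the repeated index shown above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: labels are strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

cols = ['1929', '1929.1', '1930', '1930.1']
# Each original row becomes two consecutive (value, value) pairs:
# first the 1929 pair, then the 1930 pair.
vals = df[cols].to_numpy().reshape(-1, 2)

# Even rows of vals are the 1929 pairs, odd rows are the 1930 pairs.
lower = pd.DataFrame(
    np.vstack([vals[::2], vals[1::2]]), columns=['1928', '1928.1']
)

# pd.concat takes the place of the removed DataFrame.append.
result = pd.concat([df[['1928', '1928.1']], lower], ignore_index=True)
print(result)
```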
Answer 2: (score: 0)
One approach is to use itertools.chain:
from itertools import chain
cols = df.columns
res = pd.DataFrame({cols[0]: list(chain.from_iterable(df.iloc[:, ::2].T.values)),
                    cols[1]: list(chain.from_iterable(df.iloc[:, 1::2].T.values))})\
        .join(pd.DataFrame(columns=cols[2:]))
print(res)
1928 1928.1 1929 1929.1 1930 1930.1
0 0 0 NaN NaN NaN NaN
1 1 3 NaN NaN NaN NaN
2 4 1 NaN NaN NaN NaN
3 0 0 NaN NaN NaN NaN
4 3 2 NaN NaN NaN NaN
5 3 0 NaN NaN NaN NaN
6 0 0 NaN NaN NaN NaN
7 2 2 NaN NaN NaN NaN
8 1 2 NaN NaN NaN NaN
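The flattening done by `chain.from_iterable` can also be expressed with NumPy alone. The sketch below swaps in `ndarray.ravel` on the transposed values and uses `reindex` to add the remaining, empty columns. It assumes the sample data from the question; `firsts` and `seconds` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: labels are strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

cols = df.columns

# Transposing puts one original column per row; ravel() then reads the
# rows one after another, which stacks the columns end to end.
firsts = df.iloc[:, ::2].to_numpy().T.ravel()    # 1928, 1929, 1930
seconds = df.iloc[:, 1::2].to_numpy().T.ravel()  # 1928.1, 1929.1, 1930.1

# reindex adds the remaining labels as all-NaN columns.
res = pd.DataFrame({cols[0]: firsts, cols[1]: seconds}).reindex(columns=cols)
print(res)
```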
Answer 3: (score: 0)
Group by the first four characters of the column names:
#def key(s):
# return s[:4]
#gb = df.groupby(key, axis=1)
gb = df.groupby(by=df.columns.str[:4], axis=1)
n_cols = len(df.columns) // len(gb)
col_names = df.iloc[:,:n_cols].columns
For each group's DataFrame, rename the columns and concatenate. This produces a new DataFrame with only two columns:
dz = pd.concat(d.rename(columns=dict(item for item in zip(d.columns,col_names))) for g,d in gb)
dz.index = range(len(dz))
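# The same result, written as an explicit loop
frames = []
for g,d in gb:
    d.columns = col_names
    frames.append(d)
dy = pd.concat(frames)
dy.index = range(len(dy))
Works with more than six columns.
Relies on all groups having the same number of columns.
Relies on the columns being sorted by label.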
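As far as I know, the `axis=1` argument of `DataFrame.groupby` is deprecated in recent pandas releases, so here is a sketch of the same grouping done with boolean column masks instead. It assumes string labels and the sample data from the question; `prefixes`, `part` and `dz` are illustrative names. Because `Index.unique()` keeps the order of first appearance, this variant does not need the columns to be sorted, but it still assumes every group has the same number of columns.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: labels are strings).
df = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [1, 3, 3, 2, 2, 2],
     [4, 1, 3, 0, 1, 2]],
    columns=['1928', '1928.1', '1929', '1929.1', '1930', '1930.1'],
)

# First four characters of every label: '1928', '1928', '1929', ...
prefixes = df.columns.str[:4]

# Labels of the first group; every other group is renamed to these.
col_names = list(df.columns[prefixes == prefixes[0]])

frames = []
for prefix in prefixes.unique():
    # Select the columns whose label starts with this prefix.
    part = df.loc[:, prefixes == prefix].copy()
    part.columns = col_names
    frames.append(part)

dz = pd.concat(frames, ignore_index=True)
print(dz)
```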