My DataFrame looks like this:
      col1     col2     col3
0   [1, a]  [1, a1]  [1, a2]
1   [2, b]  [2, b1]  [2, b2]
2   [3, c]  [3, c1]  [3, c2]
I need it to look like this:
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3
So far I have tried using apply(pd.Series) and reassigning the values by iterating in a for loop, but without success.
Answer 0 (score: 5)
Here is one way, using applymap and map:
df.applymap(lambda x: x[-1]).assign(col4 = df['col1'].map(lambda x: x[0]))
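For reference, a self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the `elementwise` name is only an illustrative helper, not part of the answer. DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Sample data rebuilt from the question: every cell is a [number, label] list.
df = pd.DataFrame({
    "col1": [[1, "a"], [2, "b"], [3, "c"]],
    "col2": [[1, "a1"], [2, "b1"], [3, "c1"]],
    "col3": [[1, "a2"], [2, "b2"], [3, "c2"]],
})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Keep the last item of every cell, then add the shared first item as col4.
result = elementwise(lambda cell: cell[-1]).assign(
    col4=df["col1"].map(lambda cell: cell[0])
)
print(result)
```

Taking col4 from col1 alone relies on the first item being identical across the columns of a row, as in the question's data.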
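Answer 1 (score: 2)
Based on the comments (the first value is the same in every column of a row):
print(
    # Per row: second item of every cell, plus the first item of the first cell
    df.apply(lambda x: [v[1] for v in x] + [x.iloc[0][0]], axis=1)
    .apply(pd.Series)
    .rename(columns=lambda x: "col{}".format(x + 1))
)
Prints:
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3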
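As a possible variation (not part of the answer above), DataFrame.apply accepts result_type="expand", which spreads a list-like return value across columns and so avoids the second .apply(pd.Series) pass. A minimal sketch, assuming the sample frame from the question:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col1": [[1, "a"], [2, "b"], [3, "c"]],
    "col2": [[1, "a1"], [2, "b1"], [3, "c1"]],
    "col3": [[1, "a2"], [2, "b2"], [3, "c2"]],
})

# Each row returns a 4-item list; result_type="expand" turns it into 4 columns.
expanded = df.apply(
    lambda row: [cell[1] for cell in row] + [row.iloc[0][0]],
    axis=1,
    result_type="expand",
)

# The expanded columns are numbered 0..3, so rename them to col1..col4.
expanded.columns = ["col{}".format(i + 1) for i in range(expanded.shape[1])]
print(expanded)
```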
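Or:
df = pd.concat(
    [
        # Second item of every cell, keeping the original column names
        df.transform(lambda x: [v[1] for v in x], axis=1),
        # First item of the first cell in each row becomes col4
        df.apply(lambda x: x.iloc[0][0], axis=1).rename("col4"),
    ],
    axis=1,
)
print(df)
Prints:
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3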
Answer 2 (score: 1)
You can use pandas' string methods to access the values:
df.assign(
    col1=df.col1.str[-1],
    col2=df.col2.str[-1],
    col3=df.col3.str[-1],
    col4=df.col1.str[0],
)
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3
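The reason this works is that the .str accessor indexes into each element, and that element may be a list rather than a string. A small sketch on a standalone Series (the names `s`, `first` and `last` are only for illustration):

```python
import pandas as pd

# A Series whose elements are lists, like one column of the question's frame.
s = pd.Series([[1, "a"], [2, "b"], [3, "c"]])

first = s.str[0]   # first item of each list
last = s.str[-1]   # last item of each list

print(first.tolist())
print(last.tolist())
```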
You can make it more generic with a dictionary comprehension:
result = {col: df[col].str[-1] for col in df}
col4 = df.col1.str[0]
df.assign(**result, col4=col4)
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3
You could just as well dump it into plain Python and build a new DataFrame:
outcome = {key: [ent[-1] for ent in value]
           for key, value in df.items()}
col4 = {'col4': [value[0] for value in df.col1]}
outcome = outcome | col4  # Python 3.9+; on earlier versions use {**outcome, **col4}
pd.DataFrame(outcome)
  col1 col2 col3  col4
0    a   a1   a2     1
1    b   b1   b2     2
2    c   c1   c2     3
Answer 3 (score: 1)
A clunky solution using NumPy:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: [1, 'a'], 1: [2, 'b'], 2: [3, 'c']},
    'col2': {0: [1, 'a1'], 1: [2, 'b1'], 2: [3, 'c1']},
    'col3': {0: [1, 'a2'], 1: [2, 'b2'], 2: [3, 'c2']}
})
# 3-D array of shape (rows, columns, 2)
a = np.array(df.values.tolist())
new_df = pd.DataFrame(
    # a[..., 1]: second item of every cell; a[:, 0, 0, None]: first item of col1
    np.concatenate((a[..., 1], a[:, 0, 0, None]), axis=1),
    columns=[*df.columns, 'col4']
)
print(new_df)
new_df:
  col1 col2 col3 col4
0    a   a1   a2    1
1    b   b1   b2    2
2    c   c1   c2    3
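One thing to watch with the NumPy route: np.array on mixed int/str lists coerces everything to a single string dtype, so col4 ends up holding the strings '1', '2', '3' rather than integers. A sketch of one way around that, assuming it is acceptable to pass dtype=object and cast col4 afterwards (the names `a_str` and `a_obj` are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col1": [[1, "a"], [2, "b"], [3, "c"]],
    "col2": [[1, "a1"], [2, "b1"], [3, "c1"]],
    "col3": [[1, "a2"], [2, "b2"], [3, "c2"]],
})

# Default conversion: ints and strings are coerced to one Unicode string dtype.
a_str = np.array(df.values.tolist())
print(a_str.dtype.kind)  # 'U' means a Unicode string array

# dtype=object keeps each item as its original Python object.
a_obj = np.array(df.values.tolist(), dtype=object)

new_df = pd.DataFrame(
    np.concatenate((a_obj[..., 1], a_obj[:, 0, 0, None]), axis=1),
    columns=[*df.columns, "col4"],
)
# Object columns come back from the array; cast col4 to a real integer column.
new_df["col4"] = new_df["col4"].astype(int)
print(new_df.dtypes)
```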
Some timing information via perfplot:
import numpy as np
import pandas as pd
import perfplot


def gen_data(n):
    # Build a single row of [number, label] cells and repeat it n times
    df = pd.DataFrame(
        {'col1': [[1, 'a']],
         'col2': [[1, 'a1']],
         'col3': [[1, 'a2']]},
    )
    df = df.loc[np.repeat(df.index.values, n)]
    return df


def applymap(df):
    return df.applymap(lambda x: x[-1]).assign(
        col4=df['col1'].map(lambda x: x[0]))


def apply_series(df):
    return df.apply(lambda x: [v[1] for v in x] + [x.iloc[0][0]], axis=1) \
        .apply(pd.Series) \
        .rename(columns=lambda x: "col{}".format(x + 1))


def pd_concat(df):
    return pd.concat(
        [
            df.transform(lambda x: [v[1] for v in x], axis=1),
            df.apply(lambda x: x.iloc[0][0], axis=1).rename("col4"),
        ],
        axis=1,
    )


def str_accessors(df):
    return df.assign(col1=df.col1.str[-1],
                     col2=df.col2.str[-1],
                     col3=df.col3.str[-1],
                     col4=df.col1.str[0])


def str_accessors_generic(df):
    result = {col: df[col].str[-1] for col in df}
    col4 = df.col1.str[0]
    return df.assign(**result, col4=col4)


def dump_into_python(df):
    outcome = {key: [ent[-1] for ent in value]
               for key, value in df.items()}
    col4 = {'col4': [value[0] for value in df.col1]}
    outcome = outcome | col4  # Python 3.9+
    return pd.DataFrame(outcome)


def numpy_sol(df):
    a = np.array(df.values.tolist())
    return pd.DataFrame(
        np.concatenate((a[..., 1], a[:, 0, 0, None]), axis=1),
        columns=[*df.columns, 'col4']
    )


if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            applymap,
            apply_series,
            pd_concat,
            str_accessors,
            str_accessors_generic,
            dump_into_python,
            numpy_sol
        ],
        labels=[
            'applymap_map (rhug123)',
            'apply_series (Andrej Kesely)',
            'pd_concat (Andrej Kesely)',
            'str_accessors (sammywemmy)',
            'str_accessors_generic (sammywemmy)',
            'dump_into_python (sammywemmy)',
            'numpy_sol (Henry Ecker)',
        ],
        n_range=[2 ** k for k in range(18)],
        equality_check=None
    )
    out.save('perfplot_results.png', transparent=False)
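Since the benchmark passes equality_check=None, perfplot does not compare the kernels' outputs against each other. A separate sanity check can be sketched as follows. It covers only two of the kernels, uses the question's sample frame, and passes check_dtype=False on the assumption that small dtype differences between the approaches are acceptable:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col1": [[1, "a"], [2, "b"], [3, "c"]],
    "col2": [[1, "a1"], [2, "b1"], [3, "c1"]],
    "col3": [[1, "a2"], [2, "b2"], [3, "c2"]],
})


def str_accessors_generic(df):
    # Last item of every cell via .str, first item of col1 as col4
    result = {col: df[col].str[-1] for col in df}
    return df.assign(**result, col4=df["col1"].str[0])


def dump_into_python(df):
    # Same transformation with plain Python list comprehensions
    outcome = {key: [ent[-1] for ent in value] for key, value in df.items()}
    return pd.DataFrame({**outcome, "col4": [value[0] for value in df["col1"]]})


expected = str_accessors_generic(df)
# Raises AssertionError if the two results differ in values, index, or columns.
assert_frame_equal(dump_into_python(df), expected, check_dtype=False)
print("results match")
```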