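I am concatenating a Series onto a DataFrame, but the column name (the Series name) does not show up in the new DataFrame.
Instead, the column is named "0" in the final DataFrame, even though the name is printed correctly when the Series is created inside the apply_join function.
Why does the Series name not appear in the DataFrame?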
import pandas as pd
from io import StringIO

tibble3_csv = """country,year,cases,population
Afghanistan,1999,745,19987071
Afghanistan,2000,2666,20595360"""
with StringIO(tibble3_csv) as fp:
    tibble3 = pd.read_csv(fp)

def str_join_elements(x, sep=""):
    assert type(sep) is str
    return sep.join((str(xi) for xi in x))

def unite(df, cols, new_var, combine=str_join_elements):
    def apply_join(x, combine):
        joinstr = combine(x)
        ser = pd.Series(joinstr, name=new_var)
        print(ser.name)
        return ser
    fixed_vars = df.columns.difference(cols)
    tibble = df[fixed_vars].copy()
    tibble_extra = df[cols].apply(apply_join, combine=combine, axis=1)
    return pd.concat([tibble, tibble_extra], axis=1)

tab = unite(tibble3, ['cases', 'population'], 'rate', combine=lambda x: str_join_elements(x, "/"))
print(tab)
Result:
rate
rate
       country  year              0
0  Afghanistan  1999   745/19987071
1  Afghanistan  2000  2666/20595360
Answer 0 (score: 0)
If you are trying to join an unknown number of columns, you can use apply with str.join:
def foo(df, columns, col_name, sep=''):
    s = df[columns].apply(lambda x: sep.join(map(str, x)), axis=1)
    s.name = col_name
    return pd.concat([df[df.columns.difference(columns)], s], axis=1)

df
       country  year  cases  population
0  Afghanistan  1999    745    19987071
1  Afghanistan  2000   2666    20595360

df2 = foo(df, ['cases', 'population'], 'rate', '/')
df2
       country  year           rate
0  Afghanistan  1999   745/19987071
1  Afghanistan  2000  2666/20595360
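
As far as I can tell, this works because the lambda returns a plain string, so apply(axis=1) hands back a single Series whose name you can set. In the question's version the applied function returns a one-element Series per row. pandas then expands those into a DataFrame and uses each Series' index label (0 here) as the column label, and the Series name is not used for that. A minimal sketch of the difference, where join_scalar and join_series are illustrative names and not from the original post:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],
    "year": [1999, 2000],
    "cases": [745, 2666],
    "population": [19987071, 20595360],
})
cols = ["cases", "population"]

def join_scalar(row):
    # Returns a plain string, so apply(axis=1) produces one Series.
    return "/".join(map(str, row))

def join_series(row):
    # Returns a one-element Series with index label 0; that label,
    # not the name "rate", becomes the column label after apply.
    return pd.Series("/".join(map(str, row)), name="rate")

as_series = df[cols].apply(join_scalar, axis=1)
as_frame = df[cols].apply(join_series, axis=1)

print(type(as_series).__name__)  # type of the scalar-returning result
print(list(as_frame.columns))    # column labels of the Series-returning result
```

The second print shows the column labels that later end up in the concatenated frame.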
If it is always exactly two columns, you can use str.cat, which is faster:
def foo2(df, c1, c2, c3, sep=''):
    s1, s2 = df[c1].astype(str), df[c2].astype(str)
    s3 = s1.str.cat(s2, sep=sep)
    s3.name = c3
    return pd.concat([df[df.columns.difference([c1, c2])], s3], axis=1)

df2 = foo2(df, 'cases', 'population', 'rate', '/')
df2
       country  year           rate
0  Afghanistan  1999   745/19987071
1  Afghanistan  2000  2666/20595360
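
If you want to convince yourself that the two approaches give the same strings before swapping one for the other, a small self-contained check along these lines should do. The variable names via_apply and via_cat are only for illustration, and no timing is measured here:

```python
import pandas as pd

df = pd.DataFrame({
    "cases": [745, 2666],
    "population": [19987071, 20595360],
})

# Row-wise join via apply; works for any number of columns.
via_apply = df[["cases", "population"]].apply(
    lambda row: "/".join(map(str, row)), axis=1
)

# Vectorised join of exactly two columns via str.cat.
via_cat = df["cases"].astype(str).str.cat(df["population"].astype(str), sep="/")

# Compare the resulting strings element by element.
print(via_apply.tolist() == via_cat.tolist())
```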
Answer 1 (score: 0)
You can also try renaming the column:
>>> tab = tab.rename(columns={0: 'cases/population'})
>>> tab
       country  year  cases/population
0  Afghanistan  1999      745/19987071
1  Afghanistan  2000     2666/20595360
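
Renaming after the fact works. Another option, which is my own suggestion and not part of the answers above, is to keep the question's structure but give the returned Series an index label instead of a name, since that label is what apply(axis=1) turns into the column label. A minimal sketch, where join_labelled is a hypothetical helper:

```python
import pandas as pd

df = pd.DataFrame({
    "cases": [745, 2666],
    "population": [19987071, 20595360],
})

def join_labelled(row, new_var="rate", sep="/"):
    # The index label of the returned Series becomes the column label
    # once apply(axis=1) expands the per-row results into a DataFrame.
    return pd.Series([sep.join(map(str, row))], index=[new_var])

extra = df.apply(join_labelled, axis=1)
print(list(extra.columns))
```

The resulting frame can then be passed to pd.concat in the same way as tibble_extra in the question.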