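Concatenating multiple pandas DataFrame columns, including some with boolean values

Time: 2017-07-26 16:21:14

Tags: python pandas

I'm new to Python. In my project, I need to concatenate multiple columns of a pandas DataFrame to create a derived column. My DataFrame contains a few columns that hold only TRUE and FALSE values. I'm using the following code for the concatenation: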

df_input["combined"] = [' '.join(row) for row in df_input[df_input.columns[0:]].values]

When I run the code, I get the following error:

TypeError: sequence item 3: expected str instance, bool found

Could an expert please help me solve this?

Thanks in advance.

2 answers:

Answer 0: (score: 2)

Let's try astype:

df_input["combined"] = [' '.join(row.astype(str)) for row in df_input[df_input.columns[0:]].values]
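The error comes from str.join, which accepts only strings and stops at the first bool it meets in a row. Here is a minimal sketch of that failure and of the fix, assuming a small hypothetical df_input (the column names and values are made up for illustration). It casts the whole frame to str once instead of casting each row:

```python
import pandas as pd

# Hypothetical sample frame; column names and values are illustrative only.
df_input = pd.DataFrame({
    "name": ["a", "b"],
    "count": [1, 2],
    "flag": [True, False],
})

# str.join rejects non-string items, which is what raises the TypeError.
try:
    " ".join(["a", True])
except TypeError as exc:
    print(exc)

# Cast every column to str first, then join the values of each row.
combined = [" ".join(row) for row in df_input.astype(str).values]
df_input["combined"] = combined
print(df_input["combined"].tolist())
```

Casting the frame once keeps the list comprehension identical to the one in the question, apart from the astype(str) call.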

Answer 1: (score: 1)

You can convert the bool columns with astype(str) and use a vectorized expression to concatenate the columns, as shown below:

from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

st = """
col1|col2|col3
1|hello|True
4|world|False
7|!|True
"""
df = pd.read_csv(StringIO(st), sep="|")

print("my sample dataframe")
print(df.head())

print("current columns data types")
print(df.dtypes)

print("combining all columns with mixed datatypes") 
df["combined"] = df["col1"].astype(str)+" "+df["col2"]+ " " +df["col3"].astype(str)

print("here's how the data looks now")
print(df.head())

print("here are the new columns datatypes")
print(df.dtypes)

Output of the script:

my sample dataframe
   col1   col2   col3
0     1  hello   True
1     4  world  False
2     7      !   True
current columns data types
col1     int64
col2    object
col3      bool
dtype: object
combining all columns with mixed datatypes
here's how the data looks now
   col1   col2   col3       combined
0     1  hello   True   1 hello True
1     4  world  False  4 world False
2     7      !   True       7 ! True
here are the new columns datatypes
col1         int64
col2        object
col3          bool
combined    object
dtype: object

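As you can see, the new combined column contains the concatenated data.

Dynamic concatenation

To perform the concatenation dynamically, here is how to modify the previous example: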

from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

st = """
col1|col2|col3
1|hello|True
4|world|False
7|!|True
"""
df = pd.read_csv(StringIO(st), sep="|")

print("my sample dataframe")
print(df.head())

print("current columns data types")
print(df.dtypes)

print("combining all columns with mixed datatypes") 
#df["combined"] = df["col1"].astype(str)+" "+df["col2"]+ " " +df["col3"].astype(str)

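# snapshot the column names before adding the derived column
all_columns = list(df.columns)
df["combined"] = ""

for column_name in all_columns:
    print("current column {column_name}".format(column_name=column_name))
    # add a separator only after the first column, to avoid a leading space
    separator = "" if column_name == all_columns[0] else " "
    df["combined"] = df["combined"] + separator + df[column_name].astype(str)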

print("here's how the data looks now")
print(df.head())

print("here are the new columns datatypes")
print(df.dtypes)
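The same dynamic concatenation can also be written without an explicit loop. This is a sketch of one possible alternative, not part of the original answer. It reuses the sample data above, and source_columns is a name introduced here for illustration. Note that apply with axis=1 still works row by row, so it is more compact but not necessarily faster:

```python
from io import StringIO
import pandas as pd

st = """col1|col2|col3
1|hello|True
4|world|False
7|!|True
"""
df = pd.read_csv(StringIO(st), sep="|")

# Capture the source columns before adding the derived one.
source_columns = list(df.columns)

# Cast all source columns to str, then join each row with a single space.
df["combined"] = df[source_columns].astype(str).apply(" ".join, axis=1)
print(df["combined"].tolist())
```

Because the join inserts separators only between values, this version does not produce a leading space.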