Question

I'm trying to write my own function in Python 3.5, but I'm not having much luck.

My DataFrame has 17 columns and 1,200 rows (tiny).

One of the columns is called "Placement". Every row in this column contains text that follows this naming convention:

Campaign_Publisher_Site_AdType_AdSize_Device_Audience_Tactic_

The following code works perfectly and does exactly what I need. I just don't want to repeat it for every dataset I have:

    df_detailed = df['Placement'].str[0:-1].str.split('_', expand=True).astype(str)
    df_detailed = df.join(df_detailed)
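    # then I rename the columns labelled 0, 1, 2, etc.
    new_columns = list(df.columns) + ['Campaign', 'Publisher', 'Site', 'AdType',
                                      'AdSize', 'Device', 'Audience', 'Tactic']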
    df_detailed.columns = new_columns
    df_detailed.head()

What I'm trying to do is build a function that takes any column using "_" as a delimiter and splits it into new columns.

I tried the following (unfortunately, defining my own functions is something I'm terrible at):

    def text_to_column(df):
        df_detailed = df['Placement'].str[0:-1].str.split('_', expand=True).astype(str)
        headings = df_detailed.columns
        headings.replace(" ", "_")
        df_detailed = df.join(df_detailed)
        df_detailed.columns = headings
        return (df)

I get the following error: AttributeError: 'RangeIndex' object has no attribute 'replace'

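The end goal is a function that I can pass a column name to; it splits the values in that column into new columns and then joins them back onto the original DataFrame.

If I'm being ridiculous, please tell me. Any help would be greatly appreciated.

Thanks, Adrian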

Answer 1

You need the rename function to replace the column names. This code:

    headings = df_detailed.columns
    headings.replace(" ", "_")

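should be changed to the following (each label is converted with str first, because str.split(expand=True) labels the new columns with integers, and an int has no replace method):

    df_detailed = df_detailed.rename(columns=lambda x: str(x).replace(" ", "_"))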
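A minimal sketch of why the str conversion matters, assuming a toy frame whose labels mimic the split result (integers) plus a hypothetical "Ad Size" label containing a space. A lambda that calls x.replace directly would raise AttributeError on the integer labels; converting first works, with the side effect that the integer labels become strings.

```python
import pandas as pd

# Hypothetical labels: str.split(expand=True) numbers its columns 0, 1, ...,
# and "Ad Size" stands in for a name that contains a space.
df_detailed = pd.DataFrame([[1, 2, 3]], columns=[0, 1, "Ad Size"])

# lambda x: x.replace(...) would raise AttributeError on the int labels,
# so each label is converted to str before replacing spaces.
df_detailed = df_detailed.rename(columns=lambda x: str(x).replace(" ", "_"))
print(list(df_detailed.columns))  # ['0', '1', 'Ad_Size']
```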

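Alternatively, convert the columns to a Series with to_series, because replace does not work on an Index (the column names):

    headings.replace(" ", "_")

should be changed to the following (regex=True makes replace substitute spaces inside each name; without it, only names that are exactly a single space would be replaced):

    headings = headings.to_series().replace(" ", "_", regex=True)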
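To see the difference between whole-value and substring replacement, here is a small sketch using a hypothetical Index of labels (not taken from the question's data). By default Series.replace only matches elements equal to the pattern; with regex=True the pattern is substituted inside string elements, and non-string labels are left alone.

```python
import pandas as pd

# Hypothetical mix of labels: a name with a space, a lone space, an int.
headings = pd.Index(["Ad Size", " ", 0])

# Default: whole-value match, so only the element that is exactly " " changes.
whole = headings.to_series().replace(" ", "_")
# regex=True: spaces inside string values are replaced; 0 is untouched.
partial = headings.to_series().replace(" ", "_", regex=True)

print(list(whole))    # ['Ad Size', '_', 0]
print(list(partial))  # ['Ad_Size', '_', 0]
```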

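Also:

    df_detailed = df['Placement'].str[0:-1].str.split('_', expand=True).astype(str)

could be changed to the following, which removes only trailing underscores instead of always dropping the last character:

    df_detailed = df['Placement'].str.rstrip('_').str.split('_', expand=True).astype(str)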
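A quick comparison, assuming a hypothetical Series in which one value lacks the trailing underscore: slicing cuts off a real character, while rstrip('_') leaves such a value intact.

```python
import pandas as pd

# Hypothetical data: the second value has no trailing "_".
placements = pd.Series(["a_b_c_", "a_b_c"])

# Slicing always drops the last character, even when it is not "_".
sliced = placements.str[0:-1].str.split("_", expand=True)
# rstrip("_") removes only trailing underscores.
stripped = placements.str.rstrip("_").str.split("_", expand=True)

print(sliced.values.tolist())    # [['a', 'b', 'c'], ['a', 'b', '']]
print(stripped.values.tolist())  # [['a', 'b', 'c'], ['a', 'b', 'c']]
```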

Edit:

Sample:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2], 'Placement': ['Campaign_Publisher_Site_AdType_AdSize_Device_Audience_Tactic_', 'a_b_c_d_f_g_h_i_']})
    print(df)
                                               Placement  a
    0  Campaign_Publisher_Site_AdType_AdSize_Device_A...  1
    1                                   a_b_c_d_f_g_h_i_  2

    # input: a DataFrame and the name of the column to split
    def text_to_column(df, col):
        df_detailed = df[col].str.rstrip('_').str.split('_', expand=True).astype(str)
        # replace spaces in the column names with underscores, if necessary
        df_detailed.columns = df_detailed.columns.to_series().replace(" ", "_", regex=True)
        # drop the original column and join the new columns
        df_detailed = df.drop(col, axis=1).join(df_detailed)
        return df_detailed

    df = text_to_column(df, 'Placement')
    print(df)
       a         0          1     2       3       4       5         6       7
    0  1  Campaign  Publisher  Site  AdType  AdSize  Device  Audience  Tactic
    1  2         a          b     c       d       f       g         h       i
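If the new columns should get meaningful names straight away, the function could also accept them as an argument. The sketch below is one possible extension, not part of the original answer: the name text_to_columns and the sep and names parameters are hypothetical, and names assumes every row splits into exactly that many pieces. When no names are given, it prefixes the integer labels with the source column name instead.

```python
import pandas as pd


def text_to_columns(df, col, sep="_", names=None):
    """Split df[col] on sep, drop col, and join the pieces back onto df.

    Hypothetical extension: `sep` and `names` are illustrative parameters.
    """
    # Strip trailing separators, then split into one column per piece.
    parts = df[col].str.rstrip(sep).str.split(sep, expand=True)
    if names is not None:
        # Assumes every row splits into exactly len(names) pieces;
        # otherwise pandas raises a length-mismatch ValueError.
        parts.columns = names
    else:
        # Turn the integer labels 0, 1, ... into Placement_0, Placement_1, ...
        parts = parts.add_prefix(col + "_")
    return df.drop(columns=col).join(parts)


fields = ["Campaign", "Publisher", "Site", "AdType",
          "AdSize", "Device", "Audience", "Tactic"]
df = pd.DataFrame({
    "a": [1, 2],
    "Placement": ["Campaign_Publisher_Site_AdType_AdSize_Device_Audience_Tactic_",
                  "a_b_c_d_f_g_h_i_"],
})

named = text_to_columns(df, "Placement", names=fields)
prefixed = text_to_columns(df, "Placement")
print(named.columns.tolist())
print(prefixed.columns.tolist())
```

Passing the eight fields of the naming convention gives columns a, Campaign, ..., Tactic; with the default, the new columns are Placement_0 through Placement_7.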
