Question

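Edit 12/07/19: The problem was not actually the pd.rename function. I was not returning the pandas DataFrame from the function, so the column change did not show up when printing, i.e.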

def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)
    return as_pandas  # <- this was missing

Please see the user comment below; credit to them for helping me find the error.

Alternatively, you can read on.

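The data can be downloaded from this link, but I have also added a sample dataset. The file is not in a typical CSV format; I think it may have been an assessment file, and it relates to the Hidden Decision Tree article. I have included part of the code because it deals with the text-file format issues mentioned above and lets the user rename the column.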

The problem occurred when I tried to create a rename function:

def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)

However, it seems to work when I hard-code the column names inside the rename function.

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas
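
For comparison, here is a minimal sketch of the two usual rename styles: mutating the frame with inplace=True, or returning a renamed copy. The function names rename_in_place and rename_copy and the one-row frame are hypothetical, made up only for illustration; they are not part of the original code.

```python
import pandas as pd


def rename_in_place(frame, old_name, new_name):
    # Mutates the DataFrame that was passed in; rename(..., inplace=True)
    # itself returns None, so nothing is returned here.
    frame.rename(columns={old_name: new_name}, inplace=True)


def rename_copy(frame, old_name, new_name):
    # Leaves the input untouched and returns a renamed copy.
    return frame.rename(columns={old_name: new_name})


# Hypothetical one-row frame using the column names from the question.
views = pd.DataFrame({"Title": ["Audience expansion"],
                      "Unique Pageviews": [223]})

copied = rename_copy(views, "Unique Pageviews", "Page_Views")
columns_before = list(views.columns)   # the original frame is unchanged so far
columns_copy = list(copied.columns)

rename_in_place(views, "Unique Pageviews", "Page_Views")
columns_after = list(views.columns)

print(columns_before)
print(columns_copy)
print(columns_after)
```

With the copy style the caller has to keep the returned value, which is easy to forget; returning the frame from the function, as in the edit at the top, makes the call site explicit in either style.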

Sample dataset

Title   URL Date    Unique Pageviews
oupUrl=tutorials    18-Apr-15   5608
"An Exclusive Interview with Data Expert, John Bottega" http://www.datasciencecentral.com/forum/topics/an-exclusive-interview-with-data-expert-john-bottega?groupUrl=announcements  10-Jun-14   360
Announcing Composable Analytics http://www.datasciencecentral.com/forum/topics/announcing-composable-analytics  15-Jun-14   367
Announcing the release of Spark 1.5 http://www.datasciencecentral.com/forum/topics/announcing-the-release-of-spark-1-5  12-Sep-15   156
Are Extreme Weather Events More Frequent? The Data Science Answer   http://www.datasciencecentral.com/forum/topics/are-extreme-weather-events-more-frequent-the-data-science-answer 5-Oct-15    204
Are you interested in joining the University of California for an empiricalstudy on 'Big Data'? http://www.datasciencecentral.com/forum/topics/are-you-interested-in-joining-the-university-of-california-for-an    7-Feb-13    204
Are you smart enough to work at Google? http://www.datasciencecentral.com/forum/topics/are-you-smart-enough-to-work-at-google   11-Oct-15   3625
"As a software engineer, what's the best skill set to have for the next 5-10years?" http://www.datasciencecentral.com/forum/topics/as-a-software-engineer-what-s-the-best-skill-set-to-have-for-the-    12-Feb-16   2815
A Statistician's View on Big Data and Data Science (Updated)    http://www.datasciencecentral.com/forum/topics/a-statistician-s-view-on-big-data-and-data-science-updated-1 21-May-14   163
A synthetic variance designed for Hadoop and big data   http://www.datasciencecentral.com/forum/topics/a-synthetic-variance-designed-for-hadoop-and-big-data?groupUrl=research  26-May-14   575
A Tough Calculus Question   http://www.datasciencecentral.com/forum/topics/a-tough-calculus-question    10-Feb-16   937
Attribution Modeling: Key Analytical Strategy to Boost Marketing ROI    http://www.datasciencecentral.com/forum/topics/attribution-modeling-key-concept 24-Oct-15   937
Audience expansion  http://www.datasciencecentral.com/forum/topics/audience-expansion   6-May-13    223
Automatic use of insights   http://www.datasciencecentral.com/forum/topics/automatic-use-of-insights    27-Aug-15   122
Average length of dissertations by higher education discipline. http://www.datasciencecentral.com/forum/topics/average-length-of-dissertations-by-higher-education-discipline   4-Jun-15    1303
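
As a rough sketch, and assuming the fields really are tab-separated (which the csv.reader call with delimiter='\t' in the code suggests), pandas can read such text directly. The SAMPLE string holds two rows copied from the listing above, and read_tab_separated is a hypothetical helper name.

```python
import io

import pandas as pd

# Header plus two rows from the sample above, with tabs assumed as separators.
SAMPLE = (
    "Title\tURL\tDate\tUnique Pageviews\n"
    "Announcing Composable Analytics\t"
    "http://www.datasciencecentral.com/forum/topics/announcing-composable-analytics\t"
    "15-Jun-14\t367\n"
    "Audience expansion\t"
    "http://www.datasciencecentral.com/forum/topics/audience-expansion\t"
    "6-May-13\t223\n"
)


def read_tab_separated(source):
    """Read tab-delimited text (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep="\t")


frame = read_tab_separated(io.StringIO(SAMPLE))
print(list(frame.columns))
print(frame["Unique Pageviews"].tolist())
```

Reading with sep="\t" means commas inside titles do not need to be stripped first, because the comma is no longer the delimiter.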

Here is the complete code that produces the KeyError:

import csv

import pandas as pd


def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)


def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, 
                               inplace=True)


def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader


# Get each column of data including the heading and separate each element,
# i.e. Title, URL, Date, Page Views,
# and save to string_of_rows with comma separator for storage as a csv
# file.
def get_columns_of_data(*args):
    # Function that accept variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','

    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows


def open_file(file_name):
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in,   delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue

        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')



# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list is a list of the csv file contents; the suffix, i.e. the 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    # change_column_names(multi_sets)
    change_column_names(multi_sets, 'Old_Name', 'New_Name')
    print(multi_sets)


main()
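
One thing worth knowing here: by default DataFrame.rename silently ignores labels that are not present, so a wrong old_name only surfaces later, when the new name is looked up. The sketch below uses the errors='raise' option of rename to fail at the rename itself. rename_strict and the two-row frame are hypothetical, and whether this matches the exact cause of the KeyError above is an assumption.

```python
import pandas as pd


def rename_strict(frame, old_name, new_name):
    """Return a renamed copy; raise KeyError if old_name is not a column."""
    return frame.rename(columns={old_name: new_name}, errors="raise")


# Hypothetical frame with the column name used in the question.
frame = pd.DataFrame({"Unique Pageviews": [360, 367]})

renamed = rename_strict(frame, "Unique Pageviews", "Page_Views")
print(list(renamed.columns))

try:
    rename_strict(frame, "Old_Name", "New_Name")
    missing_detected = False
except KeyError:
    # 'Old_Name' is not a column, so the strict rename refuses to proceed.
    missing_detected = True
print(missing_detected)
```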

Answer 1

I cleaned up your code so that it runs. You are changing the column names but not returning the result. Try the following:

import pandas as pd
import numpy as np
import math

def set_new_columns(as_pandas):
    titles_list = ['Year > 2014', 'Forum', 'Blog', 'Python', 'R',
                   'Machine_Learning', 'Data_Science', 'Data', 
                   'Analytics']
    for word in titles_list:
        as_pandas.insert(len(as_pandas.columns), word, 0)

def title_length(as_pandas):
    # Insert new column header then count the number of letters in 'Title'
    as_pandas.insert(len(as_pandas.columns), 'Title_Length', 0)
    as_pandas['Title_Length'] = as_pandas['Title'].map(str).apply(len)

# Although the values are logged, a difference of logs, logX1 - logX2,
# can be read as (approximately) the percentage change in Page Views.
# The map function applies a function to every row in column
# 'Page_Views'.
def log_page_view(as_pandas):
    # Insert new column header
    as_pandas.insert(len(as_pandas.columns), 'Log_Page_Views', 0)
    as_pandas['Log_Page_Views'] = as_pandas['Page_Views'].map(lambda x: math.log(1 + float(x)))

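def change_to_numeric(as_pandas):
    # Treat blank cells as missing, then convert the column to numeric.
    # replace(..., inplace=True) keeps the change on the caller's frame
    # instead of rebinding the local name to a new copy.
    as_pandas.replace(r'^\s*$', np.nan, regex=True, inplace=True)
    as_pandas['Page_Views'] = pd.to_numeric(as_pandas['Page_Views'],
                                            errors='coerce')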

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas

def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader

# Get each column of data including the heading and separate each element 
# i.e. Title, URL, Date, Page Views
# and save to string_of_rows with comma separator for storage as a csv 
# file.
def get_columns_of_data(*args):
    # Function that accept variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','

    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows

def open_file(file_name):
    import csv
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in,   delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue

        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')

# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list is a list of the csv file contents; the suffix, i.e. the 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    multi_sets = change_column_names(multi_sets)
    change_to_numeric(multi_sets)
    log_page_view(multi_sets)
    title_length(multi_sets)
    set_new_columns(multi_sets)
    print(multi_sets)


main()
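
As an alternative to the separate in-place helpers above, the derived columns could also be built in one function that returns a new frame. This is only a sketch under assumptions: add_features and the two-row sample are hypothetical, and np.log1p is swapped in for math.log(1 + x) because it computes the same quantity column-wise.

```python
import numpy as np
import pandas as pd


def add_features(frame):
    """Return a copy with numeric Page_Views plus two derived columns."""
    out = frame.copy()
    # Unparseable entries become NaN instead of raising.
    out["Page_Views"] = pd.to_numeric(out["Page_Views"], errors="coerce")
    # log(1 + x), applied to the whole column at once.
    out["Log_Page_Views"] = np.log1p(out["Page_Views"])
    # Number of characters in each title.
    out["Title_Length"] = out["Title"].astype(str).str.len()
    return out


# Hypothetical sample: one valid count and one unparseable entry.
sample = pd.DataFrame({
    "Title": ["Audience expansion", "A Tough Calculus Question"],
    "Page_Views": ["223", "n/a"],
})

features = add_features(sample)
print(features)
```

Because the function returns its result, the caller has to assign it (features = add_features(sample)), which is the same point the answer makes about change_column_names.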

pd.rename KeyError: "New_Name"
