Question

I need someone to help me understand why I cannot change a column's type when reading a csv file in pandas. I have a dataframe that looks like this:

       montant CODE_NAF  select_categ
85455     0.00    6622Z             0
33643    -0.08     930G             1

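So I am sure that the 'montant' column is of float type. I saved the dataframe and then used it in another script that does preprocessing with a scikit-learn pipeline. To make it work, I have to provide the types when reading the csv again, so the script contains something like this: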

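import argparse
import os

import numpy as np
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
args = parser.parse_args()

feature_columns_names = [
    'montant',
    'CODE_NAF',
]

label_column = 'select_categ'

# Types requested for the feature columns when the csv is read
feature_columns_dtype = {
    'montant': np.float64,
    'CODE_NAF': str,
}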

# Take the set of files and read them all into a single pandas dataframe
input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
if len(input_files) == 0:
    raise ValueError(('There are no files in {}.\n' +
                      'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                      'the data specification in S3 was incorrectly specified or the role specified\n' +
                      'does not have permission to access the data.').format(args.train, "train"))

# merge_two_dicts and label_column_dtype are defined elsewhere in the script (not shown)
raw_data = [
    pd.read_csv(
        file,
        header=None,
        engine='python',
        names=feature_columns_names + [label_column],
        dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype),
    )
    for file in input_files
]
concat_data = pd.concat(raw_data)

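I am trying to reuse an existing example. With the example's own data the script works, but when I change only the csv file passed to the script, the column cannot be cast. I get this error: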

ValueError: Unable to convert column montant to type <class 'numpy.float64'>
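For reference, here is a minimal sketch of one way this error can arise. It assumes (this is not confirmed by the question) that the csv was saved with a header row, so that `header=None` turns the column names into the first data row. The toy frame, the in-memory csv text and the helper `read_like_script` are hypothetical and only mirror the options used in the script.

```python
import io

import numpy as np
import pandas as pd

# Toy frame mirroring the two sample rows shown in the question.
df = pd.DataFrame(
    {"montant": [0.00, -0.08], "CODE_NAF": ["6622Z", "930G"], "select_categ": [0, 1]}
)

names = ["montant", "CODE_NAF", "select_categ"]
dtypes = {"montant": np.float64, "CODE_NAF": str}


def read_like_script(csv_text):
    # Hypothetical helper: same read_csv options as the script in the question.
    return pd.read_csv(
        io.StringIO(csv_text),
        header=None,
        engine="python",
        names=names,
        dtype=dtypes,
    )


# Assumption: the file still contains its header row.
with_header = df.to_csv(index=False)
try:
    read_like_script(with_header)
    failed = False
except ValueError as exc:
    # The string "montant" in the first row cannot be converted to float64.
    failed = True
    print(exc)

# Same data saved without header and index: the requested dtypes apply cleanly.
without_header = df.to_csv(header=False, index=False)
clean = read_like_script(without_header)
print(clean.dtypes)
```

If a header row does turn out to be the cause, saving with `to_csv(header=False, index=False)`, or reading with `header=0` together with `names`, would be two possible ways to keep the dtypes in `read_csv`.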

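Of course, I tried reading the csv as-is and then changing the type with pd.to_numeric(), and that works. The problem is that this preprocessing script needs the types to be set when the csv is read, so that when new data arrives the columns are typed at read time; otherwise it will not work. Above all, I am confused about why the column's type becomes string when it is read in the script, and why the type can be set in exactly the same way with the example data.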
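One way to see which values block the float conversion is to read everything as strings with the script's options and then call `pd.to_numeric` with `errors="coerce"`. This is only a diagnostic sketch: the csv text below is hypothetical (it assumes a stray header row) and should be replaced by the real file.

```python
import io

import pandas as pd

# Hypothetical file content; replace io.StringIO(csv_text) with the real path.
csv_text = "montant,CODE_NAF,select_categ\n0.0,6622Z,0\n-0.08,930G,1\n"

# Read every column as a string so that nothing fails at read time.
raw = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    engine="python",
    names=["montant", "CODE_NAF", "select_categ"],
    dtype=str,
)

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(raw["montant"], errors="coerce")

# Rows whose 'montant' was present but not numeric.
bad_rows = raw[converted.isna() & raw["montant"].notna()]
print(bad_rows)
```

Any row listed in `bad_rows` holds a value that `dtype={'montant': np.float64}` cannot convert when the file is read.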

Please help.

How to correctly change column types in a pandas dataframe

0 answers: