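I need to return multiple calculated columns for each row of a pandas dataframe.
Running the apply function in the following code snippet raises this error:
ValueError: Shape of passed values is (4, 2), indices imply (4, 3)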
import pandas as pd

my_df = pd.DataFrame({
    'datetime_stuff': ['2012-01-20', '2012-02-16', '2012-06-19', '2012-12-15'],
    'url': ['http://www.something', 'http://www.somethingelse', 'http://www.foo', 'http://www.bar'],
    'categories': [['foo', 'bar'], ['x', 'y', 'z'], ['xxx'], ['a123', 'a456']],
})
my_df['datetime_stuff'] = pd.to_datetime(my_df['datetime_stuff'])
my_df.sort_values(['datetime_stuff'], inplace=True)
print(my_df.head())

def calculate_stuff(row):
    if row['url'].startswith('http'):
        categories = row['categories'] if type(row['categories']) == list else []
        calculated_column_x = row['url'] + '_other_stuff_'
    else:
        calculated_column_x = None
    another_column = 'deduction_from_fields'
    return calculated_column_x, another_column

print(my_df.shape)
my_df['calculated_column_x'], my_df['another_column'] = zip(*my_df.apply(calculate_stuff, axis=1))
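Each row of the dataframe I'm actually working with is more complex than in the example above, and the function calculate_stuff that I'm applying uses many different columns of each row and then returns several new columns.
Even so, the example above already raises the ValueError about the shape of the dataframe, and I can't figure out how to fix it.
How can I create multiple new columns (per row) that are computed from existing columns?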
Answer 0 (score: 1)
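When you return a list or tuple from the function being applied, pandas tries to shoehorn it back into the dataframe you ran apply over. Instead, return a Series.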
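To see the effect in isolation, here is a minimal sketch on a toy frame. The column names a and b and the helper as_series are made up for this illustration and are not part of the question's data. When the applied function returns a Series, apply with axis=1 builds a DataFrame whose columns come from the Series index.

```python
import pandas as pd

# Toy frame; 'a' and 'b' are hypothetical columns used only for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

def as_series(row):
    # The keys become the index of the returned Series,
    # and that index supplies the names of the new columns.
    return pd.Series({'total': row['a'] + row['b'], 'diff': row['b'] - row['a']})

result = df.apply(as_series, axis=1)
print(type(result).__name__)
print(list(result.columns))
```

Because the result is an ordinary DataFrame that shares the original index, it can be joined back onto the source frame.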
Reconfigured code

import pandas as pd

my_df = pd.DataFrame({
    'datetime_stuff': ['2012-01-20', '2012-02-16', '2012-06-19', '2012-12-15'],
    'url': ['http://www.something', 'http://www.somethingelse', 'http://www.foo', 'http://www.bar'],
    'categories': [['foo', 'bar'], ['x', 'y', 'z'], ['xxx'], ['a123', 'a456']],
})
my_df['datetime_stuff'] = pd.to_datetime(my_df['datetime_stuff'])
my_df.sort_values(['datetime_stuff'], inplace=True)

def calculate_stuff(row):
    if row['url'].startswith('http'):
        categories = row['categories'] if type(row['categories']) == list else []
        calculated_column_x = row['url'] + '_other_stuff_'
    else:
        calculated_column_x = None
    another_column = 'deduction_from_fields'
    # I changed this VVVV
    return pd.Series((calculated_column_x, another_column), ['calculated_column_x', 'another_column'])

my_df.join(my_df.apply(calculate_stuff, axis=1))
     categories  datetime_stuff                       url                    calculated_column_x         another_column
0    [foo, bar]      2012-01-20      http://www.something      http://www.something_other_stuff_  deduction_from_fields
1     [x, y, z]      2012-02-16  http://www.somethingelse  http://www.somethingelse_other_stuff_  deduction_from_fields
2         [xxx]      2012-06-19            http://www.foo            http://www.foo_other_stuff_  deduction_from_fields
3  [a123, a456]      2012-12-15            http://www.bar            http://www.bar_other_stuff_  deduction_from_fields
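Two follow-up notes, offered as a sketch rather than the one right way. First, join returns a new DataFrame, so the result has to be assigned if you want to keep the new columns. Second, if you would rather keep returning a plain tuple, pandas 0.23 and later accept result_type='expand' in DataFrame.apply, which spreads each returned tuple across columns. The example below uses a cut-down frame, and the 'ftp://www.foo' URL is a made-up value added only to exercise the else branch.

```python
import pandas as pd

# Cut-down frame; the ftp URL is hypothetical and only triggers the else branch.
my_df = pd.DataFrame({
    'url': ['http://www.something', 'ftp://www.foo'],
})

def calculate_stuff(row):
    # Same logic as in the question, still returning a plain tuple.
    if row['url'].startswith('http'):
        calculated_column_x = row['url'] + '_other_stuff_'
    else:
        calculated_column_x = None
    another_column = 'deduction_from_fields'
    return calculated_column_x, another_column

# result_type='expand' turns each returned tuple into columns labelled 0 and 1.
new_cols = my_df.apply(calculate_stuff, axis=1, result_type='expand')
new_cols.columns = ['calculated_column_x', 'another_column']

# join does not modify my_df in place, so assign the result back.
my_df = my_df.join(new_cols)
print(my_df)
```

Returning a named Series, as in the answer above, keeps the column names next to the values. The expand variant needs the names assigned afterwards but avoids building a Series for every row.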