time_period total_cost total_revenue
7days 150 250
14days 350 600
30days 900 750
7days 180 400
14days 430 620
Given this data, I want to convert the total_cost and total_revenue columns into per-day averages over the given time period. I thought this would work:
df[['total_cost','total_revenue']][df.time_period=="7days"]=df[['total_cost','total_revenue']][df.time_period=="7days"]/7
But it leaves the DataFrame unchanged.
Answer 0 (score: 3)
I believe you are modifying a copy of the DataFrame rather than the DataFrame itself. I think you should use apply:
from io import StringIO  # Python 3; on Python 2 this was: from StringIO import StringIO
import pandas
datastring = StringIO("""\
time_period total_cost total_revenue
7days 150 250
14days 350 600
30days 900 750
7days 180 400
14days 430 620
""")
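data = pandas.read_csv(datastring, sep=r'\s+')  # columns are whitespace-separated
# row['time_period'][:-4] strips the trailing "days", e.g. "14days" -> "14"
data['total_cost_avg'] = data.apply(
    lambda row: row['total_cost'] / float(row['time_period'][:-4]),
    axis=1
)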
This gives me:
time_period total_cost total_revenue total_cost_avg
0 7days 150 250 21.428571
1 14days 350 600 25.000000
2 30days 900 750 30.000000
3 7days 180 400 25.714286
4 14days 430 620 30.714286
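For comparison, here is a minimal sketch of the question's own masking approach done without the copy problem: the row mask and the column list go into a single .loc call, so the assignment writes to df itself. It assumes every time_period label ends in "days" (as in the sample data), rebuilds the sample DataFrame inline, and loops over all periods rather than only "7days"; the loop is an extension, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, rebuilt inline
df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

cols = ["total_cost", "total_revenue"]
# Make the columns float up front so fractional results can be written back cleanly
df[cols] = df[cols].astype(float)

for period in df["time_period"].unique():
    # Assumes every label ends in "days": "14days"[:-4] -> "14"
    days = int(period[:-4])
    mask = df["time_period"] == period
    # Row mask and column list in ONE .loc call, so this writes to df itself
    # instead of to a temporary copy (unlike df[cols][mask] = ...)
    df.loc[mask, cols] = df.loc[mask, cols] / days

print(df)
```

The key difference from the question's line is that df[cols][mask] = ... first builds a new DataFrame with df[cols] and then assigns into that temporary object, whereas df.loc[mask, cols] = ... is a single indexing operation on df.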
Answer 1 (score: 2)
Excellent answer from Paul. Adding my approach here:
import pandas as pd
test_df = pd.read_csv("file1.csv")
test_df
time_period total_cost total_revenue
0 7days 150 250
1 14days 350 600
2 30days 900 750
3 7days 180 400
4 14days 430 620
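# Pull the number of days out of labels like "14days"; expand=False returns a Series
test_df['days'] = test_df.time_period.str.extract(r'(\d+)days', expand=False).astype(int)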
test_df['total_cost'] = test_df.total_cost / test_df.days
test_df['total_revenue'] = test_df.total_revenue / test_df.days
del test_df['days']
test_df
time_period total_cost total_revenue
0 7days 21.428571 35.714286
1 14days 25.000000 42.857143
2 30days 30.000000 25.000000
3 7days 25.714286 57.142857
4 14days 30.714286 44.285714
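The same per-day division can also be done for both columns in one step, without the temporary days column. This is a minimal sketch that rebuilds the sample data inline instead of reading file1.csv, and assumes every label matches the pattern "<number>days".

```python
import pandas as pd

# Sample data from the question, rebuilt inline
df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

# Number of days as an integer Series: "14days" -> 14
days = df["time_period"].str.extract(r"(\d+)days", expand=False).astype(int)

cols = ["total_cost", "total_revenue"]
# Divide each row of both columns by that row's day count;
# axis=0 aligns `days` with the row index rather than the column labels
df[cols] = df[cols].div(days, axis=0)

print(df)
```

Using div(..., axis=0) instead of the / operator matters here: dividing a DataFrame by a Series with / aligns the Series against the column labels, not the rows.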