Question

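I have a pandas DataFrame loaded from a csv file, and I need to take the mean of 3 columns and add the result as a new column. The data looks like this -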

0      week     12    exp    exp    exp
1   Subject  Group      1      2      3
2       255   HD 0  117.4  104.8   87.0
3       418   WT 0   61.2   56.1   97.9
4       300   HD 0  111.7  126.9  118.4
5       299   HD 0   50.7   37.8   30.6
6       258   WT 0   56.0   67.9   58.5
7       173   HD 0   76.2  131.7  119.5

My code is -

with open('final results.csv', 'r') as frame:
    date_again = csv.reader(frame)   

    frame = []
    for line in date_again:
        frame = frame + [line]
panda_file = pd.DataFrame(frame)

panda_file['average'] = frame[3:].mean(axis=1)

The error I get is AttributeError: 'list' object has no attribute 'mean'

How can I fix this?

Thanks

Answer 1

First, create the DataFrame with read_csv and the parameter header=[0,1], because the csv has 2 header rows. This gives a MultiIndex in the columns of the DataFrame:

from io import StringIO

import pandas as pd

temp=u"""week,12,exp,exp,exp
Subject,Group,1,2,3
255,HD,0,117.4,104.8,87.0
418,WT,0,61.2,56.1,97.9
300,HD,0,111.7,126.9,118.4
299,HD,0,50.7,37.8,30.6
258,WT,0,56.0,67.9,58.5
173,HD,0,76.2,131.7,119.5"""
# pd.compat.StringIO was removed from newer pandas versions, so io.StringIO is used here
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=[0,1])

print (df)
       week    12    exp              
    Subject Group      1      2      3
255      HD     0  117.4  104.8   87.0
418      WT     0   61.2   56.1   97.9
300      HD     0  111.7  126.9  118.4
299      HD     0   50.7   37.8   30.6
258      WT     0   56.0   67.9   58.5
173      HD     0   76.2  131.7  119.5
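
As for the error in the question: `frame` there is a plain Python list of rows, and a list has no `mean` method. The rows produced by `csv.reader` also hold strings rather than numbers. A minimal sketch of both points, assuming a made-up two-row sample instead of the real file (the names `sample`, `rows`, and `raw` are illustrative only):

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical two-row sample standing in for the real CSV file.
sample = "255,HD,0,117.4,104.8,87.0\n418,WT,0,61.2,56.1,97.9\n"

rows = list(csv.reader(StringIO(sample)))

# A plain list has no .mean attribute, which is the reported AttributeError.
list_has_mean = hasattr(rows, "mean")

# csv.reader yields every field as a string, so the values need
# converting before any arithmetic.
raw = pd.DataFrame(rows)
all_strings = all(isinstance(value, str) for value in raw.iloc[0])

# Converting the last 3 columns to float makes the row-wise mean work.
means = raw.iloc[:, -3:].astype(float).mean(axis=1)

print(list_has_mean, all_strings)
print(means)
```

This is why read_csv is the simpler route here: it parses the numeric columns into floats and returns a DataFrame directly.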

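Then select the last 3 columns by position with iloc and take the row-wise mean:

df1 = df.iloc[:, -3:].mean(axis=1)
print (df1)
255    103.066667
418     71.733333
300    119.000000
299     39.700000
258     60.800000
173    109.133333
dtype: float64

To store the result as a new column, assign it to a tuple, which defines the name of the new column at each level of the MultiIndex. To simplify, the columns can be flattened to a single level first.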
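
The two ways of storing the mean as a new column could look roughly like the sketch below. It builds a small two-row frame by hand instead of reading the file, and the names `('exp', 'mean')`, `flat`, and `average` are illustrative assumptions, not names taken from the original data:

```python
import pandas as pd

# Small illustrative frame with two-level columns, mirroring the layout above.
columns = pd.MultiIndex.from_tuples(
    [("week", "Subject"), ("12", "Group"), ("exp", "1"), ("exp", "2"), ("exp", "3")]
)
df = pd.DataFrame(
    [["HD", 0, 117.4, 104.8, 87.0], ["WT", 0, 61.2, 56.1, 97.9]],
    index=[255, 418],
    columns=columns,
)

# Compute the mean before adding any column, so the new column
# cannot end up inside the "last 3 columns" selection.
row_mean = df.iloc[:, -3:].mean(axis=1)

# Option 1: flatten the two header levels into single strings, then
# assign with an ordinary column name.
flat = df.copy()
flat.columns = ["_".join(col) for col in flat.columns]
flat["average"] = row_mean

# Option 2: keep the MultiIndex and name the new column with a tuple,
# one entry per header level.
df[("exp", "mean")] = row_mean

print(flat)
print(df)
```

The flattened version is usually easier to work with afterwards, since every column can be addressed with a single string.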
