Question

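First, I've found similar posts, but I haven't figured out how to translate the answers to those questions to my own problem. Second, I'm new to Python, so I apologize for being a noob.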

Here's my problem: I want to perform conditional calculations (average, proportion, etc.) on the values in a text file.

More specifically, I have a file that looks like the following:

0    Diamond    Correct
0    Cross      Incorrect
1    Diamond    Correct
1    Cross      Correct

So far, I'm able to read in the file and collect all of the rows.

import pandas as pd
fileLocation = r'C:/Users/Me/Desktop/LogFiles/SubjectData.txt'
df = pd.read_csv(fileLocation, header=None, sep='\t', index_col=False,
                 names=["Session Number", "Image", "Outcome"])

I'd like to query the file so that I can ask questions like the following:

- What is the proportion of "Correct" values in the 'Outcome' column when the first column ('Session Number') is 0? That would be 0.5, because there is one "Correct" and one "Incorrect".

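I have other calculations I'd like to perform, but once I know how to do this one, I should be able to work out the rest with, hopefully, simple commands.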

Thanks!

Answer 1

# getting the number of rows that have 0 for 'Session Number'
# (the proportion asked for is within session 0, not over the whole file)
total = len(df[df['Session Number'] == 0])

# getting the number of rows that have 'Correct' for 'Outcome' and 0 for 'Session Number'
correct_and_session_zero = len(df[(df['Outcome'] == 'Correct') &
                                  (df['Session Number'] == 0)])

# if you're using python 2 you might need to convert correct_and_session_zero or total
# to float, otherwise integer division truncates the result to 0
print(correct_and_session_zero / total)
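
To try the idea without the log file, here is a minimal sketch that rebuilds the four sample rows from the question in memory and computes the proportion within a single session. The helper name `proportion_correct` is hypothetical, and taking the mean of a boolean Series is a swapped-in shortcut for counting and dividing, not something the question or this answer used.

```python
import pandas as pd

# The four sample rows from the question, built in memory (assumption:
# the real file parses into these same three columns).
df = pd.DataFrame({
    "Session Number": [0, 0, 1, 1],
    "Image": ["Diamond", "Cross", "Diamond", "Cross"],
    "Outcome": ["Correct", "Incorrect", "Correct", "Correct"],
})


def proportion_correct(frame, session):
    """Hypothetical helper: share of 'Correct' outcomes within one session."""
    in_session = frame[frame["Session Number"] == session]
    if in_session.empty:
        # No rows for this session, so the proportion is undefined.
        return float("nan")
    # The comparison gives a boolean Series; its mean is the fraction of True.
    return (in_session["Outcome"] == "Correct").mean()


print(proportion_correct(df, 0))
```

On the sample rows this gives 0.5 for session 0, matching the value the question expects, and 1.0 for session 1.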

Answer 2

You can also do it this way:

In [467]: df.groupby('Session#')['Outcome'].apply(lambda x: (x == 'Correct').sum()/len(x))
Out[467]:
Session#
0    0.5
1    1.0
Name: Outcome, dtype: float64

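This groups your DF by Session# and calculates the ratio of correct Outcomes for each group (each Session#). Here the column is called Session#; in the question's DataFrame it is named 'Session Number'.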
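
Since the question mentions wanting other conditional calculations as well, the same grouping can return several statistics at once. The following is only a sketch under assumptions: it uses the question's column names ('Session Number', 'Outcome'), rebuilds the sample rows in memory, and the output column names (`proportion_correct`, `n_correct`, `n_rows`) are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, using the question's column names.
df = pd.DataFrame({
    "Session Number": [0, 0, 1, 1],
    "Image": ["Diamond", "Cross", "Diamond", "Cross"],
    "Outcome": ["Correct", "Incorrect", "Correct", "Correct"],
})

# Flag each row as 1 (correct) or 0 (anything else).
is_correct = (df["Outcome"] == "Correct").astype(int)

# Group the flags by session and compute several aggregates in one pass:
# mean -> proportion correct, sum -> number correct, count -> rows in session.
summary = is_correct.groupby(df["Session Number"]).agg(["mean", "sum", "count"])
summary.columns = ["proportion_correct", "n_correct", "n_rows"]

print(summary)
```

Swapping the flag for a numeric column and the aggregate names for others such as "min" or "max" follows the same pattern.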

Conditional Sum / Average / etc. of a CSV file in Python
