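I'm trying to write a for loop that creates new columns of Boolean values indicating whether the two referenced columns both contain True. I want the loop to iterate over the existing columns and compare each pair, but I'm not sure how to get it to work. So far I've been trying to use lists that name the different columns. Here is the code: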
import pandas as pd
import numpy as np
elig = pd.read_excel('spreadsheet.xlsx')
elig['ELA'] = elig['SELECTED_EXAMS'].str.match('.*English Language Arts.*')
elig['LivEnv'] = elig['SELECTED_EXAMS'].str.match('.*Living Environment.*')
elig['USHist'] = elig['SELECTED_EXAMS'].str.match('.*US History.*')
elig['Geometry'] = elig['SELECTED_EXAMS'].str.match('.*Geometry.*')
elig['AlgebraI'] = elig['SELECTED_EXAMS'].str.match('.*Algebra I.*')
elig['GlobalHistory'] = elig['SELECTED_EXAMS'].str.match('.*Global History.*')
elig['Physics'] = elig['SELECTED_EXAMS'].str.match('.*Physics.*')
elig['AlgebraII'] = elig['SELECTED_EXAMS'].str.match('.*Algebra II.*')
elig['EarthScience'] = elig['SELECTED_EXAMS'].str.match('.*Earth Science.*')
elig['Chemistry'] = elig['SELECTED_EXAMS'].str.match('.*Chemistry.*')
elig['LOTE Spanish'] = elig['SELECTED_EXAMS'].str.match('.*LOTE – Spanish.*')
# CHANGE TO LOOP--enter columns for instances in which scorers overlap competencies (e.g. can score two different exams). This is helpful in the event that two exams are scored on the same day, and we need to resolve numbers of scorers.
exam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
nestedExam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
for exam in exam_list:
    for nestedExam in nestedExam_list:
        elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
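I think the problem is with np.where(). I want exam and nestedExam to refer to the columns in question, but they seem to be just the list items. The error message is: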
ValueError Traceback (most recent call last)
<ipython-input-33-9347975b8865> in <module>
3 for exam in exam_list:
4 for nestedExam in nestedExam_list:
----> 5 elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
6
7 """
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other)
1359
1360 res_values = na_op(self.values, other)
-> 1361 unfilled = self._constructor(res_values, index=self.index)
1362 return filler(unfilled).__finalize__(self)
1363
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
260 'Length of passed values is {val}, '
261 'index implies {ind}'
--> 262 .format(val=len(data), ind=len(index)))
263 except TypeError:
264 pass
ValueError: Length of passed values is 1, index implies 26834
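Can anyone help me?
Answer (score: 0)
First, to generate the pairs more efficiently and without duplicate work, I'd suggest the standard-library module itertools: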
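import itertools
exam_list = ['A', 'B', 'C', 'D']
for exam1, exam2 in itertools.combinations(exam_list, 2):
    print(exam1 + '_' + exam2)
which prints:
A_B
A_C
A_D
B_C
B_D
C_D
If you actually need every possible ordering of each pair, use itertools.permutations instead of itertools.combinations.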
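To show why combinations avoids duplicate work, here is a quick count of how many pairs each approach produces for the question's eleven exams. This is only an illustrative sketch: it uses the column name 'LivEnv' that the question actually creates (the question's exam_list spells it 'LiveEnv', which would raise a KeyError once the loop reached it).

```python
import itertools

# Column names as created in the question (note 'LivEnv', not 'LiveEnv').
exam_list = ['ELA', 'LivEnv', 'USHist', 'Geometry', 'AlgebraI', 'GlobalHistory',
             'Physics', 'AlgebraII', 'EarthScience', 'Chemistry', 'LOTE Spanish']

# combinations: unordered pairs, no exam paired with itself -> n*(n-1)/2
pairs = list(itertools.combinations(exam_list, 2))

# permutations: ordered pairs, no exam paired with itself -> n*(n-1)
ordered = list(itertools.permutations(exam_list, 2))

# the question's nested loop: every ordered pair, including (x, x) -> n*n
nested = [(a, b) for a in exam_list for b in exam_list]

print(len(pairs), len(ordered), len(nested))  # 55 110 121
```

With the nested loop, every overlap column is computed twice (A/B and B/A) plus eleven meaningless self-pairs; combinations keeps only the 55 distinct pairs.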
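As for the actual problem, you need much less code. If elig[exam1] and elig[exam2] are both Boolean columns, the column that is True exactly where both are True is simply (elig[exam1] & elig[exam2]). This is called a "bitwise" or "logical AND" operation, and its result can be assigned directly as a new column, with no need for np.where or for comparing against True.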
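Putting the two pieces together, a minimal sketch of the loop might look like the following. The small DataFrame is hypothetical sample data standing in for the spreadsheet. As for the ValueError itself, it most likely comes from the trailing comma in (elig[nestedExam]==True,): the comma turns that operand into a one-element tuple instead of a Series, which matches the "Length of passed values is 1" message.

```python
import itertools

import pandas as pd

# Hypothetical sample data standing in for the spreadsheet; only the
# Boolean indicator columns matter for the overlap logic.
elig = pd.DataFrame({
    'ELA':       [True, True, False, True],
    'Geometry':  [True, False, False, True],
    'Chemistry': [False, True, False, True],
})

exam_list = ['ELA', 'Geometry', 'Chemistry']

# One new column per unordered pair of exams. `&` is an element-wise
# logical AND on Boolean Series, so np.where is not needed.
for exam1, exam2 in itertools.combinations(exam_list, 2):
    elig[exam1 + '_' + exam2 + ' Overlap'] = elig[exam1] & elig[exam2]

print(elig)
```

One caveat for the real data: if SELECTED_EXAMS contains missing values, str.match returns NaN for those rows and the indicator columns are no longer purely Boolean. Passing na=False (for example, elig['SELECTED_EXAMS'].str.match('.*Geometry.*', na=False)) keeps them True/False so that & behaves as expected.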