How do I use a for loop to create new columns based on multiple conditions in other columns?

Time: 2019-03-08 21:47:19

Tags: python pandas list loops

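I'm trying to write a for loop that creates new columns of boolean values indicating whether both of the referenced columns contain True. I want the loop to iterate over the existing columns and compare them pairwise, but I'm not sure how to get it to work. So far I have been trying to use lists that reference the different columns. Here is the code: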

import pandas as pd
import numpy as np

elig = pd.read_excel('spreadsheet.xlsx')

elig['ELA'] = elig['SELECTED_EXAMS'].str.match('.*English Language Arts.*')
elig['LivEnv'] = elig['SELECTED_EXAMS'].str.match('.*Living Environment.*')
elig['USHist'] = elig['SELECTED_EXAMS'].str.match('.*US History.*')
elig['Geometry'] = elig['SELECTED_EXAMS'].str.match('.*Geometry.*')
elig['AlgebraI'] = elig['SELECTED_EXAMS'].str.match('.*Algebra I.*')
elig['GlobalHistory'] = elig['SELECTED_EXAMS'].str.match('.*Global History.*')
elig['Physics'] = elig['SELECTED_EXAMS'].str.match('.*Physics.*')
elig['AlgebraII'] = elig['SELECTED_EXAMS'].str.match('.*Algebra II.*')
elig['EarthScience'] = elig['SELECTED_EXAMS'].str.match('.*Earth Science.*')
elig['Chemistry'] = elig['SELECTED_EXAMS'].str.match('.*Chemistry.*')
elig['LOTE Spanish'] = elig['SELECTED_EXAMS'].str.match('.*LOTE – Spanish.*')

# CHANGE TO LOOP--enter columns for instances in which scorers overlap competencies (e.g. can score two different exams). This is helpful in the event that two exams are scored on the same day, and we need to resolve numbers of scorers.

exam_list = ['ELA','LivEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
nestedExam_list = ['ELA','LivEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']

for exam in exam_list:
    for nestedExam in nestedExam_list:
        elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)

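I think the problem is in np.where(). I want exam and nestedExam to refer to the corresponding columns, but they seem to refer only to the list items. The error message is: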


ValueError                                Traceback (most recent call last)
<ipython-input-33-9347975b8865> in <module>
      3 for exam in exam_list:
      4     for nestedExam in nestedExam_list:
----> 5         elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
      6 
      7 """

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other)
   1359 
   1360             res_values = na_op(self.values, other)
-> 1361             unfilled = self._constructor(res_values, index=self.index)
   1362             return filler(unfilled).__finalize__(self)
   1363 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260                             'Length of passed values is {val}, '
    261                             'index implies {ind}'
--> 262                             .format(val=len(data), ind=len(index)))
    263                 except TypeError:
    264                     pass

ValueError: Length of passed values is 1, index implies 26834

Can anyone help me?

1 Answer

Answer 0 (score: 0)

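First, to generate the pairs more efficiently and without computing any pair twice, I'd suggest using the standard-library module itertools.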


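This prints each unordered pair exactly once:

import itertools

exam_list = ['A', 'B', 'C', 'D']
for exam1, exam2 in itertools.combinations(exam_list, 2):
    print(exam1 + '_' + exam2)

Output:

A_B
A_C
A_D
B_C
B_D
C_D

If you actually need all possible orderings, use itertools.permutations instead of itertools.combinations.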
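To make the difference concrete, here is a small sketch on the same toy list ['A', 'B', 'C', 'D'] that counts the pairs produced by combinations, by permutations, and by the nested loop from the question. The nested loop also pairs each exam with itself (which would create columns such as 'ELAELA Overlap') and visits both orders of every pair.

```python
import itertools

exam_list = ['A', 'B', 'C', 'D']

# combinations: each unordered pair once, no self-pairs -> 4 choose 2 = 6
pairs = [a + '_' + b for a, b in itertools.combinations(exam_list, 2)]

# permutations: every ordered pair (A_B and B_A), still no self-pairs -> 4 * 3 = 12
ordered = [a + '_' + b for a, b in itertools.permutations(exam_list, 2)]

# the question's nested loop: every ordered pair including A_A -> 4 * 4 = 16
nested = [a + '_' + b for a in exam_list for b in exam_list]

print(len(pairs), len(ordered), len(nested))  # 6 12 16
print(pairs)  # ['A_B', 'A_C', 'A_D', 'B_C', 'B_D', 'C_D']
```

For an "are both True" flag the order does not matter, so combinations avoids the redundant half of the columns.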

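To address the actual problem, you need much less code. If you have two columns, elig[exam1] and elig[exam2], that are both boolean arrays, the array that is True where both are True is (elig[exam1] & elig[exam2]). This is called a "bitwise" or "logical AND" operation.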
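A minimal sketch with two hypothetical boolean columns (names and values are made up for illustration). It also hints at why the original line fails: the trailing comma in (elig[nestedExam]==True,) wraps the Series in a one-element tuple, which is consistent with the "Length of passed values is 1" message. Indexing with elig[exam] does select the column, so the loop variables themselves were not the problem.

```python
import pandas as pd

# Hypothetical boolean flag columns standing in for elig[exam1] and elig[exam2]
elig = pd.DataFrame({
    'ELA':    [True, True, False, False],
    'USHist': [True, False, True, False],
})

# Element-wise AND: True only in rows where both columns are True.
# np.where(..., True, False) and `== True` are unnecessary because
# the columns are already boolean.
both = elig['ELA'] & elig['USHist']
print(both.tolist())  # [True, False, False, False]

# A trailing comma turns the parenthesized expression into a 1-element tuple,
# not a Series, so `&` is no longer combining two equal-length columns.
wrapped = (elig['USHist'] == True,)
print(type(wrapped).__name__, len(wrapped))  # tuple 1
```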

For example:

(elig[exam1] & elig[exam2])
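Putting the pieces together, here is a sketch of the whole loop under a few assumptions: a small made-up DataFrame stands in for spreadsheet.xlsx, only six of the question's exams are included, and the overlap column names (exam1 + '_' + exam2 + ' Overlap') are illustrative. It swaps the question's str.match('.*X.*') for str.contains with na=False, so rows with an empty SELECTED_EXAMS become False rather than NaN. It also uses \b word boundaries, because the question's '.*Algebra I.*' pattern also matches 'Algebra II'.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for pd.read_excel('spreadsheet.xlsx');
# only the SELECTED_EXAMS column from the question is assumed.
elig = pd.DataFrame({
    'SELECTED_EXAMS': [
        'English Language Arts, US History',
        'Algebra II, Chemistry',
        'Algebra I, Geometry, Algebra II',
        None,
    ]
})

# Flag-column name -> regex searched for in SELECTED_EXAMS (a subset of the question's list).
# \b word boundaries keep 'Algebra I' from also matching 'Algebra II'.
exam_patterns = {
    'ELA': r'\bEnglish Language Arts\b',
    'USHist': r'\bUS History\b',
    'Geometry': r'\bGeometry\b',
    'AlgebraI': r'\bAlgebra I\b',
    'AlgebraII': r'\bAlgebra II\b',
    'Chemistry': r'\bChemistry\b',
}

# One boolean column per exam; na=False turns a missing SELECTED_EXAMS into False.
for col, pattern in exam_patterns.items():
    elig[col] = elig['SELECTED_EXAMS'].str.contains(pattern, regex=True, na=False)

# One overlap column per unordered pair of exams (no self-pairs, no reversed duplicates).
for exam1, exam2 in itertools.combinations(exam_patterns, 2):
    elig[exam1 + '_' + exam2 + ' Overlap'] = elig[exam1] & elig[exam2]

print(elig[['AlgebraI', 'AlgebraII', 'AlgebraI_AlgebraII Overlap']])
```

With six flag columns, combinations yields 6 choose 2 = 15 overlap columns. The question's nested loop would yield 36, including self-pairs such as ELAELA.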