Question

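I am trying to iterate over two columns of a pandas DataFrame to determine a specific value, and then add the result to a new column. The code below raises the following error:

raise ValueError('Length of values does not match length of ' 'index')

I am not sure why.

DataFrame: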

    TeamID    todayorno
1   sw        True
2   pr        False
3   sw        False
4   pr        True

Code:

team = []

for row in results['TeamID']:   
    if row == "sw":
        for r in results['todayorno']:
            if r == True:
                team.append('red')
            else:
                team.append('green')
    else:
        team.append('green')

results['newnew'] = team

Answer 1

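You are iterating over the DataFrame twice, as the two nested for loops show. For every "sw" row, the inner loop appends one value per row of the DataFrame, so team ends up with 10 items instead of the 4 required.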
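To see where the 10 comes from, the loop can be replayed on a small frame rebuilt from the question's data. The construction of results is an assumption, since the question does not show it. The last lines sketch a single-pass variant with zip, only to illustrate what the loop was presumably meant to do.

```python
import pandas as pd

# Assumed reconstruction of the question's data frame.
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"],
     "todayorno": [True, False, False, True]},
    index=[1, 2, 3, 4],
)

team = []
for row in results["TeamID"]:
    if row == "sw":
        # The inner loop visits ALL rows, so each "sw" row appends 4 values.
        for r in results["todayorno"]:
            team.append("red" if r else "green")
    else:
        team.append("green")

print(len(team), len(results))  # 10 4

# Single pass: pair the two columns so each row yields exactly one value.
fixed = [
    "red" if team_id == "sw" and today else "green"
    for team_id, today in zip(results["TeamID"], results["todayorno"])
]
print(fixed)  # ['red', 'green', 'green', 'green']
```

Two "sw" rows contribute 4 values each and two "pr" rows contribute 1 each, hence 10.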

No explicit iteration is needed. You can use numpy.select to apply values for the specified conditions.

import numpy as np

mask = results['TeamID'] == 'sw'
conditions = [~mask, mask & results['todayorno'], mask & ~results['todayorno']]
values = ['green', 'red', 'green']

results['newnew'] = np.select(conditions, values, 'green')

print(results)

  TeamID  todayorno newnew
1     sw       True    red
2     pr      False  green
3     sw      False  green
4     pr       True  green
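Since only two outcomes are possible here, the three conditions can be collapsed into one. The following is a minimal sketch using numpy.where, offered as an alternative to the numpy.select call rather than as part of the original answer. The data frame construction and the name is_red are assumptions.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data frame.
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"],
     "todayorno": [True, False, False, True]},
    index=[1, 2, 3, 4],
)

# One boolean mask: "sw" rows whose todayorno flag is True.
is_red = (results["TeamID"] == "sw") & results["todayorno"]

# Pick "red" where the mask holds and "green" everywhere else.
results["newnew"] = np.where(is_red, "red", "green")
print(results)
```

numpy.select remains the better fit once there are more than two outcomes.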

Answer 2

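Quick answer

Don't try to loop.

Instead, create the new column with a default value (that is, the most common one), then address the values you want to change and set them: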

>>> results
  TeamID  todayorno
0     sw       True
1     pr      False
2     sw      False
3     pr       True
>>> results['newnew'] = 'green'
>>> results
  TeamID  todayorno newnew
0     sw       True  green
1     pr      False  green
2     sw      False  green
3     pr       True  green
>>> results.loc[(results['TeamID'] == 'sw') & (results['todayorno']), 'newnew'] = 'red'
>>> results
  TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green

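Alternatively, you can use .apply(..., axis=1) to compute the whole series with a function that looks at each row, and assign the whole series as a column in one go: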

>>> results
  TeamID  todayorno
0     sw       True
1     pr      False
2     sw      False
3     pr       True
>>> results['newnew'] = results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )
>>> results
  TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green
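As a quick sanity check, the two approaches can be run side by side on copies of the same frame. This is a sketch under the assumption that results is built as shown; the names by_loc and by_apply are illustrative only.

```python
import pandas as pd

# Assumed reconstruction of the data frame used in this answer.
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"],
     "todayorno": [True, False, False, True]}
)

# Approach 1: fill with the default, then overwrite the matching rows.
by_loc = results.copy()
by_loc["newnew"] = "green"
by_loc.loc[(by_loc["TeamID"] == "sw") & by_loc["todayorno"], "newnew"] = "red"

# Approach 2: compute every value with a row-wise function.
by_apply = results.copy()
by_apply["newnew"] = by_apply.apply(
    lambda s: "red" if s["TeamID"] == "sw" and s["todayorno"] else "green",
    axis=1,
)

# Both columns hold the same values.
print(by_loc["newnew"].tolist() == by_apply["newnew"].tolist())  # True
```

The .loc version works on whole columns at once, while .apply with axis=1 calls the function once per row, so the former is usually the faster choice on large frames.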

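Explanation

The problem

As far as I understand, you are trying to add a column named newnew to your data frame.

In the rows of the data frame where the TeamID column contains the value "sw" and the todayorno column contains the value True, you want the newnew column to contain the value "red".

In all other rows, you want the value of newnew to be "green".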

The rule

To use pandas effectively, one very important rule is: don't try to loop. Especially not over rows.

Instead, let pandas do the work for you.

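So the first step is to create the new column. Since in most cases you want the value to be "green", you can simply do:

results['newnew'] = 'green'

Your data frame now looks like this:

  TeamID  todayorno newnew
0     sw       True  green
1     pr      False  green
2     sw      False  green
3     pr       True  green

You will notice that pandas "extended" the single value provided across all the rows.

Now, to set the sw/True rows to "red", you first need to locate them. For that, we need to understand how pandas addresses things.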

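(A little bit of) how pandas addressing works

When you use square brackets after a pandas data frame, you are generally addressing a column of the data frame. For example:

>>> results['TeamID']
0    sw
1    pr
2    sw
3    pr
Name: TeamID, dtype: object

That is, by requesting the TeamID index of the results data frame, you got back a Series named TeamID that contains only the values of that column.

On the other hand, if you want to address rows, you need to use the .loc attribute.

>>> results.loc[1]
TeamID          pr
todayorno    False
newnew       green
Name: 1, dtype: object

Here we got back a Series containing the values of that row.

If we want to look at several rows, we can get a sub-dataframe by indexing with a list of rows:

>>> results.loc[[1,2]]
  TeamID  todayorno newnew
1     pr      False  green
2     sw      False  green

Or by using a condition:

>>> results.loc[results['TeamID'] == 'pr']
  TeamID  todayorno newnew
1     pr      False  green
3     pr       True  green

Conditions can contain boolean combinations, but the syntax has special requirements, such as using & instead of and, and carefully wrapping the parts of the condition in parentheses because of the precedence of the & operator:

>>> results.loc[(results['TeamID'] == 'sw') & (results['todayorno'])]
  TeamID  todayorno newnew
0     sw       True  green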
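The requirement to use & rather than and can be seen directly. The sketch below assumes the same data frame as above; the names is_sw, is_today, outcome and both are illustrative only.

```python
import pandas as pd

# Assumed reconstruction of the data frame used in this answer.
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"],
     "todayorno": [True, False, False, True]}
)

is_sw = results["TeamID"] == "sw"
is_today = results["todayorno"]

# `and` asks a whole Series for a single truth value, which pandas refuses.
try:
    is_sw and is_today
    outcome = "no error"
except ValueError as exc:
    outcome = "ValueError"
    print("ValueError:", exc)

# `&` combines the two masks element by element.
both = is_sw & is_today
print(both.tolist())  # [True, False, False, False]
```

In Python, & binds more tightly than ==, which is why each comparison needs its own parentheses when the condition is written inline.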

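The .loc attribute can also address by both rows and columns. A comma separates the two parts of the address, with the row part first and the column part last:

>>> results.loc[results['TeamID'] == 'pr', 'todayorno']
1    False
3     True
Name: todayorno, dtype: bool

Final touch

The .loc attribute can also be used for assignment, by assigning the desired value to the desired "coordinates".

So in your case:

>>> results.loc[
...     (results['TeamID'] == 'sw') & (results['todayorno']),
...     'newnew'
... ] = "red"
>>> results
  TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green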

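Another solution

The .apply() method of a data frame applies a single function multiple times, either column by column or row by row. To apply it row by row, pass the axis=1 argument.

If the function passed to .apply() returns a single value, the results of each application are combined into a Series with the same addressing (the same index, in pandas parlance) as the rows of the data frame.

So:

>>> results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )
0      red
1    green
2    green
3    green
dtype: object

This can then be assigned as a column of the data frame:

>>> results['newnew'] = results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )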

How do I iterate over a data frame with nested for loops and append to a new data frame column, using python2.7?
