Editing a pandas DataFrame row by row

Time: 2013-12-19 21:33:50

Tags: python pandas trial

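Python's pandas is neat. I'm trying to replace a list of dictionaries with a pandas DataFrame. However, I'm wondering whether there is a way to change values row by row in a for loop?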

Here is the non-pandas dict version:

trialList = [
    {'no':1, 'condition':2, 'response':''},
    {'no':2, 'condition':1, 'response':''},
    {'no':3, 'condition':1, 'response':''}
]  # ... and so on

for trial in trialList:
    # Do something and collect response
    trial['response'] = 'the answer!'

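...and now trialList contains the updated values, because trial refers back to it. Very handy! But lists of dicts are very unhandy, especially because I'd like to be able to compute things column-wise, which pandas excels at.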

So given trialList from above, I thought I could make it even better by doing something pandas-like:

import pandas as pd    
dfTrials = pd.DataFrame(trialList)  # makes a nice 3-column dataframe with 3 rows

for trial in dfTrials.iterrows():
    # do something and collect response
    trial[1]['response'] = 'the answer!'

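...but trialList remains unchanged here. Is there an easy way to update values row by row, perhaps equivalent to the dict version? It is important that it is row by row, since this is an experiment in which participants are presented with a large number of trials, and various data are collected on each individual trial.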

1 answer:

Answer 0 (score: 36)

If you really want row-by-row operations, you can use iterrows and loc:

>>> for i, trial in dfTrials.iterrows():
...     dfTrials.loc[i, "response"] = "answer {}".format(trial["no"])
...     
>>> dfTrials
   condition  no  response
0          2   1  answer 1
1          1   2  answer 2
2          1   3  answer 3

[3 rows x 3 columns]
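To make the difference from the loop in the question concrete, here is a minimal sketch. It assumes the same trialList as above and columns of mixed types, in which case the row yielded by iterrows is a temporary Series, so writing to it has no effect on the frame. The second loop uses the .at accessor for single-cell writes; that choice is mine, not part of the answer, and .loc works there as well.

```python
import pandas as pd

trialList = [
    {'no': 1, 'condition': 2, 'response': ''},
    {'no': 2, 'condition': 1, 'response': ''},
    {'no': 3, 'condition': 1, 'response': ''},
]
dfTrials = pd.DataFrame(trialList)

# Writing to the row yielded by iterrows() only changes a temporary Series,
# because the columns here have mixed types and each row is built as a copy.
for i, trial in dfTrials.iterrows():
    trial['response'] = 'lost'
unchanged = dfTrials['response'].tolist()

# Writing through the frame itself, addressed by row label and column name,
# does update the stored value.
for i in dfTrials.index:
    dfTrials.at[i, 'response'] = 'answer {}'.format(dfTrials.at[i, 'no'])
updated = dfTrials['response'].tolist()
```

After the first loop the response column is still empty; after the second it holds one answer per trial.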

Even better is when you can vectorize:

>>> dfTrials["response 2"] = dfTrials["condition"] + dfTrials["no"]
>>> dfTrials
   condition  no  response  response 2
0          2   1  answer 1           3
1          1   2  answer 2           3
2          1   3  answer 3           4

[3 rows x 4 columns]
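The string column built in the row-by-row loop can also be produced in one vectorized step. This is only a sketch, assuming the response can be computed from existing columns rather than collected interactively; the column name response_vec is made up for illustration.

```python
import pandas as pd

dfTrials = pd.DataFrame([
    {'no': 1, 'condition': 2, 'response': ''},
    {'no': 2, 'condition': 1, 'response': ''},
    {'no': 3, 'condition': 1, 'response': ''},
])

# Convert the integer column to strings, then prepend the prefix to every
# element at once instead of formatting row by row.
dfTrials['response_vec'] = 'answer ' + dfTrials['no'].astype(str)

# Arithmetic on whole columns works the same way.
dfTrials['response 2'] = dfTrials['condition'] + dfTrials['no']
```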

And there is always apply:

>>> def f(row):
...     return "c{}n{}".format(row["condition"], row["no"])
... 
>>> dfTrials["r3"] = dfTrials.apply(f, axis=1)
>>> dfTrials
   condition  no  response  response 2    r3
0          2   1  answer 1           3  c2n1
1          1   2  answer 2           3  c1n2
2          1   3  answer 3           4  c1n3

[3 rows x 5 columns]
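For the experiment scenario in the question, one alternative that the answer above does not propose is to keep the list of dicts while the trials run and build the DataFrame only once all responses are in. The sketch below assumes that this order of operations is acceptable; collect_response is a hypothetical stand-in for whatever presents a trial and records the participant's answer.

```python
import pandas as pd

def collect_response(trial):
    # Hypothetical placeholder: a real experiment would present the trial
    # and wait for the participant here.
    return 'answer {}'.format(trial['no'])

trialList = [
    {'no': 1, 'condition': 2, 'response': ''},
    {'no': 2, 'condition': 1, 'response': ''},
    {'no': 3, 'condition': 1, 'response': ''},
]

# Row-by-row phase: plain dicts are updated in place, as in the question.
for trial in trialList:
    trial['response'] = collect_response(trial)

# Analysis phase: convert once, then work column-wise.
dfTrials = pd.DataFrame(trialList)
mean_condition = dfTrials['condition'].mean()
```

This keeps the per-trial bookkeeping in ordinary Python objects and leaves pandas for the column-wise computations it is good at.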