Replace values in one column for each value of another column in pandas

Asked: 2017-11-22 11:07:56

Tags: python pandas

I have a CSV file that looks like this:

ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33

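What I want to achieve:

For each ID, if the Number column contains "one & two" at least once, I want every Number field for that ID that holds "one" or "two" to be replaced with "one & two". For example, ID 61745 has the "one & two" value at least once, so its "one" and "two" rows should change. ID 61743 never has that value, so its rows should stay as they are. So I would like to get the following: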

ID,Number,Value
61745,three,11
61745,one & two,11
61745,one & two,12
61745,one & two,13
61743,one,41
61743,two,42
61741,one & two,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33

So far, I have tried this:

import pandas as pd
import os
import csv
import time
import dateutil.parser as dparser
import datetime

df = pd.read_csv("slack.csv")

for row in df.itertuples():
    if row[2] == "one & two":
        df.ix[df.Number.isin(['one & two','one','two']), 'Number'] = 'one & two'

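The result is that the script replaces every "one" and "two" value in the Number column for all IDs, not only for the IDs that contain "one & two":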

       ID     Number  Value
0   61745      three     11
1   61745  one & two     11
2   61745  one & two     12
3   61745  one & two     13
4   61743  one & two     41
5   61743  one & two     42
6   61741  one & two     21
7   61741  one & two     22
8   61715  one & two     31
9   61715  one & two     32
10  61715      three     33

2 Answers:

Answer 0 (score: 1)

Use groupby with a custom function that checks whether at least one value in the group is one & two and, if so, calls replace with a dict:

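def f(x):
    # x holds the Number values of a single ID
    d = {'one': 'one & two', 'two': 'one & two'}
    # Rewrite only when this ID has at least one 'one & two' row
    if x.eq('one & two').any():
        return x.replace(d)
    else:
        return x

# transform keeps the original row index, so the result assigns back
# cleanly (apply can prepend the group key to the index in newer pandas)
df['Number'] = df.groupby('ID')['Number'].transform(f)
print(df)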
       ID     Number  Value
0   61745      three     11
1   61745  one & two     11
2   61745  one & two     12
3   61745  one & two     13
4   61743        one     41
5   61743        two     42
6   61741  one & two     21
7   61741  one & two     22
8   61715        one     31
9   61715        two     32
10  61715      three     33
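
If a Python-level function per group is not needed, the same rule can probably be written with boolean masks only. The following is a minimal sketch, not part of the original answer: it assumes the sample data from the question (inlined here so the snippet runs on its own), and the names csv_text, has_combined and is_single are made up for illustration.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained
csv_text = """ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
"""
df = pd.read_csv(io.StringIO(csv_text))

# True on every row whose ID has at least one 'one & two' entry
has_combined = df['Number'].eq('one & two').groupby(df['ID']).transform('any')

# True on rows that are candidates for rewriting
is_single = df['Number'].isin(['one', 'two'])

# Rewrite only rows that satisfy both conditions
df.loc[has_combined & is_single, 'Number'] = 'one & two'
print(df)
```

Here transform('any') broadcasts the per-ID check back to every row of that ID, so the two masks line up with the original index and can be combined with &.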

Answer 1 (score: 1)

Replace this line (and drop the surrounding for loop, which is no longer needed):

df.ix[df.Number.isin(['one & two','one','two']), 'Number'] = 'one & two'

with the following:

ids = df.ID[df.Number == 'one & two'].unique()
df.loc[df.ID.isin(ids) & df.Number.isin(['one', 'two']), 'Number'] = 'one & two'
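
To check the two lines above against the sample from the question, they can be wrapped in a small function and run end to end. This is only a sketch under a few assumptions: the data is inlined instead of being read from slack.csv, and collapse_one_two is a hypothetical helper name, not something from pandas or from the answer.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained
SAMPLE = """ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
"""

def collapse_one_two(df):
    """Hypothetical helper: return a copy in which 'one' and 'two' become
    'one & two', but only for IDs that already have a 'one & two' row."""
    out = df.copy()
    # IDs that contain the combined label at least once
    ids = out.ID[out.Number == 'one & two'].unique()
    # Rows that belong to those IDs and hold a single label
    mask = out.ID.isin(ids) & out.Number.isin(['one', 'two'])
    out.loc[mask, 'Number'] = 'one & two'
    return out

df = pd.read_csv(io.StringIO(SAMPLE))
result = collapse_one_two(df)
print(result)
```

The original attempt went wrong because its mask never looked at the ID column, so a single matching row was enough to rewrite every ID. Adding df.ID.isin(ids) to the mask is what restricts the change to the right groups.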