I have a CSV file that looks like this:
ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
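What I want to achieve:
For each ID, if the Number column contains "one & two" at least once, I want every Number field for that ID that is "one" or "two" to be replaced with "one & two". For example, for ID 61745 the value "one & two" appears at least once, but for ID 61743 it never appears. So I want to get the following: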
ID,Number,Value
61745,three,11
61745,one & two,11
61745,one & two,12
61745,one & two,13
61743,one,41
61743,two,42
61741,one & two,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
So far I have tried this:
import pandas as pd
import os
import csv
import time
import dateutil.parser as dparser
import datetime
df = pd.read_csv("slack.csv")
for row in df.itertuples():
    if row[2] == "one & two":
        df.ix[df.Number.isin(['one & two','one','two']), 'Number'] = 'one & two'
The result is that the script replaces every "one" and "two" value in the Number column, for every ID:
ID Number Value
0 61745 three 11
1 61745 one & two 11
2 61745 one & two 12
3 61745 one & two 13
4 61743 one & two 41
5 61743 one & two 42
6 61741 one & two 21
7 61741 one & two 22
8 61715 one & two 31
9 61715 one & two 32
10 61715 three 33
Answer 0 (score: 1)
Use groupby with a custom function that checks whether at least one value in the group is one & two, and then replace using a dict:
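def f(x):
    # Map both single values onto the combined label
    d = {'one':'one & two', 'two':'one & two'}
    # Replace only when this ID's group contains 'one & two' at least once
    if x.eq('one & two').any():
        return x.replace(d)
    else:
        return x

# group_keys=False keeps the original row index, so the result aligns with df
# (needed on recent pandas versions, harmless on older ones)
df['Number'] = df.groupby('ID', group_keys=False)['Number'].apply(f)
print (df)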
ID Number Value
0 61745 three 11
1 61745 one & two 11
2 61745 one & two 12
3 61745 one & two 13
4 61743 one 41
5 61743 two 42
6 61741 one & two 21
7 61741 one & two 22
8 61715 one 31
9 61715 two 32
10 61715 three 33
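To try the groupby idea without the slack.csv file, here is a minimal self-contained sketch. It builds the frame from an inline string (the CSV_TEXT name is just an illustration, not part of the original script) and uses transform instead of apply, which is my own substitution: transform returns a result indexed like the original rows, so the assignment lines up regardless of pandas version.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (CSV_TEXT is a made-up name)
CSV_TEXT = """ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))


def f(x):
    # x is the Number column of a single ID group
    d = {'one': 'one & two', 'two': 'one & two'}
    # Only rewrite groups that already contain 'one & two'
    return x.replace(d) if x.eq('one & two').any() else x


# transform keeps the original row order and index
df['Number'] = df.groupby('ID')['Number'].transform(f)
print(df)
```

Rows with "three" are left alone in every group because the dict only maps "one" and "two".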
Answer 1 (score: 1)
Replace this line:
df.ix[df.Number.isin(['one & two','one','two']), 'Number'] = 'one & two'
with the following:
ids = df.ID[df.Number == 'one & two'].unique()
df.loc[df.ID.isin(ids) & df.Number.isin(['one', 'two']), 'Number'] = 'one & two'
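Put together as a standalone script, the two lines above might look like the sketch below. The inline CSV_TEXT string and the mask variable are my own additions for illustration. No loop over the rows is needed, since the boolean mask already restricts the assignment to the IDs that contain "one & two".

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (CSV_TEXT is a made-up name)
CSV_TEXT = """ID,Number,Value
61745,three,11
61745,one,11
61745,one & two,12
61745,two,13
61743,one,41
61743,two,42
61741,one,21
61741,one & two,22
61715,one,31
61715,two,32
61715,three,33
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# IDs that have 'one & two' in the Number column at least once
ids = df.loc[df['Number'] == 'one & two', 'ID'].unique()

# Rows that belong to one of those IDs and currently hold 'one' or 'two'
mask = df['ID'].isin(ids) & df['Number'].isin(['one', 'two'])

df.loc[mask, 'Number'] = 'one & two'
print(df)
```

The original attempt went wrong because its mask had no ID condition, so a single matching row anywhere in the file triggered the replacement for all IDs.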