How to create new columns in pandas based on the values in specific rows

Time: 2019-05-07 11:18:33

Tags: python pandas

I am trying to create new columns by applying given lists to specific rows of a column.

Here is my df:

import pandas as pd

d = {'ID': ['ZZ7','ZZ7','ZZ7','ZZ7','ZZ7','ZZ7','ZZ7',
            'RR6','RR6','RR6','RR6','RR6','RR6','RR6',
            'DD5','DD5','DD5','DD5','DD5','DD5','DD5'],
     'Section': ['1H','1H','2H','2H','2H','3R','3R',
                 '1H','1H','1H','2H','2H','3R','3R',
                 '1H','1H','2H','2H','3R','3R','3R'],
     'A': [1,2,5,1,1,2,1,1,2,3,1,1,3,1,1,2,2,3,1,2,1],
     'B': [2,3,1,1,3,1,1,3,1,1,2,2,3,1,2,1,2,1,1,2,1]}
df = pd.DataFrame(d)

Here are the lists to be used to create new cols.

RateB_1H = [1,2,3,4]
RateB_2H = [3,4,5,6]
RateB_3R = [1,3,5,7]

RateA_1H = [1,1,2,1]
RateA_2H = [2,3,1,2]
RateA_3R = [1,3,2,1]

df['Rate_A'] should be created by picking, for each row, the values associated with that row's section, i.e. with df['Section']:

df[df.Section=='1H'] from RateA_1H, 
df[df.Section=='2H'] from RateA_2H,
df[df.Section=='3R'] from RateA_3R,

The same applies to df['Rate_B']:

df[df.Section=='1H'] from RateB_1H, 
df[df.Section=='2H'] from RateB_2H,
df[df.Section=='3R'] from RateB_3R,

The expected output (produced by brute force) looks like this:

    ID  Section A   B   Rate_B  Rate_A
0   ZZ7   1H    1   2      1    1
1   ZZ7   1H    2   3      2    1
2   ZZ7   2H    5   1      3    2
3   ZZ7   2H    1   1      4    3
4   ZZ7   2H    1   3      5    1
5   ZZ7   3R    2   1      1    1
6   ZZ7   3R    1   1      3    3
7   RR6   1H    1   3      1    1
8   RR6   1H    2   1      2    1
9   RR6   1H    3   1      3    2
10  RR6   2H    1   2      3    2
11  RR6   2H    1   2      4    3
12  RR6   3R    3   3      1    1
13  RR6   3R    1   1      3    3
14  DD5   1H    1   2      1    1
15  DD5   1H    2   1      2    1
16  DD5   2H    2   2      3    2
17  DD5   2H    3   1      4    3
18  DD5   3R    1   1      1    1
19  DD5   3R    2   2      3    3
20  DD5   3R    1   1      5    2
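
Reading the table above, the rule appears to be: within each (ID, Section) group, the n-th row takes the n-th entry of that section's list. The following is only a brute-force sketch under that assumption. The dictionaries `rate_a` and `rate_b` are hypothetical helpers that hold the question's lists keyed by section label.

```python
import pandas as pd

# Same data as in the question.
d = {'ID': ['ZZ7'] * 7 + ['RR6'] * 7 + ['DD5'] * 7,
     'Section': ['1H','1H','2H','2H','2H','3R','3R',
                 '1H','1H','1H','2H','2H','3R','3R',
                 '1H','1H','2H','2H','3R','3R','3R'],
     'A': [1,2,5,1,1,2,1,1,2,3,1,1,3,1,1,2,2,3,1,2,1],
     'B': [2,3,1,1,3,1,1,3,1,1,2,2,3,1,2,1,2,1,1,2,1]}
df = pd.DataFrame(d)

# Hypothetical lookup tables: the question's lists, keyed by Section label.
rate_a = {'1H': [1, 1, 2, 1], '2H': [2, 3, 1, 2], '3R': [1, 3, 2, 1]}
rate_b = {'1H': [1, 2, 3, 4], '2H': [3, 4, 5, 6], '3R': [1, 3, 5, 7]}

# Walk the rows in order and count how many rows with the same
# (ID, Section) pair came before; that count indexes the section's list.
seen = {}
col_a, col_b = [], []
for id_, section in zip(df['ID'], df['Section']):
    pos = seen.get((id_, section), 0)
    col_a.append(rate_a[section][pos])  # IndexError if the list is too short
    col_b.append(rate_b[section][pos])
    seen[(id_, section)] = pos + 1

df['Rate_B'] = col_b
df['Rate_A'] = col_a
print(df)
```

A Python-level loop like this is easy to check against the table but will be slow on a large dataframe.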

Any help with creating the above columns for a large dataframe would be appreciated.

2 Answers:

Answer 0: (score: 0)

Try the code below. It first copies the columns and then uses replace accordingly:

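# Start from the Section labels: columns A and B hold integers, so they
# contain no "1H"/"2H"/"3R" values and the .str accessor would fail on them.
df['Rate_A'] = df['Section']
df['Rate_B'] = df['Section']

# Series.replace maps whole values through the dict.
df['Rate_B'] = df['Rate_B'].replace({"1H": 1, "2H": 2, "3R": 3})
df['Rate_A'] = df['Rate_A'].replace({"1H": 4, "2H": 5, "3R": 6})

print(df)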

Answer 1: (score: 0)

I think you can split the dataframe into three parts and operate on each part separately.

I am assuming the RateA_xxx lists are long enough.

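# Each list needs exactly one entry per row of its section (7 rows each here),
# so the original four values are padded with zeros.
AvgA_1H = [1,1,2,1,0,0,0]
AvgA_2H = [2,3,1,2,0,0,0]
AvgA_3R = [1,3,2,1,0,0,0]

# .copy() avoids SettingWithCopyWarning when adding a column to a slice.
oneh = df[df['Section']=='1H'].copy()
twoh = df[df['Section']=='2H'].copy()
threer = df[df['Section']=='3R'].copy()

oneh['Rate_A'] = AvgA_1H
twoh['Rate_A'] = AvgA_2H
threer['Rate_A'] = AvgA_3R

# Recombine the parts and restore the original row order.
result = pd.concat([oneh,twoh,threer]).sort_index()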
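
As an alternative to splitting the frame by hand, the per-group position can be computed with `groupby(...).cumcount()` and joined against a small lookup table. This is only a sketch and not part of the answer above. It assumes the intended rule is "the n-th row of each (ID, Section) group takes the n-th list entry", and the names `rate_a`, `rate_b`, `lookup` and `pos` are hypothetical.

```python
import pandas as pd

# Same data as in the question.
d = {'ID': ['ZZ7'] * 7 + ['RR6'] * 7 + ['DD5'] * 7,
     'Section': ['1H','1H','2H','2H','2H','3R','3R',
                 '1H','1H','1H','2H','2H','3R','3R',
                 '1H','1H','2H','2H','3R','3R','3R'],
     'A': [1,2,5,1,1,2,1,1,2,3,1,1,3,1,1,2,2,3,1,2,1],
     'B': [2,3,1,1,3,1,1,3,1,1,2,2,3,1,2,1,2,1,1,2,1]}
df = pd.DataFrame(d)

# Hypothetical lookup tables: the question's lists, keyed by Section label.
rate_a = {'1H': [1, 1, 2, 1], '2H': [2, 3, 1, 2], '3R': [1, 3, 2, 1]}
rate_b = {'1H': [1, 2, 3, 4], '2H': [3, 4, 5, 6], '3R': [1, 3, 5, 7]}

# Long-form table with one row per (Section, position) pair.
# Assumes rate_a and rate_b have lists of equal length for every section.
lookup = pd.DataFrame(
    [(sec, i, rate_a[sec][i], rate_b[sec][i])
     for sec in rate_a for i in range(len(rate_a[sec]))],
    columns=['Section', 'pos', 'Rate_A', 'Rate_B'],
)

result = (
    # cumcount numbers the rows 0, 1, 2, ... within each (ID, Section) group.
    df.assign(pos=df.groupby(['ID', 'Section']).cumcount())
      # A left merge keeps every row of df in its original order.
      .merge(lookup, on=['Section', 'pos'], how='left')
      .drop(columns='pos')
)
print(result)
```

If a group has more rows than its list has entries, the left merge leaves NaN in the rate columns for the extra rows instead of raising an error, so short lists are easy to spot afterwards.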