Iterating over corresponding rows and changing the values of a DataFrame

Asked: 2019-02-21 14:56:34

Tags: python pandas csv

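I have a CSV file with a large amount of data, which I have loaded into a DataFrame in Python. I want to compare each row with the rows above it in the same column: if one row holds 1 and the row directly below it holds 100, the program should replace the 100 with 50. If there are two rows containing 1 above the 100, the 100 should become 25; if there are three rows containing 1 above the 100, it should become 12.5, and so on. This is the DataFrame built from the CSV file: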

  rule_id           51594   51668   51147   51182   51447
0   comparison1     1.0      1.0     NaN    NaN      NaN
1   last_comp      100.0    100.0    NaN    NaN      NaN
2   comparison1     NaN      NaN     1.0    NaN      1.0
3   comparison2    100.0     NaN     1.0    NaN      1.0
4   comparison3     NaN      NaN     1.0   100.0     100.0
5   comparison4     NaN      NaN    100.0   NaN      NaN
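
For anyone who wants to reproduce this, the frame above can be rebuilt by hand roughly as follows. This is only a sketch: the real data comes from a CSV file that is not shown, and treating the column labels as strings is an assumption.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hand-built copy of the sample frame shown above. The real data is read
# from a CSV file, so the string column labels are an assumption.
df = pd.DataFrame({
    "rule_id": ["comparison1", "last_comp", "comparison1",
                "comparison2", "comparison3", "comparison4"],
    "51594": [1.0, 100.0, nan, 100.0, nan, nan],
    "51668": [1.0, 100.0, nan, nan, nan, nan],
    "51147": [nan, nan, 1.0, 1.0, 1.0, 100.0],
    "51182": [nan, nan, nan, nan, 100.0, nan],
    "51447": [nan, nan, 1.0, 1.0, 100.0, nan],
})
print(df)
```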

The result should look like this:

     rule_id        51594   51668   51147   51182   51447
0   comparison1     1.0      1.0     NaN    NaN      NaN
1   last_comp       50.0     50.0    NaN    NaN      NaN
2   comparison1     NaN      NaN     1.0    NaN      1.0
3   comparison2     100      NaN     1.0    NaN      1.0
4   comparison3     NaN      NaN     1.0    100      25.0
5   comparison4     NaN      NaN     12.5   NaN      NaN

This is the code:

for key in df:
    for i, value in enumerate(df[key]):
        n = 1
        t = 100
        if value == t and i > 0 and df[key][i-n] == 1.0:
            df[key][i] = value/2  
            n = n+1
            t = t/2
    break 

Basically, I declare two variables here, n with the value 1 and t with the value 100, and then use them in the if condition.

The result I get is:

    rule_id        51594    51668   51147   51182   51447
0   comparison1     1.0      1.0     NaN    NaN      NaN
1   last_comp       50.0     50.0    NaN    NaN      NaN
2   comparison1     NaN      NaN     1.0    NaN      1.0
3   comparison2    100.0     NaN     1.0    NaN      1.0
4   comparison3     NaN      NaN     1.0   100.0     50.0
5   comparison4     NaN      NaN     50.0   NaN      NaN

I don't know what the problem is. It would be great if you could help me solve it.

3 answers:

Answer 0 (score: 3)

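I think you want to do this separately for each column. Within each column, you need to form a new group every time a 100 is encountered.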

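import pandas as pd

# Skip the first column (rule_id) and process each numeric column on its own.
for col in df.columns[1:]:
    # A new group starts on the row after each 100, so every 100 closes its group.
    groups = df[col].eq(100).shift(1, fill_value=False).cumsum()
    # Replace the 100 in each group by 100 / 2**(number of 1s in that group).
    # transform keeps the original index, so the result can be assigned back.
    df[col] = df[col].groupby(groups).transform(
        lambda x: x.mask(x == 100, 100 / (2 ** x.eq(1).sum())))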

Output:

       rule_id  51594  51668  51147  51182  51447
0  comparison1    1.0    1.0    NaN    NaN    NaN
1    last_comp   50.0   50.0    NaN    NaN    NaN
2  comparison1    NaN    NaN    1.0    NaN    1.0
3  comparison2  100.0    NaN    1.0    NaN    1.0
4  comparison3    NaN    NaN    1.0  100.0   25.0
5  comparison4    NaN    NaN   12.5    NaN    NaN
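
To see how the grouping works, the key can be inspected for a single column. The sketch below uses column 51447 from the sample data, rebuilt by hand; the variable names (`col`, `key`, `ones_per_group`) are only for illustration.

```python
import numpy as np
import pandas as pd

# Column 51447 from the sample frame, rebuilt by hand for illustration.
col = pd.Series([np.nan, np.nan, 1.0, 1.0, 100.0, np.nan])

# Mark each 100, shift the marks down by one row and take the running sum:
# a new group starts on the row *after* each 100, so every 100 is the
# last row of its group.
key = col.eq(100).shift(1, fill_value=False).cumsum()
print(key.tolist())  # [0, 0, 0, 0, 0, 1]

# Number of 1s in each group; this is the exponent of the divisor.
ones_per_group = col.eq(1).groupby(key).sum()
print(ones_per_group.tolist())

# Replacement value for the 100 that closes the first group.
print(100 / 2 ** ones_per_group.iloc[0])  # 25.0
```

Note that this counts every 1 in the group, not only the 1s directly above the 100. For the sample data the two readings agree.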

Answer 1 (score: 2)

A great question. It took me a while to solve, but I think the following is what you are after:

def init(df):
    # Walk down every column, tracking the run of 1s seen so far.
    for title in list(df):
        column = df[title]
        the_last_value_was_a_one = False
        # Despite its name, this holds the divisor 2**k after k consecutive 1s.
        number_of_consecutive_ones = 1
        for i, value in enumerate(column):
            if value == 1:
                the_last_value_was_a_one = True
                number_of_consecutive_ones *= 2
            elif (value == 100) and the_last_value_was_a_one:
                # df.at is label-based, so this assumes the default 0..n-1 index.
                df.at[i, title] = 100/(number_of_consecutive_ones)
                the_last_value_was_a_one = False
                number_of_consecutive_ones = 1
            else:
                # Anything else (including NaN) ends the run.
                the_last_value_was_a_one = False
                number_of_consecutive_ones = 1
    return df


df = init(df)

Which returns:

     rule_id    51594   51668   51147   51182   51447
0   comparison1 1.0     1.0     NaN      NaN    NaN
1   last_comp   50.0    50.0    NaN      NaN    NaN
2   comparison1 NaN     NaN     1.0      NaN    1.0
3   comparison2 100.0   NaN     1.0      NaN    1.0
4   comparison3 NaN     NaN     1.0     100.0   25.0
5   comparison4 NaN     NaN     12.5    NaN     NaN
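
The same counting idea can be tried on a plain list, which makes it easy to check by hand. This is only a sketch with a hypothetical helper name (`halve_after_ones`); it is not part of the answer's code. Like the function above, it only changes a 100 that directly follows a run of 1s.

```python
import math


def halve_after_ones(values, target=100):
    """Return a copy of values in which each target that directly follows
    a run of k ones is replaced by target / 2**k."""
    result = list(values)
    run = 0  # length of the current run of consecutive 1s
    for i, value in enumerate(values):
        if value == 1:
            run += 1
            continue
        if value == target and run > 0:
            result[i] = target / 2 ** run
        run = 0  # any value other than 1 (including NaN) ends the run
    return result


nan = math.nan
# Column 51147 from the sample data: three 1s directly above the 100.
print(halve_after_ones([nan, nan, 1.0, 1.0, 1.0, 100.0])[-1])  # 12.5
# Column 51182: no 1 above the 100, so it is left unchanged.
print(halve_after_ones([nan, nan, nan, nan, 100.0, nan])[4])   # 100.0
```

Working on a list keeps the logic independent of the DataFrame index, so it could be applied per column and the result assigned back.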

Answer 2 (score: 0)

import pandas as pd


df = pd.DataFrame(data={"col1": [1,1,100,1,1,100], 'col2': [1,1,100,1,1,100]})

# get list of columns (will be used later)
cols = df.columns

# create list of next division by 2 (will be used later)
original = 100
ll = []
for x in range(1, 20):
    ll.append(original)
    original /= 2

ll = list(zip([x for x in range(1, 20)], ll))

# create dictionary of indexes and divisions
dd = {x:y for x,y in ll}


for c in df.columns:
    df[f'{c}_next'] = df[c].shift(-1)

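# main function: find 1 & 100 pairs (a 1 with a 100 directly below it) and replace values
def compare_vals(row, cols):
    # counter starts at 1 and goes up by one for every column in this row
    # that holds a 1 with a 100 directly below it, so the divisor depends on
    # how many columns match, not on how many 1s are stacked above the 100
    counter = 1
    for c in cols:
        if row[f'{c}_next'] == 100 and row[c] == 1:
            counter += 1

    # write the looked-up value into the *_next column of each matching pair
    for c in cols:
        if row[f'{c}_next'] == 100 and row[c] == 1:
            row[f'{c}_next'] = dd[counter]
    return row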

df_new = df.apply(lambda row: compare_vals(row, cols), axis=1)

df_new = df_new[[x for x in df_new.columns if x not in cols]]
cols_new = {x: x.replace('_next', '') for x in df_new.columns}
df_new = df_new.rename(columns=cols_new)
df_new = df_new.shift(1)
df_new.iloc[0, :] = df.iloc[0,:]

Output:

   col1  col2
0   1.0   1.0
1   1.0   1.0
2  25.0  25.0
3   1.0   1.0
4   1.0   1.0
5  25.0  25.0