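I have a CSV file containing a large amount of data, and I have loaded it into a DataFrame in Python. I want to compare each row with the rows above it in the same column: if a row contains 1 and the row directly below it contains 100, the program should replace the 100 with 50. If there are two rows containing 1 above the 100, the 100 should become 25; if there are three rows containing 1 above the 100, it should become 12.5, and so on. This is the DataFrame built from the CSV file: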
rule_id 51594 51668 51147 51182 51447
0 comparison1 1.0 1.0 NaN NaN NaN
1 last_comp 100.0 100.0 NaN NaN NaN
2 comparison1 NaN NaN 1.0 NaN 1.0
3 comparison2 100.0 NaN 1.0 NaN 1.0
4 comparison3 NaN NaN 1.0 100.0 100.0
5 comparison4 NaN NaN 100.0 NaN NaN
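For anyone who wants to reproduce the problem, the sample frame above can be rebuilt with a minimal sketch like the following. It assumes string column labels and a default integer index; neither detail is stated in the question.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Assumption: column labels are strings and the index is the default 0..5.
df = pd.DataFrame({
    "rule_id": ["comparison1", "last_comp", "comparison1",
                "comparison2", "comparison3", "comparison4"],
    "51594": [1.0, 100.0, nan, 100.0, nan, nan],
    "51668": [1.0, 100.0, nan, nan, nan, nan],
    "51147": [nan, nan, 1.0, 1.0, 1.0, 100.0],
    "51182": [nan, nan, nan, nan, 100.0, nan],
    "51447": [nan, nan, 1.0, 1.0, 100.0, nan],
})
print(df)
```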
The result should look like this:
rule_id 51594 51668 51147 51182 51447
0 comparison1 1.0 1.0 NaN NaN NaN
1 last_comp 50.0 50.0 NaN NaN NaN
2 comparison1 NaN NaN 1.0 NaN 1.0
3 comparison2 100 NaN 1.0 NaN 1.0
4 comparison3 NaN NaN 1.0 100 25.0
5 comparison4 NaN NaN 12.5 NaN NaN
Here is the code:
for key in df:
    for i, value in enumerate(df[key]):
        n = 1
        t = 100
        if value == t and i > 0 and df[key][i-n] == 1.0:
            df[key][i] = value/2
            n = n+1
            t = t/2
            break
Basically, I declare two variables here: n with a value of 1 and t with a value of 100, and then use them in the if statement.
The result I get is:
rule_id 51594 51668 51147 51182 51447
0 comparison1 1.0 1.0 NaN NaN NaN
1 last_comp 50.0 50.0 NaN NaN NaN
2 comparison1 NaN NaN 1.0 NaN 1.0
3 comparison2 100.0 NaN 1.0 NaN 1.0
4 comparison3 NaN NaN 1.0 100.0 50.0
5 comparison4 NaN NaN 50.0 NaN NaN
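I don't know what the problem is. It would be great if you could help me solve it.
Answer 0 (score: 3)
I would do this for each column separately. Within each column, a new group has to be formed every time a 100 is encountered.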
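import pandas as pd
for col in df.columns[1:]:
    # A new group starts on the row right after each 100, so every 100 closes its own group.
    groups = df[col].eq(100).shift(1, fill_value=False).cumsum()
    # group_keys=False keeps the original row index so the result aligns with df[col].
    df[col] = (df[col].groupby(groups, group_keys=False)
               .apply(lambda x: x.mask(x == 100, 100/(2**x.eq(1).sum()))))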
rule_id 51594 51668 51147 51182 51447
0 comparison1 1.0 1.0 NaN NaN NaN
1 last_comp 50.0 50.0 NaN NaN NaN
2 comparison1 NaN NaN 1.0 NaN 1.0
3 comparison2 100.0 NaN 1.0 NaN 1.0
4 comparison3 NaN NaN 1.0 100.0 25.0
5 comparison4 NaN NaN 12.5 NaN NaN
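To see how the grouping works, here is a minimal sketch that builds the group key for a single column (the values of column 51594 from the question). The names `col`, `key` and `ones_per_group` are only illustrative, and `fill_value=False` is my choice for filling the first shifted row.

```python
import numpy as np
import pandas as pd

# Values of column 51594 from the question.
col = pd.Series([1.0, 100.0, np.nan, 100.0, np.nan, np.nan])

# True on the row right after each 100; the running sum then labels the groups.
key = col.eq(100).shift(1, fill_value=False).cumsum()
print(key.tolist())             # [0, 0, 1, 1, 2, 2]

# Number of 1s in each group, i.e. how many times that group's 100 gets halved.
ones_per_group = col.eq(1).groupby(key).sum()
print(ones_per_group.tolist())  # [1, 0, 0]
```

The first 100 shares a group with one 1 and becomes 50; the second 100 sits in a group with no 1s, so it stays at 100.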
Answer 1 (score: 2)
A great question. It took me a while to solve, but I think the following is what you are after:
def init(df):
    for title in list(df):
        column = df[title]
        the_last_value_was_a_one = False
        # Doubles for every consecutive 1, so it holds the divisor 2**k rather than k.
        number_of_consecutive_ones = 1
        for i, value in enumerate(column):
            if value == 1:
                the_last_value_was_a_one = True
                number_of_consecutive_ones *= 2
            elif value == 100 and the_last_value_was_a_one:
                # df.at uses index labels, so this relies on the default 0..n-1 index.
                df.at[i, title] = 100/(number_of_consecutive_ones)
                the_last_value_was_a_one = False
                number_of_consecutive_ones = 1
            else:
                # Any other value (including NaN) breaks the run of 1s.
                the_last_value_was_a_one = False
                number_of_consecutive_ones = 1
    return df
df = init(df)
Which returns:
rule_id 51594 51668 51147 51182 51447
0 comparison1 1.0 1.0 NaN NaN NaN
1 last_comp 50.0 50.0 NaN NaN NaN
2 comparison1 NaN NaN 1.0 NaN 1.0
3 comparison2 100.0 NaN 1.0 NaN 1.0
4 comparison3 NaN NaN 1.0 100.0 25.0
5 comparison4 NaN NaN 12.5 NaN NaN
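The same scan can be tried on a plain list, which makes the rule easy to check by hand. This is only a sketch: `halve_after_ones` is a hypothetical helper, not part of the answer above, and it assumes that any value other than 1 (including NaN) resets the count.

```python
import math


def halve_after_ones(values):
    """Return a copy where each 100 directly below k consecutive 1s becomes 100 / 2**k."""
    result = list(values)
    divisor = 1  # 2**k for the k consecutive 1s seen so far
    for i, value in enumerate(values):
        if value == 1:
            divisor *= 2
        else:
            if value == 100:
                result[i] = 100 / divisor
            divisor = 1  # anything that is not a 1 ends the run
    return result


nan = math.nan
print(halve_after_ones([nan, nan, 1, 1, 1, 100])[-1])  # 12.5
print(halve_after_ones([1, nan, 1, 100])[-1])          # 50.0
```

In the second call only the 1 directly above the 100 counts, because the NaN in between breaks the run.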
Answer 2 (score: 0)
import pandas as pd
df = pd.DataFrame(data={"col1": [1,1,100,1,1,100], 'col2': [1,1,100,1,1,100]})
# get list of columns (will be used later)
cols = df.columns
# create list of next division by 2 (will be used later)
original = 100
ll = []
for x in range(1, 20):
    ll.append(original)
    original /= 2
ll = list(zip([x for x in range(1, 20)], ll))
# create dictionary of indexes and divisions
dd = {x:y for x,y in ll}
for c in df.columns:
    df[f'{c}_next'] = df[c].shift(-1)
# main function get 1&100 pairs and replacing values
def compare_vals(row, cols):
    counter = 1
    for c in cols:
        if row[f'{c}_next'] == 100 and row[c] == 1:
            counter += 1
    for c in cols:
        if row[f'{c}_next'] == 100 and row[c] == 1:
            row[f'{c}_next'] = dd[counter]
    return row
df_new = df.apply(lambda row: compare_vals(row, cols), axis=1)
df_new = df_new[[x for x in df_new.columns if x not in cols]]
cols_new = {x: x.replace('_next', '') for x in df_new.columns}
df_new = df_new.rename(columns=cols_new)
df_new = df_new.shift(1)
df_new.iloc[0, :] = df.iloc[0,:]
Output:
col1 col2
0 1.0 1.0
1 1.0 1.0
2 25.0 25.0
3 1.0 1.0
4 1.0 1.0
5 25.0 25.0
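As a cross-check of this sample, the number of consecutive 1s directly above each row can also be counted per column without a Python loop. The sketch below is one possible way to do it, not the method used above; `halve_hundreds` and its intermediate names are hypothetical.

```python
import pandas as pd


def halve_hundreds(col: pd.Series) -> pd.Series:
    """Replace each 100 with 100 / 2**k, where k is the run of 1s directly above it."""
    col = col.astype(float)
    is_one = col.eq(1)
    # The run id changes on every value that is not a 1, so each run of 1s is isolated.
    run_id = (~is_one).cumsum()
    # Length of the current run of 1s ending at each row (0 on rows that are not 1).
    ones_run = is_one.astype(int).groupby(run_id).cumsum()
    # Shift down one row to get the run length directly above each row.
    ones_above = ones_run.shift(1, fill_value=0)
    return col.mask(col.eq(100), 100 / 2 ** ones_above)


sample = pd.DataFrame({"col1": [1, 1, 100, 1, 1, 100],
                       "col2": [1, 1, 100, 1, 1, 100]})
print(sample.apply(halve_hundreds))
```

Each 100 in the sample has two 1s directly above it in its own column, so both columns end up with 25.0 in rows 2 and 5.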