Alternative to nested for loops

Time: 2020-09-03 07:40:10

Tags: python pandas performance for-loop optimization

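I have a pandas DataFrame in which some columns contain blank values. I use a nested for loop to fill those columns with values taken from a list. Every row of a given column gets the same value, which is what I want. The order matters here: col1 must receive val1 and col2 must receive val2.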

import pandas as pd

df = pd.DataFrame({"col1": ["", "", ""],
                     "col2": ["", "", ""],
                     "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

Columns = ['col1', 'col2'] #list of column names that the code should iterate over
Values = ['val1', 'val2'] #list of values to be inserted in the given columns

for n in Columns:
    for i in df:
        df[Columns] = Values

Output:

    col1    col2    col3
0   val1    val2    Facebook, Instagram
1   val1    val2    Facebook, Facebook
2   val1    val2    Twitter

My current code works, but it is very slow on large amounts of data. What can I do to improve it?

2 Answers:

Answer 0 (score: 2)

I think the simplest approach is to pass the two lists directly, without any loop, for example:

df[Columns] = Values
print (df)
   col1  col2                 col3
0  val1  val2  Facebook, Instagram
1  val1  val2   Facebook, Facebook
2  val1  val2              Twitter
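
Since the question stresses that the order matters, the pairing can also be written out explicitly. The following is only a sketch of one alternative, not part of the original answer: it assumes that `Columns` and `Values` have the same length, and the names `mapping` and `result` are made up for illustration. `DataFrame.assign` returns a new frame rather than modifying `df` in place.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

Columns = ['col1', 'col2']  # columns to fill
Values = ['val1', 'val2']   # value for each column, in the same order

# zip pairs each column name with the value at the same position,
# so col1 is matched with val1 and col2 with val2.
mapping = dict(zip(Columns, Values))  # hypothetical helper name

# assign broadcasts each scalar to every row and returns a new DataFrame;
# the original df is left untouched.
result = df.assign(**mapping)
print(result)
```

If the two lists could differ in length, `zip` would silently drop the extra entries, so a length check before building `mapping` may be worth adding.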

Performance (300,000 rows: the 3-row frame repeated 100,000 times)

df = pd.DataFrame({"col1": ["", "", ""],
                     "col2": ["", "", ""],
                     "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

Columns = ['col1', 'col2'] #list of column names that the code should iterate over
Values = ['val1', 'val2'] #list of values to be inserted in the given columns

df = pd.concat([df] * 100000, ignore_index=True)


%timeit df[Columns] = Values
7.53 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
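
`%timeit` only works inside IPython or Jupyter. For a plain Python script, a rough comparison of the looped and the direct version could look like the sketch below. It is an illustration under assumptions, not the answer's benchmark: the function names and the `repeats` value are made up, the frame is deliberately smaller than the one above so the script finishes quickly, and the measured times will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"col1": ["", "", ""],
                     "col2": ["", "", ""],
                     "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

Columns = ['col1', 'col2']
Values = ['val1', 'val2']


def make_frame(repeats):
    """Stack the 3-row frame `repeats` times (hypothetical helper)."""
    return pd.concat([base] * repeats, ignore_index=True)


def nested_loops(df):
    """The approach from the question: the same assignment repeated."""
    for n in Columns:
        for i in df:
            df[Columns] = Values
    return df


def single_assignment(df):
    """The approach from this answer: one assignment, no loop."""
    df[Columns] = Values
    return df


big = make_frame(10_000)  # 30,000 rows; increase to stress the difference

loop_time = timeit.timeit(lambda: nested_loops(big), number=5)
direct_time = timeit.timeit(lambda: single_assignment(big), number=5)

print(f"nested loops:      {loop_time:.4f} s for 5 runs")
print(f"single assignment: {direct_time:.4f} s for 5 runs")
```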

Answer 1 (score: 0)

Both loops (the outer and the inner one) are unnecessary: n is never used, and you are performing the same operation n * i times, which is why the code is slow. Just get rid of the loops and use df[Columns] = Values on its own.
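
To make the repetition visible, here is a small sketch that counts how often the loop body runs on the 3-row example from the question. The counter `runs` and the frame `expected` are added purely for illustration and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})

Columns = ['col1', 'col2']
Values = ['val1', 'val2']

# What a single assignment produces, for comparison.
expected = df.copy()
expected[Columns] = Values

runs = 0  # illustrative counter
for n in Columns:      # 2 entries in Columns
    for i in df:       # iterating a DataFrame yields its 3 column names
        df[Columns] = Values  # identical work on every pass; n and i are unused
        runs += 1

print(runs)                  # 6, i.e. 2 * 3
print(df.equals(expected))   # True: the extra 5 passes changed nothing
```

The body runs once per (n, i) pair, so the cost grows with the number of columns in the frame even though the result is already complete after the first pass.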