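I have a pandas DataFrame in which some columns contain blank values. I use a nested for loop to fill these columns with values taken from a list. Every row of a given column gets the same value, which is the intended result. Order matters here, because col1 needs the value val1.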
import pandas as pd
df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
Columns = ['col1', 'col2'] #list of column names that the code should iterate over
Values = ['val1', 'val2'] #list of values to be inserted in the given columns
for n in Columns:
    for i in df:
        df[Columns] = Values
Output:
   col1  col2                 col3
0  val1  val2  Facebook, Instagram
1  val1  val2   Facebook, Facebook
2  val1  val2              Twitter
My current code works, but it is very slow on large amounts of data. What can I do to improve it?
Answer 0: (score: 2)
I think the simplest approach is to assign the list of values to the list of columns directly, for example:
df[Columns] = Values
print (df)
   col1  col2                 col3
0  val1  val2  Facebook, Instagram
1  val1  val2   Facebook, Facebook
2  val1  val2              Twitter
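If the positional pairing between Columns and Values feels too implicit, the mapping can also be spelled out. The following is only a sketch, assuming both lists have the same length and every name in Columns already exists in df; the names mapping and filled are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
Columns = ['col1', 'col2']
Values = ['val1', 'val2']

# Pair each column name with its value by position: col1 -> val1, col2 -> val2.
mapping = dict(zip(Columns, Values))

# assign() broadcasts each scalar to every row and returns a new DataFrame;
# the original df is left untouched.
filled = df.assign(**mapping)
print(filled)
```

Unlike df[Columns] = Values, this variant does not modify df in place, which may or may not be what you want.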
Performance (the 3 sample rows repeated 100,000 times, i.e. 300,000 rows):
df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
Columns = ['col1', 'col2'] #list of column names that the code should iterate over
Values = ['val1', 'val2'] #list of values to be inserted in the given columns
df = pd.concat([df] * 100000, ignore_index=True)
%timeit df[Columns] = Values
7.53 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
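Note that %timeit is an IPython magic and will not run in a plain Python script. A rough equivalent with the standard timeit module might look like the sketch below. The helper names make_df, fill_direct and fill_nested are made up for this example, and the row count is reduced to keep it quick. Timings depend on the machine, so no numbers are shown.

```python
import timeit

import pandas as pd

Columns = ['col1', 'col2']
Values = ['val1', 'val2']


def make_df(copies):
    """Build the sample frame repeated `copies` times (3 * copies rows)."""
    base = pd.DataFrame({"col1": ["", "", ""],
                         "col2": ["", "", ""],
                         "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
    return pd.concat([base] * copies, ignore_index=True)


def fill_direct(df):
    # One assignment that covers all target columns.
    df[Columns] = Values
    return df


def fill_nested(df):
    # The loop from the question: the same assignment, repeated.
    for n in Columns:
        for i in df:
            df[Columns] = Values
    return df


big = make_df(10_000)  # 30,000 rows; raise this to mimic larger data

# Each run works on a fresh copy so both variants start from blank columns.
t_direct = timeit.timeit(lambda: fill_direct(big.copy()), number=5)
t_nested = timeit.timeit(lambda: fill_nested(big.copy()), number=5)
print(f"direct: {t_direct:.4f} s, nested: {t_nested:.4f} s (5 runs each)")
```

Both timings include the cost of big.copy(), so they are only useful for comparing the two variants against each other.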
Answer 1: (score: 0)
Both loops (outer and inner) are unnecessary: i and n are never used, and you are performing the same operation n * i times, which is why the code is slow. Just get rid of the loops and keep the single assignment df[Columns] = Values.
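To make the redundancy concrete: iterating over a DataFrame yields its column labels, so with the sample data the loop body runs len(Columns) * len(df.columns) = 2 * 3 = 6 times, writing the same values each time. A small check, where the counter runs and the copies looped and single are names added for this sketch:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["", "", ""],
                   "col2": ["", "", ""],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
Columns = ['col1', 'col2']
Values = ['val1', 'val2']

looped = df.copy()
runs = 0
for n in Columns:        # 2 column names
    for i in looped:     # iterating a DataFrame yields its 3 column labels
        looped[Columns] = Values
        runs += 1

single = df.copy()
single[Columns] = Values  # the same assignment, executed once

print(runs)                   # 6
print(looped.equals(single))  # True
```

The number of repetitions does not depend on the row count, but each repetition rewrites every row of both columns, so the wasted work grows with the size of the data.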