Question

I have 600 CSV files, each containing about 1500 rows of data. I have to run a function on every row. I have already defined the function:

def query_prepare(data):
    """function goes here"""
    """here input data is list of single row of dataframe"""

The function above applies operations such as strip() and replace(), depending on conditions. It takes each row of data as a list.

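A single row passed to the function looks like this:

data = ['apple$*7','orange  ','bananna','-']

This is what my initial dataframe looks like:

           a        b         c  d
0   apple$*7   orange   bananna  -
1  apple()*7  flower]  *bananna  -

I timed the function on a single row and it takes about 0.04s. Running it over one CSV file with 1500 rows therefore takes almost 1500*0.04s. I have tried a few approaches.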

These are the attempts and their timings:

import time
import swifter
from pandarallel import pandarallel

# normal built-in apply function
t = time.time()
a = df.apply(lambda x: query_prepare(x.to_list()), axis=1)
print('time taken', time.time() - t)
# time taken 52.519816637039185

# with swifter
t = time.time()
a = df.swifter.allow_dask_on_strings().apply(lambda x: query_prepare(x.to_list()), axis=1)
print('time taken', time.time() - t)
# time taken 160.31028127670288

# with pandarallel
pandarallel.initialize()
t = time.time()
a = df.parallel_apply(lambda x: query_prepare(x.to_list()), axis=1)
print('time taken', time.time() - t)
# time taken 55.000578

I have already done everything I can inside query_prepare to reduce its time, so I can't change or modify it. Are there any other suggestions?

P.S. I am running this on Google Colab.

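EDIT: Suppose we have 1500 rows of data, split them into 15 chunks, and then apply the function to each chunk. Would doing it this way cut the time by a factor of 15? (Sorry, I'm not sure whether this is feasible; please point me in the right direction.)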
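A rough sketch of the chunking idea, to make the question concrete. Since the body of query_prepare is not shown, `clean_row` below is a hypothetical stand-in, and `process_chunk` and `process_in_chunks` are illustrative names rather than anything from a library. It uses only the standard library's concurrent.futures plus NumPy and pandas.

```python
import concurrent.futures

import numpy as np
import pandas as pd


def clean_row(values):
    # Hypothetical stand-in for query_prepare: takes one row as a list.
    return [str(v).strip() for v in values]


def process_chunk(chunk: pd.DataFrame) -> list:
    # Run the row function over every row of one chunk.
    return [clean_row(row) for row in chunk.to_numpy().tolist()]


def process_in_chunks(df, n_chunks=15,
                      executor_cls=concurrent.futures.ProcessPoolExecutor,
                      max_workers=None):
    # Split the rows into n_chunks contiguous slices, keeping row order.
    bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
    chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input chunks.
        results = executor.map(process_chunk, chunks)
        return [row for part in results for row in part]
```

Calling `process_in_chunks(df)` would hand the 15 chunks to worker processes. The speedup is bounded by the number of CPU cores actually available rather than by the number of chunks, so 15 chunks only approach a 15x gain on a machine with at least 15 free cores. The pandarallel timing above (55s against 52.5s for plain apply) suggests there were few cores to spare in that environment.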

Answer 1

For example, you could do roughly the following:

import pandas as pd

def sanitize_column(s: pd.Series):
    # strip whitespace first, then any leading/trailing digits and symbols
    return s.str.strip().str.strip('1234567890()*[]')

Then you can do:

df.apply(sanitize_column, axis=0)

With:

df = pd.DataFrame({'a': ['apple7', 'apple()*7'], 'b': ["    asd   ", ']asds89']})

this gives:

       a     b
0  apple   asd
1  apple  asds

This should be faster than your solution. For a proper benchmark, we would need your complete solution.
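In the meantime, a minimal sketch of how the two approaches could be timed against each other. The row function here, `row_clean`, is a hypothetical stand-in that mimics `sanitize_column` one row at a time; it is not your query_prepare, so the numbers it prints only indicate the overhead of row-wise apply, not the cost of your real function.

```python
import timeit

import pandas as pd


def row_clean(values):
    # Hypothetical stand-in for query_prepare: cleans one row given as a list.
    return [v.strip().strip("1234567890()*[]") for v in values]


def sanitize_column(s: pd.Series) -> pd.Series:
    # Column-wise version from the answer above.
    return s.str.strip().str.strip("1234567890()*[]")


# 1500 rows, roughly the size of one CSV file in the question.
df = pd.DataFrame({
    "a": ["apple7", "apple()*7"] * 750,
    "b": ["    asd   ", "]asds89"] * 750,
})


def row_wise():
    # One Python-level call per row, as in the question.
    return df.apply(lambda r: row_clean(r.to_list()), axis=1, result_type="expand")


def col_wise():
    # One call per column, using pandas string methods.
    return df.apply(sanitize_column, axis=0)


print("row-wise   :", timeit.timeit(row_wise, number=5))
print("column-wise:", timeit.timeit(col_wise, number=5))
```

Both functions produce the same cleaned values, so any difference in the printed times comes from how the work is dispatched rather than from what is computed.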

How to make pandas row processing faster?
