Optimizing for large datasets

Date: 2019-07-11 11:29:35

Tags: python pandas performance numpy iteration

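I posted my code here for review, but so far it has not received the response I had hoped for, probably because the code is long. So here I will cut to the chase. Suppose we have the following lists: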

t0=[('Albania','Angola','Germany','UK'),('UK','France','Italy'),('Austria','Bahamas','Brazil','Chile'),('Germany','UK'),('US')]
t1=[('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'), ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2=[('Angola:UK'), ('Germany:UK'), ('UK:France'), ('UK:Italy'), ('France:Italy'), ('Austria:Bahamas')]

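The goal is this: for each pair in t1, we go through t0, and wherever a row of t0 contains the whole pair, we replace the pair's members with the corresponding element of t2. We can do that as follows: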

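from pprint import pprint

result = []
for v1, v2 in zip(t1, t2):
    out = []
    for i in t0:
        # v1 is fully contained in i exactly when the intersection equals set(v1)
        common = set(v1).intersection(i)
        if set(v1) == common:
            # drop the pair's members and append the merged label from t2
            out.append(tuple(list(set(i) - common) + [v2]))
        else:
            out.append(tuple(i))
    result.append(out)

pprint(result, width=100)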

This gives:

[[('Albania', 'Germany', 'Angola:UK'),
  ('UK', 'France', 'Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany:UK'),
  ('UK', 'France', 'Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany:UK',),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('Italy', 'UK:France'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('France', 'UK:Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('UK', 'France:Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('UK', 'France', 'Italy'),
  ('Brazil', 'Chile', 'Austria:Bahamas'),
  ('Germany', 'UK'),
  ('U', 'S')]]
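
A side note on the ('U', 'S') rows in the output: the last element of t0 is written as ('US'), which without a trailing comma is a plain string rather than a one-element tuple, so tuple(i) splits it into characters. A small check (the variable names are only for illustration):

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('US')
one_tuple = ('US',)

print(type(not_a_tuple).__name__)  # str
print(tuple(not_a_tuple))          # ('U', 'S')
print(tuple(one_tuple))            # ('US',)
```

If a single-country row is intended, writing it as ('US',) keeps it intact.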

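The resulting list has length 6, one entry per element of t1 and t2, and each sublist has 5 elements, matching the number of elements in t0. On this small example the code is fast, but in my case t0 has about 48,000 elements and t1 about 30,000. The run takes a very long time, and I would like to know how to do the same operation faster.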

1 answer:

Answer 0 (score: 1)

You can use a nested (double) list comprehension. The resulting code runs roughly 3.47 times faster (13.3 µs vs. 46.2 µs).
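
A minimal sketch of how a nested list comprehension could express the question's loop. This is an assumption about the general shape, not necessarily the answerer's exact code; the helper list t0_sets is introduced here only for illustration, to avoid rebuilding a set for every row on every pass.

```python
t0 = [('Albania', 'Angola', 'Germany', 'UK'), ('UK', 'France', 'Italy'),
      ('Austria', 'Bahamas', 'Brazil', 'Chile'), ('Germany', 'UK'), ('US')]
t1 = [('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'),
      ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2 = ['Angola:UK', 'Germany:UK', 'UK:France',
      'UK:Italy', 'France:Italy', 'Austria:Bahamas']

# Hypothetical helper: build each row's set once instead of once per pair.
t0_sets = [set(row) for row in t0]

result = [
    [
        # If the row contains the whole pair, drop its members and append the label.
        tuple(list(row_set - pair) + [label]) if pair <= row_set else tuple(row)
        for row, row_set in zip(t0, t0_sets)
    ]
    for pair, label in zip(map(set, t1), t2)
]
```

As in the original loop, the order of the remaining countries inside a replaced row depends on set iteration order and may vary between runs.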

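
At the sizes mentioned in the question (about 48,000 rows and 30,000 pairs), any approach that compares every pair with every row performs on the order of 1.4 billion checks. One possible alternative, sketched here as an assumption and not taken from the original post, is an inverted index from country to row numbers, so that each pair only touches the rows that can actually match. The names index, rows_containing, and matches are hypothetical.

```python
from collections import defaultdict

t0 = [('Albania', 'Angola', 'Germany', 'UK'), ('UK', 'France', 'Italy'),
      ('Austria', 'Bahamas', 'Brazil', 'Chile'), ('Germany', 'UK'), ('US')]
t1 = [('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'),
      ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2 = ['Angola:UK', 'Germany:UK', 'UK:France',
      'UK:Italy', 'France:Italy', 'Austria:Bahamas']

# Map each country to the set of t0 row numbers that contain it.
index = defaultdict(set)
for row_no, row in enumerate(t0):
    for country in set(row):
        index[country].add(row_no)

def rows_containing(pair):
    """Return the sorted t0 row numbers that contain every member of pair."""
    candidate_sets = [index.get(country, set()) for country in pair]
    return sorted(set.intersection(*candidate_sets))

# For each label, record only the rows that would change.
matches = {label: rows_containing(pair) for pair, label in zip(t1, t2)}
print(matches['Germany:UK'])  # [0, 3]
```

This records only which rows change for each pair instead of materializing a full copy of t0 per pair; whether that fits depends on how the result is consumed downstream.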