Question

I have a list of lists containing county-level population information for different groups. Each group has its own row.

[['Autauga County, Alabama',
  'American Indian or Alaska Native Alone',
  '05000US01001',
  '3',
  '170',
  '61',
  '85',
  '45',
  '170',
  '61',
  '85',
  '45'],
 ['Autauga County, Alabama',
  'Asian Alone',
  '05000US01001',
  '4',
  '515',
  '162',
  '415',
  '98',
  '315',
  '154',
  '220',
  '120'],
 ['Autauga County, Alabama',
  'Black or African American Alone',
  '05000US01001',
  '5',
  '10420',
  '162',
  '7620',
  '71',
  '10420',
  '162',
  '7620',
  '71'],
 ['Autauga County, Alabama',
  'Native Hawaiian or Other Pacific Islander Alone',
  '05000US01001',
  '6',
  '10',
  '20',
  '10',
  '20',
  '10',
  '20',
  '10',
  '20'],
 ['Autauga County, Alabama',
  'White Alone',
  '05000US01001',
  '7',
  '41550',
  '27',
  '31920',
  '27',
  '41400',
  '137',
  '31800',
  '90']]

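Or, another way to picture the data:

County | LNTITLE | Population |
Los Angeles | White | 6423432 |
Los Angeles | Black | 4323333 |
Los Angeles | Hispanic | 32432444 |
Alameda | White | 24343243 |
Alameda | Black | 43234323243 |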

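I am trying to compute a racial fractionalization score (RFS) at a more granular level. The code I wrote works, but it is quite slow even for fewer than 3,200 observations. Other datasets for which I want to compute the score have more than 100,000 observations, and I am worried that my computer cannot handle that load.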

Question: what would you suggest I do to make my code more efficient on larger datasets?

Cleaning and preparation: I first build a list of unique county names. Then I define the racial fractionalization score function.
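For reference, the score computed by the RFS function in the code further down can be written as follows. The symbols are my own notation, not from any particular source: p_i is the population of group i in a county and P is the sum over all groups kept for that county.

```latex
\mathrm{RFS} = 1 - \sum_{i=1}^{n} \left( \frac{p_i}{P} \right)^{2},
\qquad P = \sum_{i=1}^{n} p_i
```

A county with a single group gives RFS = 0, and the value moves toward 1 as the population is spread more evenly across more groups.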

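I build a list of the rows whose county is not in Puerto Rico. I also want to ignore rows whose LNTITLE is "Total" or "Not Hispanic or Latino", because those are not racial groups and would badly distort the estimates.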

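Actual approach (last code block): I create a list called RFS_Store, whose purpose is to hold each county name together with its RFS. I do this by iterating over every unique county; for each one, I build another list holding the element at index 4 (row[4]) of every row whose county name matches. Once that list is filled, I store the RFS value returned for it along with the corresponding county name.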

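# `county` is the list of lists shown above (one row per county and group).

# Collect unique county names, skipping Puerto Rico.
unique = []
for row in county:
    name = row[0]
    if name not in unique and "Puerto Rico" not in name:
        unique.append(name)

# RFS = 1 - sum of squared population shares.
def RFS(county_values):
    total = sum(county_values)
    squares = []
    for item in county_values:
        squares.append((int(item)/total)**2)
    squared = sum(squares)
    return 1 - squared

# Keep only rows that represent racial groups outside Puerto Rico.
new_county = []
for row in county:
    name = row[0]
    lntitle = row[1]
    if "Puerto Rico" not in name and lntitle != 'Total' and lntitle != "Not Hispanic or Latino":
        new_county.append(row)

# For every unique county, scan all remaining rows and collect row[4].
RFS_Store = []
for elem in unique:
    stored = []
    for row in new_county:
        if elem == row[0]:
            stored.append(int(row[4]))
    RFS_Store.append([RFS(stored),elem])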

There are no errors; the process is just rather slow, and the more I run it, the more I notice my computer behaving oddly.

Is there an alternative to nested for loops in Python for computing this?
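One direction that might help, offered as a sketch under assumptions rather than a measured fix: the nested loops compare every unique county against every row, and `name not in unique` is itself a linear scan of a list, so the work grows roughly with (number of counties) × (number of rows). Grouping the rows by county in a dictionary needs only a single pass over the rows. The names `EXCLUDED_TITLES`, `rfs`, and `rfs_by_county` are made up for this illustration, and returning `None` for a zero total is an assumption, not something the original code does.

```python
from collections import defaultdict

# Titles that are not racial groups, taken from the filter in the question.
EXCLUDED_TITLES = {"Total", "Not Hispanic or Latino"}


def rfs(values):
    """Return 1 minus the sum of squared population shares."""
    total = sum(values)
    if total == 0:
        return None  # assumption: treat the score as undefined for an empty county
    return 1 - sum((v / total) ** 2 for v in values)


def rfs_by_county(rows):
    """Group row[4] by county name in one pass, then score each county."""
    groups = defaultdict(list)
    for row in rows:
        name, lntitle = row[0], row[1]
        if "Puerto Rico" in name or lntitle in EXCLUDED_TITLES:
            continue
        groups[name].append(int(row[4]))
    # Same [score, name] layout as RFS_Store in the question.
    return [[rfs(values), name] for name, values in groups.items()]


# First five fields of the Autauga County rows shown in the question.
sample_rows = [
    ['Autauga County, Alabama', 'American Indian or Alaska Native Alone', '05000US01001', '3', '170'],
    ['Autauga County, Alabama', 'Asian Alone', '05000US01001', '4', '515'],
    ['Autauga County, Alabama', 'Black or African American Alone', '05000US01001', '5', '10420'],
    ['Autauga County, Alabama', 'Native Hawaiian or Other Pacific Islander Alone', '05000US01001', '6', '10'],
    ['Autauga County, Alabama', 'White Alone', '05000US01001', '7', '41550'],
]

RFS_Store = rfs_by_county(sample_rows)
print(RFS_Store)
```

One behavioral difference to be aware of: a county whose rows are all filtered out (for example, only a "Total" row) does not appear in the result here, whereas the original loop would still emit an entry for it.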

0 answers: