Question

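I have a file of sample employee data. The first line is the name, the second line is the salary, the third line is the life insurance election (Y/N), and the fourth line is the health insurance election (PPOI, PPOF, None); the pattern then repeats for the next employee. A snippet of the file is below: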

Joffrey Baratheon
190922
Y
PPOI
Gregor Clegane
47226
Y
PPOI
Khal Drogo
133594
N
PPOI
Hodor
162581
Y
PPOF
Cersei Lannister
163985
N
PPOI
Tyrion Lannister
109253
N
PPOF
Jorah Mormont
61078
Y
None
Jon Snow
123222
N
None

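How can I take this file's data and extract each data type (name, salary, life insurance, health insurance) into four separate lists? Right now my code builds a multidimensional list with one sub-list per employee, but I ultimately want four separate lists. My current code is below: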

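def fileread(text):
    permlist = []  # one sub-list per employee
    templist = []  # fields of the employee currently being read
    x = 1          # position within the current 4-line record
    with open(text, "r") as in_file:  # the file is closed automatically
        for line in in_file:
            line = line.strip()
            templist.append(line)
            if x == 4:
                permlist.append(templist)
                templist = []
                x = 1
            else:
                x += 1
    return permlist


def main():
    EmpData = fileread("EmployeeData.txt")
    index = 0
    print(EmpData[index])


main()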

Answer 1

You can use four list comprehensions:

with open("file.txt",'r') as f:
    lines = f.readlines()

name_list = [lines[i].rstrip() for i in range(0,len(lines),4)]
salary_list = [lines[i].rstrip() for i in range(1,len(lines),4)]
life_ins_list = [lines[i].rstrip() for i in range(2,len(lines),4)]
health_ins_list = [lines[i].rstrip() for i in range(3,len(lines),4)]

Answer 2

Count the total number of lines and divide by 4 to get the number of people to add to the lists.
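A minimal sketch of that idea, not necessarily the code this answer originally showed. The function name `split_records` and the in-memory `sample` list are hypothetical, and the sketch assumes the line count is a multiple of 4 (any incomplete trailing record is ignored):

```python
FIELDS_PER_RECORD = 4  # name, salary, life insurance, health insurance


def split_records(lines):
    """Split a flat list of lines into one list per field."""
    # Number of complete employee records in the input.
    count = len(lines) // FIELDS_PER_RECORD

    names, salaries, life_ins, health_ins = [], [], [], []
    for n in range(count):
        base = n * FIELDS_PER_RECORD  # index of this employee's name line
        names.append(lines[base].strip())
        salaries.append(lines[base + 1].strip())
        life_ins.append(lines[base + 2].strip())
        health_ins.append(lines[base + 3].strip())
    return names, salaries, life_ins, health_ins


# Hypothetical sample standing in for the result of f.readlines().
sample = ["Joffrey Baratheon\n", "190922\n", "Y\n", "PPOI\n",
          "Jon Snow\n", "123222\n", "N\n", "None\n"]
names, salaries, life_ins, health_ins = split_records(sample)
```

Salaries are kept as strings here; wrapping the value in `int(...)` would be a reasonable change if arithmetic is needed.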

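Be careful when splitting data like this; it is easy to get the lists out of sync. It would be better to store this data in a database.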

Answer 3

You can use islice from the itertools library. It lets you iterate over the file in batches of 4 lines at a time.
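A minimal sketch of batching with `itertools.islice`, offered as an illustration rather than the answer's original snippet. The helper names `read_in_batches` and `split_fields` are hypothetical, and an `io.StringIO` object stands in for the opened file:

```python
import io
from itertools import islice


def read_in_batches(lines, size=4):
    """Yield lists of `size` stripped lines from any iterable of lines."""
    it = iter(lines)
    while True:
        batch = [line.strip() for line in islice(it, size)]
        if len(batch) < size:
            break  # end of input; an incomplete trailing record is dropped
        yield batch


def split_fields(lines):
    """Collect each field of every 4-line record into its own list."""
    names, salaries, life_ins, health_ins = [], [], [], []
    for name, salary, life, health in read_in_batches(lines):
        names.append(name)
        salaries.append(salary)
        life_ins.append(life)
        health_ins.append(health)
    return names, salaries, life_ins, health_ins


# Stand-in for open("EmployeeData.txt"); a real file object works the same way.
sample_file = io.StringIO(
    "Hodor\n162581\nY\nPPOF\nJorah Mormont\n61078\nY\nNone\n"
)
names, salaries, life_ins, health_ins = split_fields(sample_file)
```

Because `islice` pulls lines lazily from the iterator, the whole file never has to be held in memory at once.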

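Since someone in this thread raised concerns about academic dishonesty, I want to make clear that this is a simplified version of a production code snippet, inspired by this SO answer: How to read file N lines at a time in Python?.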

Converting file data into lists
