Question

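I recently started learning Python and am trying to get a grip on the concepts. I was given the following sample data file of 30,000 lines, delimited by spaces.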

P160543 East Asia and Pacific   IN  C   
P166720 Africa  IN  N   
P165276 East Asia and Pacific   AD  n   IIST
P159835 Latin America and Caribbean LA  B   
P160778 Latin America and Caribbean LA  B   
P164290 South Asia  AS  N   
P165493 South Asia  SA  N   
P165585 Latin America and Caribbean LAC N   
P157987 South Asia  SA  C   ALAESH
P158364 South Asia  SAS B   EPATET

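I need to skip the rows where column 4 contains "N" or "n".
Then I want to read each line and save the column values in variables.
When the search specifies TypeSt = 'IN', it should return RegionName = "East Asia and Pacific" and "Africa", ID = P160543, P166720.

If column 3 = "AD", return the value from column 2 = "East Asia and Pacific", id = P165276. If column 3 = 'LAC', return the value Latin America and Caribbean.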

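I don't have NumPy or other libraries available, so I would like to do this with plain file handling.

I know how to read a file and display its contents, remove blank lines, and skip comment lines, but I'm stuck on the problem above.

Please advise.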

Answer 1

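Create a generator to iterate over the lines of the file, and pair each line with a header line so the columns can be accessed by name. If your file has a header, grab it from the first line; the sample has none, so it is written out by hand here: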

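def read_file(fullname):
    # Yield the file one line at a time, without the trailing newline
    with open(fullname) as f:
        for line in f:
            yield line.rstrip("\n")

# Column names, tab-separated to match the split below
# (this assumes the data file itself is tab-separated)
header_line = "id\tRegionName\tTypeSt\tTypePD\tTypeCode"
myFile = read_file(r"Path/To/Your/File")

for line in myFile:
    data = dict(zip(header_line.split("\t"), line.split("\t")))

    # Here's a dictionary of the data for the current row
    # You can access the elements of the row by name as follows in the filter example:

    # Compare against lowercase "n" so that both "N" and "n" rows are skipped
    if data["TypePD"].lower() == "n":
        continue
    .....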

This should be enough to get you started, since this smells like a homework assignment.
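For reference, the filter and lookup described in the question could be filled in roughly as follows. This is only a sketch under assumptions: the columns are taken to be tab-separated, the column names come from the header string above, and `read_rows` and `lookup` are hypothetical helper names, not part of any library.

```python
# Column names are an assumption taken from the header string in the answer.
COLUMNS = ["id", "RegionName", "TypeSt", "TypePD", "TypeCode"]


def read_rows(path, sep="\t"):
    """Yield one dict per non-blank line, keyed by COLUMNS."""
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            fields = [field.strip() for field in line.split(sep)]
            # Pad short rows so every column name is present.
            fields += [""] * (len(COLUMNS) - len(fields))
            yield dict(zip(COLUMNS, fields))


def lookup(path, type_st, skip_pd=("n",)):
    """Return (id, RegionName) pairs for rows whose TypeSt matches.

    Rows whose TypePD (column 4), lowercased, is in skip_pd are dropped.
    """
    return [
        (row["id"], row["RegionName"])
        for row in read_rows(path)
        if row["TypePD"].lower() not in skip_pd and row["TypeSt"] == type_st
    ]
```

Note that with the skip rule applied, `lookup(path, "IN")` returns only P160543, because P166720 (Africa) has "N" in column 4; likewise P165276 has "n". Passing `skip_pd=()` disables the skip, if the expected output in the question is what you actually want.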

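Be wary of people recommending pandas: I work in a big-data environment, and pandas does not cope with multi-gigabyte files or millions of records, whereas generators do.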
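The reason generators scale is that a chain of them only ever holds one line in memory at a time. A minimal sketch of such a pipeline, again assuming tab-separated columns; `parse`, `keep`, and `count_by_type` are hypothetical names chosen for illustration:

```python
from collections import Counter


def parse(lines, sep="\t"):
    """Lazily split each non-blank line into a list of stripped fields."""
    for line in lines:
        if line.strip():
            yield [field.strip() for field in line.rstrip("\n").split(sep)]


def keep(rows, pd_index=3, skip=("n",)):
    """Lazily drop rows whose fourth column, lowercased, is in `skip`."""
    for row in rows:
        if len(row) > pd_index and row[pd_index].lower() not in skip:
            yield row


def count_by_type(lines):
    """Count the surviving rows per TypeSt (third column) in a single pass."""
    return Counter(row[2] for row in keep(parse(lines)))
```

Since a file object is itself an iterable of lines, it can be passed in directly, for example `with open(path) as f: counts = count_by_type(f)`. Nothing is read until `Counter` starts consuming the pipeline, and no list of all rows is ever built.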

Python: read a data file and process/filter the data