Question

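In Python, I have a file whose words are separated by |, for example: city|state|zipcode. My file reader fails to separate the words. I also want the file reader to start at line 2 instead of line 1. How can I make the file reader split the words?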

import os
import sys

def file_reader(path, num_fields, seperator = ',', header = False):
    try:
        fp = open(path, "r", encoding="utf-8")
    except FileNotFoundError:
        raise FileNotFoundError("Unable to open file.")
    else:
        with fp:
            for n, line in enumerate(fp, 1):
                fields = line.rstrip('\n').split(seperator)
                if len(fields) != num_fields:
                    raise ValueError("Unable to read file.")
                elif n == 1 and header:
                    continue
                else:
                    yield tuple([f.strip() for f in fields])

Answer 1

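If you use [1:] on the list of lines, you select a sub-list that starts after the first element; for a file, that means you get every line except the first. (The slice [1:-1] would also drop the last line.)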
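As a minimal sketch of the slicing idea, assuming the lines have first been read into a list: the io.StringIO object below is only a stand-in for an open file, and the sample rows are made up for illustration.

```python
import io

# Stand-in for an open file; the contents are made-up sample data.
fp = io.StringIO("city|state|zipcode\nAustin|TX|73301\nBoston|MA|02108\n")

lines = fp.readlines()       # list of strings, one per line
without_header = lines[1:]   # everything after the first line
trimmed_both = lines[1:-1]   # drops the first AND the last line

# Split the remaining lines on the pipe character.
rows = [line.rstrip('\n').split('|') for line in without_header]
print(rows)
```

Slicing needs the whole file in memory, which is fine for small files but may not suit very large ones.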

Answer 2

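If you don't mind using an existing framework, you can use pandas. You can skip the first line with skiprows=1 and change the delimiter with sep='|'. Note that read_csv treats the first line it actually reads as column names unless header=None is passed.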

# load pandas
import pandas as pd

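# read file as pandas dataframe; `file` is the path to the pipe-separated file
file = 'file.psv'
# header=None keeps the first line after the skipped one as data, not column names
dataframe = pd.read_csv(file, skiprows=1, sep='|', header=None)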
print(dataframe)

To install pandas:

pip install pandas

pandas documentation for read_csv:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Another option is to use the csv reader to read your PSV file:

import csv

with open('file.psv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='|')
    next(csv_reader, None)  # consume the first row to skip the header

    for row in csv_reader:
        print(row)

Answer 3

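If you need to start reading from the second line, you can change the code from for n, line in enumerate(fp, 1) to for n, line in enumerate(itertools.islice(fp, 1, None), 2), after adding import itertools. A file object does not support slicing, so fp[1:] would raise a TypeError; itertools.islice skips the first line instead.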
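A file object cannot be sliced directly, so one way to start at the second line without loading the whole file is itertools.islice. This is a minimal sketch, with io.StringIO standing in for the open file and made-up sample rows:

```python
import io
from itertools import islice

# Stand-in for an open file; the contents are made-up sample data.
fp = io.StringIO("city|state|zipcode\nAustin|TX|73301\nBoston|MA|02108\n")

rows = []
# islice(fp, 1, None) skips one line lazily; start=2 keeps n equal to
# the real line number in the file.
for n, line in enumerate(islice(fp, 1, None), 2):
    rows.append((n, line.rstrip('\n').split('|')))

print(rows)
```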

Answer 4

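If you want a really quick-and-dirty way to skip the first enumerated value: initialize a boolean to True, then add an if statement at the start of the for loop that tests whether this boolean is True. Inside that if statement, set the value to False and then continue.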

Something like this:

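lines = ['city|state|zipcode', 'Austin|TX|73301']  # sample data
enumerator = enumerate(lines, 1)

b = True  # True until the first item has been skipped
for k, v in enumerator:
  if b:
    b = False
    continue
  print(k, v.split('|'))  # Some code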

Answer 5

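For what you want, the function is fine as written; what matters is calling it with the right arguments, which differ from the defaults.

In the code, the default behavior is to use , as the separator and not to skip the first line of the file. To actually split on | and skip the first line (the header), set seperator='|' and header=True in the call.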

# Function is fine, leave as-is (rstrip takes '\n', the newline escape, not '/n')
#
def file_reader(path, num_fields, seperator = ',', header = False):
    try:
        fp = open(path, "r", encoding="utf-8")
    except FileNotFoundError:
        raise FileNotFoundError("Unable to open file.")
    else:
        with fp:
            for n, line in enumerate(fp, 1):
                fields = line.rstrip('\n').split(seperator)
                if len(fields) != num_fields:
                    raise ValueError("Unable to read file.")
                elif n == 1 and header:
                    continue
                else:
                    yield tuple([f.strip() for f in fields])

# Example file afile.txt contains these lines:
# alfa|beta|gamma|delta
# 1|2|3|4
# a|b|c|d

# here we call the function:

filename = 'afile.txt'
for x in file_reader(filename, 4, '|', True):  #note the separator and header
    print(x)
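To see the difference between the default call and the corrected call, here is a self-contained sketch. It uses a condensed copy of the question's generator (same parameter names), and writes made-up sample data to a temporary file; the file contents and the variable names outside the function are assumptions for illustration only.

```python
import os
import tempfile

def file_reader(path, num_fields, seperator=',', header=False):
    """Condensed copy of the question's generator (same parameter names)."""
    with open(path, "r", encoding="utf-8") as fp:
        for n, line in enumerate(fp, 1):
            fields = line.rstrip('\n').split(seperator)
            if len(fields) != num_fields:
                raise ValueError("Unable to read file.")
            if n == 1 and header:
                continue
            yield tuple(f.strip() for f in fields)

# Write a small sample file (made-up data) to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("city|state|zipcode\nAustin|TX|73301\nBoston|MA|02108\n")
    sample_path = tmp.name

try:
    # With the default separator ',' each line is a single field,
    # so the field-count check raises ValueError.
    try:
        list(file_reader(sample_path, 3))
        default_call_failed = False
    except ValueError:
        default_call_failed = True

    # Passing the separator and the header flag gives the intended rows.
    rows = list(file_reader(sample_path, 3, seperator='|', header=True))
finally:
    os.remove(sample_path)

print(default_call_failed)
print(rows)
```

The function itself is unchanged between the two calls; only the arguments differ.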

Answer 6

Reading the file takes three steps: open the file, store each of its lines in a list, then split each line.

Reading the file: in Python, you can easily open a file with the open function, like this:

fp=open("file.txt",'r')

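Reading each line: to read the file as lines, use the readlines method, like this:

lines = fp.readlines()

This returns the contents of the file as a list in which each item is one line. You can also get a specific line by indexing that list, for example lines[4] for the fifth line. (fp.readline(5) does not do this: its argument limits how many characters are read.)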

-> For more information, check reading files in python.

Splitting the content: to split a string on "|", use the split method:

for item in lines:
    res=item.split('|')
    #do what you want with res
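The three steps can be combined into one small function. This is only a sketch: read_psv and skip_header are illustrative names that do not appear in the answer, and the sample file is written to a temporary directory with made-up data so the example can run on its own.

```python
import os
import tempfile

def read_psv(path, skip_header=True):
    """Return the rows of a pipe-separated file as lists of stripped fields.

    read_psv and skip_header are illustrative names, not from the answer.
    """
    with open(path, 'r', encoding='utf-8') as fp:  # step 1: open the file
        lines = fp.readlines()                     # step 2: one string per line
    if skip_header:
        lines = lines[1:]                          # drop the first line
    # step 3: split each line on '|' and strip whitespace and newlines
    return [[field.strip() for field in line.split('|')] for line in lines]

# Demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("city|state|zipcode\nAustin|TX|73301\n")
    rows = read_psv(path)

print(rows)
```

Using with closes the file automatically, which the bare open call in the first step does not.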

How do I read a file (PSV) when the words are separated by "|"?
