Question

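I need to read a CSV file from stdin and output only the rows whose value in a specified column equals a specified value. My input looks like this: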

 2
 Kashiwa
 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
 Yukihide Tomari,Yayoi,Laboratory of RNA Function

My output should look like this:

 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics

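I need to select the people whose value in column 2 is Kashiwa, without writing the first two lines of stdin to stdout.

So far I have only tried reading the CSV from stdin, but I get each row as a list of strings (as the csv documentation says to expect). Can I change that?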

 #!/usr/bin/env python3

 import sys
 import csv

 data = sys.stdin.readlines()

 for line in csv.reader(data):
     print(line)

Output:

 ['2']
 ['Kashiwa']
 ['Name', 'Campus', 'LabName']
 ['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
 ['Kenta Naai', 'Shirogane', 'Laboratory of Functional Analysis in Silico']
 ['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
 ['Yukihide Tomari', 'Yayoi', 'Laboratory of RNA Function']

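Can anyone give me some advice on reading stdin as CSV and processing the data afterwards (outputting only the required column values, swapping columns, and so on)?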

Answer 1

Here is one approach.

For example:

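import csv

# Placeholder path (an assumption); pass sys.stdin to csv.reader to read standard input instead
filename = 'input.csv'

with open(filename, newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the first line (column number)
    next(reader)  # skip the second line (value to match)
    print(next(reader))  # print the header row
    for row in reader:
        if row[1] == 'Kashiwa':  # keep rows whose second column is 'Kashiwa'
            print(row)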

Output:

['Name', 'Campus', 'LabName']
['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
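
The version above hardcodes the column and the value and prints Python lists rather than CSV. A minimal sketch that instead takes both from the first two lines of the input and writes CSV with `csv.writer` might look like this. The function name `filter_rows` is only illustrative, and it assumes the column number on the first line is 1-based, as in the question's sample input:

```python
import csv


def filter_rows(in_stream, out_stream):
    """Copy the header and the matching rows from in_stream to out_stream."""
    # Line 1 holds the 1-based column number, line 2 the value to match.
    column = int(in_stream.readline()) - 1
    wanted = in_stream.readline().strip()

    reader = csv.reader(in_stream)
    writer = csv.writer(out_stream, lineterminator="\n")

    # The first remaining line is the header; it is always written.
    writer.writerow(next(reader))
    for row in reader:
        # Rows too short to contain the column (including blank lines) are skipped.
        if len(row) > column and row[column] == wanted:
            writer.writerow(row)
```

Calling `filter_rows(sys.stdin, sys.stdout)` (after `import sys`) would apply it to standard input and output.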

Answer 2

Use pandas to read the data into a DataFrame and manage it there:

import pandas as pd
# File location
infile = r'path/file'
# Load the file, skipping the first two rows
df = pd.read_csv(infile, skiprows=2)
# Keep only the rows that contain Kashiwa in the Campus column
df = df[df['Campus'] == 'Kashiwa']

You can then make all kinds of edits; for example, sort the DataFrame with:

df = df.sort_values(by='your column')

See the pandas documentation for all the possibilities.
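
Since the question reads from stdin rather than a file path, here is a hedged sketch of the same idea driven by the two leading lines of the input. `pd.read_csv` accepts an open text stream, so once those two lines are consumed the rest can be parsed directly. The function name `filter_frame` is illustrative, and `dtype=str` is an assumption made here so that the comparison stays string-based:

```python
import pandas as pd


def filter_frame(in_stream):
    """Return the rows whose selected column equals the requested value."""
    column = int(in_stream.readline()) - 1  # line 1: 1-based column number
    wanted = in_stream.readline().strip()   # line 2: value to match
    # The stream is now positioned at the header line.
    df = pd.read_csv(in_stream, dtype=str)
    # Select the column by position, then keep only the matching rows.
    return df[df.iloc[:, column] == wanted]
```

`filter_frame(sys.stdin).to_csv(sys.stdout, index=False)` would then write the filtered rows back out as CSV.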

Answer 3

 #!/usr/bin/env python3
 import sys
 import csv

 data = sys.stdin.readlines()  # read all of stdin as a list of lines
 column_to_be_matched = int(data.pop(0))  # 1-based number of the column to match
 word_to_be_matched = data.pop(0).strip()  # word to match, without the trailing newline
 col_headers = data.pop(0).strip()  # header line with the column names
 print(col_headers)  # print the column names unchanged
 for line in csv.reader(data):
     if line[column_to_be_matched - 1] == word_to_be_matched:  # keep matching rows
         print(",".join(line))  # print the row as comma-separated text
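
For the later processing the question mentions (outputting only certain columns, swapping columns), `csv.DictReader` and `csv.DictWriter` let you address fields by header name instead of by index. A minimal sketch, assuming the two leading lines have already been consumed so the stream starts at the header, and with `reorder_columns` as a purely illustrative name:

```python
import csv


def reorder_columns(in_stream, out_stream, columns):
    """Write only the named columns, in the given order."""
    reader = csv.DictReader(in_stream)  # first remaining line is the header
    writer = csv.DictWriter(
        out_stream,
        fieldnames=columns,
        extrasaction="ignore",  # drop columns that are not listed
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

For instance, passing `["Campus", "Name"]` as `columns` would swap the first two columns and drop `LabName`.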

Reading a CSV file from stdin in Python and modifying it
