Question

I am working with a large private dataset and need some help figuring out how to run my code over multiple rows.

import csv

with open('report_export.csv') as f:

    reader = csv.DictReader(f)
    report_export = list(reader)

x = report_export[25]["Text_Content"]
x.split(sep='. ')[1]

 ' Azithromycin is an antibiotic agent and a member of a subclass of 
   macrolide antibiotics with bactericidal and bacteriostatic activities.

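report_export.csv is a data file extracted from a local database; it contains information on chemicals taken from publications. I want to pull some textual information out of this file. It is located under the "Text_Content" column. "25" is an arbitrary row, used as a proof of principle for the code. x.split(sep='. ')[1] is used to isolate the desired string and to tell decimal points apart from sentence-ending periods. The file is fairly large, with 5000 rows in the CSV, and I would like to extract statements about the chemicals similar to the output above.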

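What I am struggling with is how to iterate over multiple rows. I need the program to read each row, retrieve the second string (index 1) from the split list, and save that data to a new CSV file.

Any help with iterating over the rows would be appreciated.

Thanks!

Best,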

PEB

Answer 1

Take a look at the pandas function pandas.read_csv(filename):

import pandas as pd
df = pd.read_csv(filename)  # e.g. filename = 'report_export.csv'

To iterate over the rows, use the iterrows() method:

for index, row in df.iterrows():
    print(row)

Answer 2

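The call to list(reader) reads the entire file into a list. To iterate over the rows of the CSV file without reading everything at once, replace that line with a loop (it must stay inside the with block, while the file is still open):

for row in reader:
    x = row["Text_Content"]
    # ... process x here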

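That said, five thousand rows is not very many, so you could just as well iterate over the list you already created, report_export:

for row in report_export:
    x = row["Text_Content"]
    # ... process x here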
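
One way the loop body might continue, as a minimal sketch: it assumes the column is named Text_Content as in the question and that the wanted statement is always the second chunk after splitting on '. '. The helper name second_sentence and the sample rows are made up for illustration, and returning an empty string for rows with fewer than two sentences is an assumption, not something the question specifies.

```python
import csv
import io


def second_sentence(text, sep=". "):
    """Return the second chunk of text split on sep, or "" if there is none."""
    parts = text.split(sep)
    return parts[1] if len(parts) > 1 else ""


# Hypothetical sample standing in for report_export.csv.
sample = io.StringIO(
    "Text_Content\n"
    "Dose is 2.5 mg daily. It is a macrolide antibiotic. It is taken orally.\n"
    "Only one sentence here\n"
)

results = []
for row in csv.DictReader(sample):
    # Splitting on ". " (period plus space) leaves the decimal 2.5 intact.
    results.append(second_sentence(row["Text_Content"]))

print(results)
```

The length check avoids the IndexError that a bare [1] would raise on a row with no second sentence.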

Answer 3

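You're almost there. If all you need to do is extract the text and write it to a file, use one of the csv module's writer objects and pair a reader with a writer, iterating directly over the reader object: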

import csv
with open('report_export.csv', newline='') as infile, \
        open('report_out.csv', 'w', newline='') as outfile:

    reader = csv.DictReader(infile)
    fieldnames = ["Text_content"]  # column name used in the output file
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)

    for row in reader:  # iterates over the csv row-by-row
        data = row["Text_Content"].split(sep='. ')[1]
        writer.writerow({"Text_content": data})

The above assumes Python 3.
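
A slightly more defensive variant of the reader/writer approach, offered as a sketch rather than a definitive implementation: it writes a header row and skips rows that have no second sentence. The function name extract_second_sentences, the skip-on-missing policy, and the demo data are assumptions made for illustration; the file names mirror those used in this thread.

```python
import csv
import os
import tempfile


def extract_second_sentences(in_path, out_path, column="Text_Content", sep=". "):
    """Write the second sentence of `column` for each row to a one-column CSV.

    Returns the number of rows written.
    """
    written = 0
    with open(in_path, newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=[column])
        writer.writeheader()  # header row so the output can be re-read by DictReader
        for row in reader:
            parts = row[column].split(sep)
            if len(parts) < 2:
                continue  # assumption: skip rows without a second sentence
            writer.writerow({column: parts[1]})
            written += 1
    return written


# Demo on a small made-up file in a temporary directory.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "report_export.csv")
out_path = os.path.join(tmpdir, "report_out.csv")
with open(in_path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["ID", "Text_Content"])
    w.writerow(["1", "Compound A melts at 113.5 C. It is an antibiotic agent. More text."])
    w.writerow(["2", "No second sentence"])

count = extract_second_sentences(in_path, out_path)
with open(out_path, newline="") as f:
    out_rows = list(csv.DictReader(f))
print(count, out_rows)
```

Whether to skip short rows, write an empty cell, or log them for review depends on how the data will be used downstream.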

Iterating over multiple rows in Python - Science Data Cleaning
