Question

Any help is much appreciated!

My CSV looks like this: (image: CSV example)

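I am writing a program that checks whether each column contains the correct type of data. For example:

Column 1 - must have a valid timestamp
Column 2 - must hold the value 2
Column 4 - must be consecutive (if not, how many packets are missing)
Column 5/6 - a calculation is performed on the values, and the result must match the entered value

The columns can be in different positions.

I tried using the pandas module to give each column an 'id':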

import pandas as pd
fields = ['star_name', 'ra']

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)

print(df.keys())

print(df.star_name)

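However, I get confused when it comes to running the checks on the data. What would be the next best approach to doing something like this?

I am really struggling with this, and any help would be appreciated.

Thanks!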

Answer 1

Try using the 'csv' module.

Example

import csv

with open('data.csv', 'r', newline='') as f:
    # The first line of the file is assumed to contain the column names
    reader = csv.DictReader(f)

    # Read one row at a time
    # If you need to compare with the previous row, store it in a variable
    prev_title_4_value = None
    for row in reader:
        print(row['Title 1'], row['Title 3'])

        # Sample to illustrate how column 4 values can be compared
        curr_title_4_value = int(row['Title 4'])
        # Skip the comparison on the first row, which has no predecessor
        if prev_title_4_value is not None and (curr_title_4_value - prev_title_4_value) != 1:
            print('Values are not consecutive')
        prev_title_4_value = curr_title_4_value
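
The same row-by-row idea can be stretched to cover the first three rules from the question. What follows is only a rough sketch, not a definitive implementation. The header names ('Title 1', 'Title 2', 'Title 4'), the timestamp format and the sample rows are all assumptions, since the real file is not shown. The helper validate_rows is a hypothetical name as well. Because columns are looked up by header name, their position in the file does not matter.

```python
import csv
import io
from datetime import datetime

# Assumed names and format: the real header titles and timestamp layout
# are not shown in the question, so adjust these to match the file.
TIMESTAMP_COLUMN = 'Title 1'
CONSTANT_COLUMN = 'Title 2'
SEQUENCE_COLUMN = 'Title 4'
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'


def validate_rows(lines):
    """Return a list of (line_number, message) tuples, one per rule violation."""
    problems = []
    prev_seq = None
    reader = csv.DictReader(lines, skipinitialspace=True)
    # Line 1 is the header; this numbering assumes no field spans several lines
    for line_no, row in enumerate(reader, start=2):
        # Rule 1: the timestamp must parse with the assumed format
        try:
            datetime.strptime(row[TIMESTAMP_COLUMN], TIMESTAMP_FORMAT)
        except ValueError:
            problems.append((line_no, 'invalid timestamp'))

        # Rule 2: the column must hold the value 2
        if row[CONSTANT_COLUMN].strip() != '2':
            problems.append((line_no, 'expected value 2'))

        # Rule 3: sequence numbers must increase by exactly 1
        seq = int(row[SEQUENCE_COLUMN])
        if prev_seq is not None and seq - prev_seq != 1:
            if seq > prev_seq:
                lost = seq - prev_seq - 1
                problems.append((line_no, '%d packet(s) lost' % lost))
            else:
                problems.append((line_no, 'sequence out of order'))
        prev_seq = seq
    return problems


# Made-up sample data standing in for data.csv
sample = io.StringIO(
    'Title 1,Title 2,Title 3,Title 4\n'
    '2017-01-01 10:00:00,2,a,1\n'
    '2017-01-01 10:00:01,2,b,2\n'
    'not a time,3,c,5\n'
)
for problem in validate_rows(sample):
    print(problem)
```

To run it on a real file, pass the open file object instead of the StringIO, for example validate_rows(open('data.csv', newline='')). Collecting the problems in a list rather than printing them right away makes it easier to count or filter them afterwards. The calculation rule for columns 5/6 is left out because the question does not say what the calculation is.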

