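How do I count the lines in a text file that have a specified value?

Time: 2018-09-17 18:27:20

Tags: python

I'm working with a .csv file that lists a "timestamp" in the first column and the "wind speed" in the second. I need to read through this .csv file and calculate the percentage of time the wind speed was above 2 m/s. This is what I have so far.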

txtFile = r"C:\Data.csv"
line = o_txtFile.readline()[:-1]
while line:
    line = oTextfile.readline()
for line in txtFile:
    line = line.split(",")[:-1]

How can I count the number of lines whose second element is greater than 2?

CSV File Sample

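2 answers:

Answer 0: (score: 1)

Depending on the option you pick, you may have to adjust the CSV slightly (for options 1 and 2 you will definitely want to remove all the header rows, while for option 3 you will keep only the middle one, i.e. the one that starts with TIMESTAMP).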
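The header cleanup described above could be done by hand, or with something like the sketch below. It assumes the header block is exactly four lines long (the same count the second answer uses) and that the column-name line starts with TIMESTAMP. The helper `strip_header` and the sample rows are made up for illustration and are not taken from the original file.

```python
def strip_header(lines, n_header=4, keep_prefix=None):
    """Drop the first n_header lines; optionally keep the ones starting with keep_prefix."""
    cleaned = []
    for i, line in enumerate(lines):
        if i < n_header:
            # Inside the header block: keep only the column-name line, if requested.
            if keep_prefix is not None and line.lstrip('"').startswith(keep_prefix):
                cleaned.append(line)
            continue
        cleaned.append(line)
    return cleaned


# Hypothetical sample: four header lines followed by two data rows.
sample = [
    "file info,station\n",
    "TIMESTAMP,WindSpd_ms\n",
    "TS,m/s\n",
    ",Avg\n",
    "2018-09-17 00:00,1.5\n",
    "2018-09-17 00:10,2.5\n",
]

no_header = strip_header(sample)                            # for options 1 and 2
names_only = strip_header(sample, keep_prefix="TIMESTAMP")  # for option 3
```

Here `no_header` contains only the data rows, while `names_only` keeps the TIMESTAMP line as a single header row.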

You basically have three options:

Option 1: Vanilla Python

count = 0

with open('data.csv', 'r') as file:
    for line in file:
        # The second column holds the wind speed; float() also accepts decimals
        value = float(line.split(',')[1])
        if value > 2:
            count += 1

# Now you have the value in the ``count`` variable
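The question asks for a percentage rather than a raw count, so here is a minimal sketch of that step in plain Python. It assumes the readings are evenly spaced in time, so that the share of rows equals the share of time, and it skips any line whose second field is not a number (headers, blank lines). The function name `percent_above` and the sample rows are hypothetical.

```python
def percent_above(lines, threshold=2.0, column=1):
    """Return the percentage of parseable rows whose value in `column` exceeds `threshold`."""
    total = 0
    above = 0
    for line in lines:
        fields = line.strip().split(",")
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # header, blank, or malformed line: not counted
        total += 1
        if value > threshold:
            above += 1
    return 100.0 * above / total if total else 0.0


# Made-up sample rows: one header line and four readings.
sample = [
    "TIMESTAMP,WindSpd_ms\n",
    "2018-09-17 00:00,1.5\n",
    "2018-09-17 00:10,2.5\n",
    "2018-09-17 00:20,3.0\n",
    "2018-09-17 00:30,0.5\n",
]

print(percent_above(sample))  # 2 of the 4 readings exceed 2 m/s
```

Because a file object is itself an iterable of lines, the same function can be called as `percent_above(file)` inside a `with open(...)` block.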

Option 2: CSV module

Here I'm using Python's CSV module (you could also use DictReader, but I'll let you look that up yourself).

import csv

count = 0

# newline='' is what the csv module recommends when opening files
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        if float(row[1]) > 2:
            count += 1

# Now you have the value in the ``count`` variable
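For the DictReader variant mentioned above, a rough sketch might look like this. It assumes the file has a single header row whose column names are TIMESTAMP and WindSpd_ms (the names used elsewhere in this answer); the in-memory sample data is made up so the snippet runs on its own.

```python
import csv
import io

# Made-up sample standing in for the real file; a real script would use
# open('data.csv', newline='') instead of io.StringIO.
sample = io.StringIO(
    "TIMESTAMP,WindSpd_ms\n"
    "2018-09-17 00:00,1.5\n"
    "2018-09-17 00:10,2.5\n"
    "2018-09-17 00:20,3.0\n"
)

# DictReader uses the first row as keys, so columns are accessed by name.
reader = csv.DictReader(sample)
count = sum(1 for row in reader if float(row["WindSpd_ms"]) > 2)

print(count)
```

Accessing the column by name keeps working if the column order in the file changes, which is the main reason to prefer it over `row[1]`.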

Option 3: Pandas

Pandas is a really cool, powerful library that many people use for data analysis. Doing what you want would look like this:

import pandas as pd

df = pd.read_csv('data.csv')

# Number of rows with a wind speed above 2 m/s
count = len(df[df['WindSpd_ms'] > 2])

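Answer 1: (score: 0)

You can read the file line by line and, if a line has content, split it. You count the lines read as well as the lines above 10 m/s, then compute the percentage: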

# create data file for processing with random data
import random
random.seed(42)

with open("data.txt","w") as f:
    f.write("header\n") 
    f.write("header\n") 
    f.write("header\n") 
    f.write("header\n") 
    for sp in random.choices(range(10),k=200):
        f.write(f"some date,{sp+3.5}, data,data,data\n")


# open/read/calculate the percentage of data points with speeds above 10 m/s
days = 0
speedGreater10 = 0
with open("data.txt","r") as f:
    for _ in range(4):
        next(f) # ignore first 4 rows containing headers

    for line in f:
        if line.strip():  # skip blank lines (a bare "\n" is still truthy)
            _ , speed, *p = line.split(",") 
            # _ and *p are ignored (they take 'some date' + [data,data,data])
            days += 1
            if float(speed) > 10:
                speedGreater10 += 1

print(f"{days} datapoints, of which {speedGreater10} "
      f"got more than 10m/s: {speedGreater10/days:.1%}")

Output:

200 datapoints, of which 55 got more than 10m/s: 27.5%

Data file:

header
header
header
header
some date,9.5, data,data,data
some date,3.5, data,data,data
some date,5.5, data,data,data
some date,5.5, data,data,data
some date,10.5, data,data,data
[... some more ...]
some date,8.5, data,data,data 
some date,3.5, data,data,data
some date,12.5, data,data,data
some date,11.5, data,data,data