Question

I have a text file that contains gasoline prices by date. The file has the following format:

month-day-year:price

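Example: GasPrices

I need to complete two tasks:

(1) Split each input line into month, day, year, and price.

(2) Calculate the average gasoline price per year and per month.

Since I'm new to Stack Overflow and to coding, could someone point me in the right direction?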

Answer 1

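This problem is simple enough that it doesn't even warrant a regular expression.
The beauty of Python is that you can always save on code.
Your starting point is the delimiter ":" (I recreated your data set and put it in a .txt file):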

import pandas as pd

# header=None: the file has no header row, so the column names are supplied explicitly
df = pd.read_table("stack_example.txt", sep=":", header=None, names=["date", "val"])

df['month'] = pd.DatetimeIndex(df['date']).month
df['year']  = pd.DatetimeIndex(df['date']).year
df.head()

Finally:

df_grp = df.loc[:,["val","month","year"]].groupby(["month", "year"]).mean()
df_grp

Not counting .head() and import pandas, that is four lines of code.
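The grouping above averages each (month, year) pair. The question also asks for a per-year average, so here is a minimal sketch of both. The sample rows are made up for illustration (only the month-day-year:price layout comes from the question), io.StringIO stands in for the real file, and the explicit date format is an assumption based on the sample line 04-05-1993:1.068 quoted elsewhere in this thread.

```python
import io

import pandas as pd

# Hypothetical sample rows in the month-day-year:price layout from the question
sample = io.StringIO(
    "04-05-1993:1.068\n"
    "04-19-1993:1.079\n"
    "05-03-1993:1.100\n"
    "01-03-1994:1.000\n"
)

df = pd.read_csv(sample, sep=":", header=None, names=["date", "val"])

# Parse with an explicit format so month and day are never swapped
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# One average per year, and one per month within each year
yearly_avg = df.groupby("year")["val"].mean()
monthly_avg = df.groupby(["year", "month"])["val"].mean()

print(yearly_avg)
print(monthly_avg)
```

Grouping by ["year", "month"] keeps April 1993 separate from April 1994; grouping by "month" alone would pool the same month across all years.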

Answer 2

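with open('/path/to/file', 'r') as f:
    # Strip the trailing newline (and surrounding whitespace) from every line
    fullfile = [x.strip() for x in f]
# Split each line once on ':' into a (date, price) tuple
datesprices = [tuple(x.split(':', 1)) for x in fullfile]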

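This code reads the file into a list called fullfile, strips the newline characters, and then uses split to put each date and its corresponding price into a list of tuples. If you have any questions, leave a comment.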
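The snippet above covers only the first task. For the second task (averages per year and per month), a standard-library sketch might look like the following. The function name average_prices and the sample rows are hypothetical, and the code assumes every non-blank line follows the month-day-year:price layout.

```python
from collections import defaultdict


def average_prices(lines):
    """Return (per_year, per_month) average prices from 'MM-DD-YYYY:price' lines."""
    by_year = defaultdict(list)
    by_month = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, price = line.split(":")
        month, day, year = date.split("-")
        by_year[year].append(float(price))
        by_month[(year, month)].append(float(price))
    per_year = {key: sum(vals) / len(vals) for key, vals in by_year.items()}
    per_month = {key: sum(vals) / len(vals) for key, vals in by_month.items()}
    return per_year, per_month


# Hypothetical sample rows; in practice pass the open file object instead
sample_lines = ["04-05-1993:1.068", "04-19-1993:1.079", "01-03-1994:1.000"]
per_year, per_month = average_prices(sample_lines)
print(per_year)
print(per_month)
```

The keys stay as strings ("1993", ("1993", "04")) to keep the sketch short; convert them with int() if numeric keys are more convenient.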

Answer 3

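Someone mentioned using regular expressions, so I wrote all of the answers below with regex. There are multiple ways to accomplish the first task in the question, which is to split the input data into 4 elements (month, day, year, price). I'm not sure what output you need, so you can modify this code to use lists, dictionaries, etc.

Answer One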

import re

# Compile once, outside the loop: date (MM-DD-YYYY), a colon, then the price
input_pattern = re.compile(r'(\d{2}-\d{2}-\d{4}):(\d{1}\.\d{2,3})')

with open('tmpFile.txt', 'r') as infile:
    for line in infile:
        find_pattern = input_pattern.search(line)
        if find_pattern:
            ############################################
            # The regex above has 2 capture groups.
            # group(0) is the whole match -- 04-05-1993:1.068
            # group(1) is the date -- 04-05-1993
            # group(2) is the price -- 1.068
            ############################################
            date_of_price = find_pattern.group(1)
            price_of_gas = find_pattern.group(2)

            print(date_of_price.split('-'))
            # outputs, one line per iteration:
            # ['04', '05', '1993']
            # ['04', '05', '1993']
            # ['04', '19', '1993']

            print(price_of_gas)
            # outputs, one line per iteration:
            # 1.068
            # 1.079
            # 1.079

Answer Two

import re

with open('tmpFile.txt', 'r') as infile:
    for line in infile:
        # Split on runs of '-' or ':' to get [month, day, year, price]
        print(re.split(r'[-:]+', line.rstrip('\n')))
        # outputs, one line per iteration:
        # ['04', '05', '1993', '1.068']
        # ['04', '05', '1993', '1.079']
        # ['04', '19', '1993', '1.079']

Answer Three

The following approach uses a list comprehension to achieve the same result as above.

import re

with open('tmpFile.txt', 'r') as infile:
    # Split every line on runs of '-' or ':' and collect the pieces
    gas_price_info = [re.split(r'[-:]+', x.rstrip('\n')) for x in infile]
print(gas_price_info)
# outputs
# [['04', '05', '1993', '1.068'], ['04', '05', '1993', '1.079'], ['04', '19', '1993', '1.079']]

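Answer Four

This answer is similar to Answer Three, but the line that opens the file has been folded into the list comprehension. It produces a nested list, as in Answer Three.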

gas_price_info = [re.split(r'[-:]+', x.rstrip('\n')) for x in open('tmpFile.txt').readlines()]
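Since the desired output format is open, one possible variation is a dictionary per line with numeric values. The sketch below uses named groups; the names LINE_PATTERN and parse_line are hypothetical, and the price pattern \d+\.\d+ is an assumption that is slightly looser than the \d{1}\.\d{2,3} used in Answer One.

```python
import re

# Named groups mirror the four fields the question asks for
LINE_PATTERN = re.compile(
    r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4}):(?P<price>\d+\.\d+)"
)


def parse_line(line):
    """Return a dict with int month/day/year and float price, or None if no match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "month": int(fields["month"]),
        "day": int(fields["day"]),
        "year": int(fields["year"]),
        "price": float(fields["price"]),
    }


print(parse_line("04-05-1993:1.068\n"))
# {'month': 4, 'day': 5, 'year': 1993, 'price': 1.068}
print(parse_line("not a price line"))
# None
```

Using fullmatch rather than search means a line with trailing junk is rejected instead of partially matched, which may or may not be what you want for your file.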

Answer 4

You can use the csv module from the standard library, which handles parsing of all kinds of character-delimited files.

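import csv

# newline="" is what the csv docs recommend when opening files for csv.reader
with open("path/to/file", newline="") as f:
    reader = csv.reader(f, delimiter=":")
    for date, gas_price in reader:
        pass  # do whatever you need with date and gas_price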
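As one way to fill in the loop body, the sketch below turns each row into a (month, day, year, price) tuple. The sample rows are hypothetical, io.StringIO stands in for the real file, and the %m-%d-%Y format is an assumption based on the sample line 04-05-1993:1.068 quoted in another answer.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; replace io.StringIO(...) with open(path, newline="")
sample = io.StringIO("04-05-1993:1.068\n04-19-1993:1.079\n")

rows = []
for date_text, price_text in csv.reader(sample, delimiter=":"):
    # strptime validates the date and raises ValueError on a malformed one
    date = datetime.strptime(date_text, "%m-%d-%Y")
    rows.append((date.month, date.day, date.year, float(price_text)))

print(rows)
# [(4, 5, 1993, 1.068), (4, 19, 1993, 1.079)]
```

Note that csv.reader yields an empty list for a blank line, so the two-name unpacking in the loop would fail on one; skip empty rows first if your file may contain them.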

How do I separate values in a text file?
