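I want to extract some information from a text file (the content between marker strings, for example oldtime: ... oldtime!>) and write it to a CSV file. My input text file looks like this: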
=======================
oldtime:
hours:1:hours!>
minutes:12:minutes!>
oldtime!>
newtime:
hours:15:hours!>
minutes:17:minutes!>
newtime!>
oldtime:
hours:11:hours!>
minutes:22:minutes!>
oldtime!>
newtime:
hours:5:hours!>
minutes:17:minutes!>
newtime!>
==========================
I started with this, but I can't get any further.
with open(inputfile, 'r') as f, open(outputfile.cvs, 'a') as f1:
    f1.write("oldtime; newtime \n")
    for row in f:
        if "oldtime:" in str(row):
            temp = re.split(r'(@oldtime[\n\r]|[\n\r]@oldtime!>)', str(row))
            ???
        if "newtime:" in str(row):
            temp = re.split(r'(@newtime[\n\r]|[\n\r]@newtime!>)', str(row))
I would like a CSV file like this as output:
oldtime newtime
01:12 15:17
11:22 05:17
Can you help me? Thanks.
Answer 0: (score: 2)
Here is one approach using regex and the csv module.
For example:
import re
import csv

# filename and filename_1 are the input and output paths
with open(filename) as infile, open(filename_1, "w", newline="") as outfile:
    data = infile.read()
    hrs = re.findall(r"hours:(\d+):hours", data)        # Get all hours
    mins = re.findall(r"minutes:(\d+):minutes", data)   # Get all minutes
    # list() is required in Python 3, where zip() returns an iterator that cannot be sliced
    data = list(zip(hrs, mins))
    writer = csv.writer(outfile)                        # Write CSV
    writer.writerow(["oldtime", "newtime"])             # Header
    for m, n in zip(data[0::2], data[1::2]):
        writer.writerow([":".join(m), ":".join(n)])     # Write old time and new time
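The expected output in the question pads each part to two digits (01:12, 05:17), which joining the raw regex captures does not do. A minimal sketch of one way to add the padding, assuming the same tag format as the sample input; format_time is a hypothetical helper name, not part of the answer above:

```python
import re


def format_time(hours: str, minutes: str) -> str:
    """Zero-pad both parts to two digits, e.g. ('1', '12') -> '01:12'."""
    return f"{int(hours):02d}:{int(minutes):02d}"


# Two blocks in the same format as the question's input file
sample = (
    "hours:1:hours!>\nminutes:12:minutes!>\n"
    "hours:15:hours!>\nminutes:17:minutes!>\n"
)
hrs = re.findall(r"hours:(\d+):hours", sample)
mins = re.findall(r"minutes:(\d+):minutes", sample)
times = [format_time(h, m) for h, m in zip(hrs, mins)]
print(times)  # ['01:12', '15:17']
```

The padded strings can then be paired and written with csv.writer in the same way as the unpadded ones.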
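Answer 1: (score: 1)
Another solution, similar to Rakesh's, which assumes that your file always has the same structure (oldtime -> hours -> minutes -> newtime -> hours -> minutes ...).
Extract all the numbers from the string with a regex: match = re.findall(r'\d+', str_file)
Convert this list by joining hours and minutes: dates = [i + ":" + j for i, j in zip(match[::2], match[1::2])]
Create the dataframe using the pandas module.
The code is here: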
# Import modules
import re
import pandas as pd

with open("../temp.txt", 'r') as f:
    # Read file as a string
    str_file = f.read()

# Extract all numbers
match = re.findall(r'\d+', str_file)
print(match)
# ['1', '12', '15', '17', '11', '22', '5', '17']

# Create dates by pairing each hour with the minute that follows it
dates = [i + ":" + j for i, j in zip(match[::2], match[1::2])]
print(dates)
# ['1:12', '15:17', '11:22', '5:17']

# Create dataframe: even positions are oldtime, odd positions are newtime
df = pd.DataFrame({"oldtime": dates[::2],
                   "newtime": dates[1::2]})
print(df)
#   oldtime newtime
# 0    1:12   15:17
# 1   11:22    5:17

# Export the data
df.to_csv("output.csv", index=False)
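Edit 1:
This version assumes that the oldtime and newtime blocks may be swapped. Here I read the file line by line and sort the values into a dictionary under oldtime and newtime. There is a lot of slicing, but it works on my test file.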
# Import module
import pandas as pd

with open("../temp.txt", 'r') as f:
    # Block headers to look for, and the values collected under each
    list_split = ["oldtime:", "newtime:"]
    dates = {"oldtime:": [], "newtime:": []}
    line = f.readline().rstrip('\n')
    while True:
        line = line.rstrip('\n')
        print([line])  # Debug output
        if line in list_split:
            key = line
            # The two lines after a header are expected to be hours, then minutes
            hours = f.readline().rstrip('\n').split(":")[1]
            minutes = f.readline().rstrip('\n').split(":")[1]
            dates[key].append(hours + ':' + minutes)
        line = f.readline()
        # End of file: exit
        if not line:
            break

print(dates)
# {'oldtime:': ['1:12', '11:22'], 'newtime:': ['15:17', '5:17']}

# Create dataframe
df = pd.DataFrame({"oldtime": dates["oldtime:"],
                   "newtime": dates["newtime:"]})
print(df)
#   oldtime newtime
# 0    1:12   15:17
# 1   11:22    5:17

# Export the data
df.to_csv("output.csv", index=False)
Edit 2:
import pandas as pd

with open("../temp.txt", 'r') as f:
    # Values collected for each kind of block
    dates = {"oldtime": [], "newtime": []}
    line = f.readline().rstrip('\n')
    while True:
        # Ignore blank lines and anything that is not a block header
        if ("oldtime:" in line) or ("newtime:" in line):
            # Process new "oldtime" or "newtime" block
            # Class: either "oldtime" or "newtime"
            class_time = line.replace(" ", "").rstrip('\n')[:-1]
            # Default hour and minute values
            hours = "24"
            minutes = "60"
            # Read next line
            line = f.readline().rstrip('\n')
            # While block not ended (assumes every block has its closing tag)
            while class_time + "!>" not in line:
                # If hour in line: update hour
                if 'hour' in line:
                    hours = line.split(":")[1]
                # If minute in line: update minute
                elif 'minute' in line:
                    minutes = line.split(":")[1]
                # Read next line
                line = f.readline().rstrip('\n')
            # End block
            # Add block read to dictionary
            dates[class_time].append(hours + ':' + minutes)
        # Read next line
        line = f.readline()
        # If end of file: exit
        if not line:
            break

# Create dataframe
df = pd.DataFrame({"oldtime": dates["oldtime"],
                   "newtime": dates["newtime"]})

# Export the data
df.to_csv("output.csv", index=False)
Hope that helps!
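For comparison, the block-by-block idea can also be sketched with a single regex instead of a manual readline loop. This is only an illustration under the assumption that every block is closed by a matching tag such as oldtime!>; parse_blocks and BLOCK are hypothetical names, and the backreference \1 is what ties each closing tag to its opening name:

```python
import re

# (oldtime|newtime): ... <same name>!>  with the body captured lazily.
# re.DOTALL lets "." span the line breaks inside a block.
BLOCK = re.compile(r"(oldtime|newtime):\s*(.*?)\s*\1!>", re.DOTALL)


def parse_blocks(text):
    """Collect 'hours:minutes' strings per block name, in file order."""
    result = {"oldtime": [], "newtime": []}
    for name, body in BLOCK.findall(text):
        hours = re.search(r"hours:(\d+):hours!>", body)
        minutes = re.search(r"minutes:(\d+):minutes!>", body)
        # Skip blocks that lack either value instead of guessing a default
        if hours and minutes:
            result[name].append(f"{hours.group(1)}:{minutes.group(1)}")
    return result


# Blocks and inner lines deliberately swapped relative to the question's sample
sample = (
    "newtime:\nminutes:17:minutes!>\nhours:15:hours!>\nnewtime!>\n"
    "oldtime:\nhours:1:hours!>\nminutes:12:minutes!>\noldtime!>\n"
)
parsed = parse_blocks(sample)
print(parsed)  # {'oldtime': ['1:12'], 'newtime': ['15:17']}
```

Because each block is matched as a unit, the order of oldtime and newtime, and of hours and minutes inside a block, does not affect the result.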
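Answer 2: (score: 0)
Great question :).
Here is a simple solution I wrote: split the string on the ":" character, convert the numeric strings to integers, join them with ":", and then write them to the CSV.
Here is the code: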
import csv

with open('data.txt', 'r') as f:
    data = f.read()

# Split on ":" and keep only the pieces that parse as integers
data = data.split(sep=':')
nums = []
for i in data:
    try:
        nums.append(int(i))
    except ValueError:
        pass

# Pair each hour with the minute that follows it
times = []
for i in range(len(nums)):
    if i % 2 == 0:
        times.append(str(nums[i]) + ":" + str(nums[i+1]))

with open('time_data.csv', 'w+', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['oldtime', 'newtime'])
    # Pair each oldtime with the newtime that follows it
    for i in range(len(times)):
        if i % 2 == 0:
            writer.writerow([times[i], times[i+1]])
After reading Rakesh's answer, I wrote this:
import re
import csv

list_i = ''
file_name = 'data.txt'
file_name1 = 'data_1.txt'
with open(file_name, 'r') as f, open(file_name1, 'w', newline='') as f1:
    data = f.read()
    # Find every hours and minutes tag
    list_1 = re.findall(r'hours:\d+:hours', data)
    list_2 = re.findall(r'minutes:\d+:minutes', data)
    # Concatenate the matches, then pull the digits back out
    for i in list_1:
        list_i += i
    list_2_i = ''
    for i in list_2:
        list_2_i += i
    list_1 = re.findall(r'\d+', list_i)
    list_2 = re.findall(r'\d+', list_2_i)
    # Build one [oldtime, newtime] row per pair of blocks
    data = []
    for i in range(len(list_1)):
        if i % 2 == 0:
            data.append([str(list_1[i]) + ':' + str(list_2[i]),
                         str(list_1[i+1]) + ':' + str(list_2[i+1])])
    writer = csv.writer(f1)
    writer.writerow(['oldtime', 'newtime'])
    for i in data:
        writer.writerow(i)
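@Rakesh Slicing the result of zip() in Python 3 (as in data[0::2]) raises an error: TypeError: 'zip' object is not subscriptable. Is there a way to fix this? :)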
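In Python 3, zip() returns a lazy iterator rather than a list, which is why slicing it fails. One common way around this is to materialize it with list() first. A minimal sketch, using hard-coded values taken from the question's sample in place of the regex results:

```python
# Values as re.findall would return them for the sample input
hrs = ["1", "15", "11", "5"]
mins = ["12", "17", "22", "17"]

# list() turns the zip iterator into a real list, so slicing works
pairs = list(zip(hrs, mins))

# Even positions are oldtime entries, odd positions are newtime entries
rows = [
    [":".join(old), ":".join(new)]
    for old, new in zip(pairs[0::2], pairs[1::2])
]
print(rows)  # [['1:12', '15:17'], ['11:22', '5:17']]
```

In Python 2, zip() already returned a list, so the unwrapped slicing worked there.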