Hi everyone!
The problem is that I have some txt files and a script that merges them. Each txt file starts with:
Export Type: by LAI\GCI\SAI
LAI\GCI\SAI: fjdfkj
HLR NUMBER: NA
Routing Category: NA
Telephone Service: NA
Export User Scope: Attached & Detached User
Task Name: lfl;sfd
Data Type: col1/col2
Begin Time of Exporting data: 2019-4-14 19:41
=================================
col1 col2
40188e5464645645 54634565754
401884645645564 54545454564
401087465836453 54545454565
401885645656567 53434343435
401084569498484 54342340788
401088465836453 56767686334
401439569345656 64545467558
401012993933334 55645342352
401034545566463 34353463464
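I want to merge only the data rows below col1 and col2 (without the column names), but the script also merges the header lines at the top of each file. Can you update this script?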
import fileinput
import glob

file_list = glob.glob("*.txt")
with open('resultfile.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)
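A minimal sketch of the script above that drops everything up to and including the column-name line in each file. It assumes every file has the layout shown above (metadata lines, one line made only of `=` characters, one `col1 col2` line, then data); the helper names `data_lines` and `merge_all` are hypothetical.

```python
import glob


def data_lines(lines):
    """Yield the data rows of one export file, without the header block.

    Assumes the layout shown above: metadata lines, a separator made only of
    '=' characters, one column-name line (col1 col2), then the data rows.
    A file without a separator yields nothing.
    """
    it = iter(lines)
    for line in it:
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:
            next(it, None)  # also skip the column-name line after the separator
            break
    for line in it:
        if line.strip():  # ignore blank lines
            # Guarantee a trailing newline so the last row of one file
            # does not run into the first row of the next.
            yield line.rstrip("\n") + "\n"


def merge_all(pattern="*.txt", out_path="resultfile.txt"):
    # Exclude the output file, which also matches *.txt on a second run.
    paths = sorted(p for p in glob.glob(pattern) if p != out_path)
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                out.writelines(data_lines(f))
```

Run `merge_all()` from the folder that contains the export files. `fileinput` is not needed here: opening each file separately makes it easy to skip each file's own header.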
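Another problem: I want to remove the 5 at the beginning of the col2 values, and keep only the rows that start with 40108/40188/401088e (all other rows should be removed). The columns are really long, and I have 50-60 txt files in total. Thanks!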
In the end it should look like this:
40188e464645645 4634565754
401884645645564 4545454564
401087465836453 4545454565
401885645656567 3434343435
401084569498484 4342340788
401088465836453 6767686334
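A minimal sketch covering both requests, assuming "remove the 5" means stripping a single leading `5` from col2 when one is present. Because the header lines never start with the wanted prefixes, the prefix filter alone also drops them. The first expected row shows col1 as `40188e464645645` while the input has `40188e5464645645`; the sketch assumes that is a typo and leaves col1 untouched. `process_line` and `merge_all` are hypothetical names.

```python
import glob

# "401088e..." rows already start with "40108", so two prefixes suffice.
PREFIXES = ("40108", "40188")


def process_line(line):
    """Return the cleaned row, or None if the row should be dropped."""
    parts = line.split()
    # Metadata lines, the "col1 col2" line and unwanted prefixes are dropped.
    if len(parts) != 2 or not parts[0].startswith(PREFIXES):
        return None
    col1, col2 = parts
    if col2.startswith("5"):
        col2 = col2[1:]  # remove only one leading 5
    return f"{col1} {col2}\n"


def merge_all(pattern="*.txt", out_path="resultfile.txt"):
    # Exclude the output file, which also matches *.txt on a second run.
    paths = sorted(p for p in glob.glob(pattern) if p != out_path)
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    row = process_line(line)
                    if row is not None:
                        out.write(row)
```

On Python 3.9+, `col2.removeprefix("5")` does the same as the `startswith` check; `col2.lstrip("5")` would not, because it strips every leading 5.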
Answer 0 (score: 1)
First loop over all files in the list, then over their lines, and filter the lines with startswith, which accepts a tuple of prefixes:
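import glob

# exclude the output file so a re-run does not read its own output
file_list = [f for f in glob.glob('*.txt') if f != 'resultfile.txt']
with open('resultfile.txt', 'w') as file:
    for f in file_list:
        with open(f, 'r') as f1:
            for line in f1:
                # header lines never start with these prefixes, so they are skipped too
                if line.startswith(('40108', '40188', '401088')):
                    file.write(line)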
Answer 1 (score: 0)
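Use the skiprows parameter of pandas' read_csv to skip the first lines:
import pandas as pd
data = pd.read_csv('file.txt', skiprows=10, sep=r'\s+', dtype=str)  # columns are space-separated; keep values as text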
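A sketch extending the pandas approach to all files and to both cleaning steps. It assumes every file has exactly 10 lines before the `col1 col2` line, as in the sample above, and that the columns are separated by whitespace (the default comma separator would not split them). `dtype=str` stops a value such as `40188e5464645645` from being read as a number in scientific notation. `load`, `clean` and `merge_all` are hypothetical names.

```python
import glob

import pandas as pd


def load(path_or_buffer, header_rows=10):
    # Skip the metadata block and the ===== line; "col1 col2" becomes the header.
    # dtype=str keeps every value as text, so nothing is parsed as a float.
    return pd.read_csv(path_or_buffer, skiprows=header_rows, sep=r"\s+", dtype=str)


def clean(df):
    # Keep rows whose col1 starts with 40108 or 40188 (str.match anchors at the start).
    df = df[df["col1"].str.match(r"(?:40108|40188)", na=False)].copy()
    # Remove a single leading 5 from col2.
    df["col2"] = df["col2"].str.replace(r"^5", "", regex=True)
    return df


def merge_all(pattern="*.txt", out_path="resultfile.txt"):
    # Exclude the output file, which also matches *.txt on a second run.
    paths = sorted(p for p in glob.glob(pattern) if p != out_path)
    if not paths:
        return  # pd.concat raises on an empty list
    result = pd.concat([clean(load(p)) for p in paths], ignore_index=True)
    # Write only the data rows, space-separated, without header or index.
    result.to_csv(out_path, sep=" ", header=False, index=False)
```

If the number of metadata lines varies between files, a fixed `skiprows` will misread some of them; filtering line by line on the prefixes avoids that assumption.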