I want to merge some txt files

Time: 2019-04-23 17:34:18

Tags: excel python-3.x pandas data-science

Hi everyone!

I have some txt files and a script that puts them together. Each txt file starts like this:

Export Type:                        by LAI\GCI\SAI
LAI\GCI\SAI:                        fjdfkj
HLR NUMBER:                         NA
Routing Category:                   NA
Telephone Service:                  NA
Export User Scope:                  Attached & Detached User
Task Name:                          lfl;sfd
Data Type:                          col1/col2
Begin Time of Exporting data:       2019-4-14 19:41
=================================
col1                    col2         
401885464645645         54634565754     
401884645645564         54545454564
401087465836453         54545454565     
401885645656567         53434343435
401084569498484         54342340788
401088465836453         56767686334
401439569345656         64545467558
401012993933334         55645342352
401034545566463         34353463464

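I want to merge only the data rows that follow the col1 and col2 line (without the column names), but the script also merges the block of header text at the top of each file. Could you update this script?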

import fileinput
import glob

file_list = glob.glob("*.txt")

with open('resultfile.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)

A second question: I also want to remove the leading 5 from the values in col2, and delete all rows that start with 40108/40188/401088. Thanks!

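1 answer:

Answer 0 (score: 0)

Import selectively by specifying the header row. This skips the descriptive header block and loads only the data into a dataframe. From there, the dataframes can be concatenated and written back out as csv.

Given the tags on the question, I assume you want to do this with pandas.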

import glob

import pandas as pd


# Raw string so the backslashes in "LAI\GCI\SAI" are kept literally.
csvdata = r"""Export Type:                        by LAI\GCI\SAI
LAI\GCI\SAI:                        fjdfkj
HLR NUMBER:                         NA
Routing Category:                   NA
Telephone Service:                  NA
Export User Scope:                  Attached & Detached User
Task Name:                          lfl;sfd
Data Type:                          col1/col2
Begin Time of Exporting data:       2019-4-14 19:41
=================================
col1                    col2
401885464645645         54634565754
401884645645564         54545454564
401087465836453         54545454565
401885645656567         53434343435
401084569498484         54342340788
401088465836453         56767686334
401439569345656         64545467558
401012993933334         55645342352
401034545566463         34353463464"""

# Write three identical sample files so the example can be run as-is.
files = ["file{}.txt".format(i) for i in range(3)]
for fn in files:
    with open(fn, "w") as f:
        f.write(csvdata)

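file_list = glob.glob("file*.txt")

# Line 10 (0-based) holds the column names; everything above it is skipped.
dfs = []
for f in file_list:
    df = pd.read_csv(f, sep=r"\s+", header=[10])
    dfs.append(df)

# Stack the per-file frames; reset_index keeps each file's row number as an "index" column.
df = pd.concat(dfs)
df.reset_index(inplace=True)

df.to_csv("resultfile.txt")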

This produces:

,index,col1,col2
0,0,401885464645645,54634565754
1,1,401884645645564,54545454564
2,2,401087465836453,54545454565
3,3,401885645656567,53434343435
4,4,401084569498484,54342340788
5,5,401088465836453,56767686334
6,6,401439569345656,64545467558
7,7,401012993933334,55645342352
8,8,401034545566463,34353463464
9,0,401885464645645,54634565754
10,1,401884645645564,54545454564
11,2,401087465836453,54545454565
12,3,401885645656567,53434343435
...snip...
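
The code above does not cover the second part of the question (dropping rows by prefix and stripping the leading 5 from col2). The following is only a minimal plain-Python sketch of one way it might be done, not part of the original answer. It assumes that the prefixes apply to col1, that only a single leading "5" should be removed, and that every file has the "=====" separator followed by one column-name line. The names `data_rows`, `clean`, and `BAD_PREFIXES` are made up for illustration, and the sample text is a shortened stand-in for a real export.

```python
def data_rows(lines):
    """Yield (col1, col2) pairs found after the '=====' separator and the column-name line."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("====="):
            next(lines, None)  # skip the "col1 col2" column-name line
            break
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            yield parts[0], parts[1]


# Assumed from the question; "401088" is already covered by "40108".
BAD_PREFIXES = ("40108", "40188")


def clean(rows, bad_prefixes=BAD_PREFIXES):
    """Drop rows whose col1 has an unwanted prefix; strip one leading '5' from col2."""
    for col1, col2 in rows:
        if col1.startswith(bad_prefixes):
            continue
        if col2.startswith("5"):
            col2 = col2[1:]
        yield col1, col2


# Shortened stand-in for one exported file (hypothetical, values taken from the question).
sample = r"""Export Type:                        by LAI\GCI\SAI
Data Type:                          col1/col2
=================================
col1                    col2
401885464645645         54634565754
401087465836453         54545454565
401088465836453         56767686334
401439569345656         64545467558
401012993933334         55645342352
401034545566463         34353463464
"""

result = list(clean(data_rows(sample.splitlines())))
for col1, col2 in result:
    print(col1, col2)
```

For real files, each open file object can be passed to `data_rows` in place of `sample.splitlines()`, and the cleaned pairs written to the result file. If the result file also ends in .txt and sits in the same folder, it would be picked up by `glob.glob("*.txt")` on a second run, so a different name or folder is probably safer.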