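How do I finish this Python script to manipulate data in a tab-delimited file?

Time: 2018-11-23 12:52:59

Tags: python-3.x list delimiter

I have a list of part numbers and serial numbers in a tab-delimited file, and I need to join them with a hyphen to produce an asset number.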

Here is the input:

Part Number    Serial Number
PART1          SERIAL1
,PART2         SERIAL2
, PART3        SERIAL3

This is the output I want:

Part Number    Serial Number    Asset Number
PART1          SERIAL1          PART1-SERIAL1
,PART2         SERIAL2          PART2-SERIAL2
, PART3        SERIAL3          PART3-SERIAL3

I tried the following code:

import csv
input_list = []
with open('Assets.txt', mode='r') as input:
    for row in input:
        field = row.strip().split('\t') #Remove new lines and split at tabs
        for x, i in enumerate(field):
            if i[0] == (','):   #If the start of a field starts with a comma
                field[x][0] = ('') #Replace that first character with nothing
                field[x].lstrip() #Strip any whitespace
        print(field)

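My first problem is that the code meant to remove the leading commas and spaces from each field does not work.

The second problem is that quotation marks are being added around the whitespace.

The third problem is that I don't know how to add another item (the asset number) to the list so that I can join the fields.

Can anyone help me with any of these problems?

3 answers:

Answer 0: (score: 1)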

import pandas as pd

data = {'Part Number': ['PART1',', PART2',',  PART3'],
        'Serial Number': ['Serial1','Serial2','Serial3']}

df = pd.DataFrame(data)

def clean(value):
    # Remove commas first, then strip; stripping first would leave the space that followed the comma.
    return str(value).replace(',', '').strip()

df.loc[:, 'AssetNumber'] = df.loc[:, 'Part Number'].apply(clean) + '-' + df.loc[:, 'Serial Number'].apply(clean)

This does what you want.

To load the data from a tab-delimited file instead of a dictionary:

df = pd.read_csv('filepathasstring',sep='\t')

If that gives you trouble, see this question:

Reading tab-delimited file with Pandas - works on Windows, but not on Mac

You can then save it back as a tab-delimited file by calling:

df.to_csv('filepathasstring', sep='\t')

If you don't have pandas yet, here is how to get it:

https://pandas.pydata.org/pandas-docs/stable/install.html

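Answer 1: (score: 1)

You can try to strip the commas even when none are present, so the if i[0] == (','): check is no longer needed. You were also stripping the string without storing the result back in the list. Here is the fixed version: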

input_list = []
with open('Assets.txt', mode='r') as text_file:
    for row in text_file:
        field = row.strip('\n').split('\t') # Remove new lines and split at tabs.
        for n, word in enumerate(field):
            field[n] = word.lstrip(", ") # Strip any number of whitespaces and commas.
        print(field)

Output:

['Part Number', 'Serial Number']
['PART1', 'SERIAL1']
['PART2', 'SERIAL2']
['PART3', 'SERIAL3']
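
As background on why the original loop failed, here is a minimal sketch. The sample value is illustrative only and mirrors the ", PART3" field from the question. Python strings are immutable, so assigning to an index raises TypeError, and lstrip() returns a new string rather than changing the one it is called on.

```python
# Illustrative value only; it mirrors the ", PART3" field from the question.
word = ", PART3"

# Strings are immutable, so assigning to an index raises TypeError.
try:
    word[0] = ""
    assignment_failed = False
except TypeError:
    assignment_failed = True

# lstrip() returns a new string; calling it without storing the result changes nothing.
word.lstrip(", ")
unchanged = word

# Storing the returned value is what actually cleans the field.
cleaned = word.lstrip(", ")

print(assignment_failed, repr(unchanged), repr(cleaned))
```

This is why the fixed loop assigns the stripped value back with field[n] = word.lstrip(", ").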

So now we can put Asset_number = field[0] + '-' + field[1] somewhere in the loop, which gives you the PARTx-SERIALx value you want.

With a few modifications to get the desired output:
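
The quotation marks mentioned in the question most likely come from printing the list itself, which shows Python's list representation. A small sketch, assuming a row that has already been cleaned by the loop above, of joining the fields into a plain tab-separated string instead:

```python
# A cleaned row, as produced by the loop above (values are illustrative).
field = ["PART2", "SERIAL2"]

# print(field) displays the list representation, quotes and brackets included.
list_repr = str(field)

# Build the asset number and join everything into one tab-separated line.
asset_number = field[0] + "-" + field[1]
line = "\t".join(field + [asset_number])

print(list_repr)
print(line)
```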

input_list = []
with open('Assets.txt', mode='r') as text_file:
    for m, row in enumerate(text_file):
        field = row.strip('\n').split('\t') # Remove new lines and split at tabs.
        for n, word in enumerate(field):
            field[n] = word.lstrip(", ") # Strip any number of whitespaces and commas.

        if m == 0: # Special case for the header.
            text_to_print = field[0] + '\t' + field[1]  + '\t' + 'Asset Number'
        else:
            Asset_number = field[0] + '-' + field[1]
            text_to_print = field[0] + '\t' + field[1]  + '\t' + Asset_number

        print(text_to_print)

The printed output is:

Part Number     Serial Number   Asset Number
PART1   SERIAL1 PART1-SERIAL1
PART2   SERIAL2 PART2-SERIAL2
PART3   SERIAL3 PART3-SERIAL3

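For some reason it doesn't look quite right here, but the strings are correct and the tabs are where they should be, so writing them to a new file instead of printing them should be no problem. The raw strings are: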

'Part Number\tSerial Number\tAsset Number'
'PART1\tSERIAL1\tPART1-SERIAL1'
'PART2\tSERIAL2\tPART2-SERIAL2'
'PART3\tSERIAL3\tPART3-SERIAL3'
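
Writing the result to a file could look roughly like the sketch below. It swaps the manual string concatenation for the standard csv module with a tab delimiter. The function name add_asset_numbers and the output file name Assets_out.txt are hypothetical, and the demo writes a throwaway copy of the question's sample data so the example runs on its own.

```python
import csv
import os
import tempfile


def add_asset_numbers(in_path, out_path):
    """Copy a tab-delimited file, cleaning fields and appending an Asset Number column."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header_written = False
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Drop stray leading commas/spaces and any trailing whitespace.
            fields = [value.lstrip(", ").rstrip() for value in row]
            if not header_written:
                fields.append("Asset Number")
                header_written = True
            else:
                fields.append(fields[0] + "-" + fields[1])
            writer.writerow(fields)


# Demo on a temporary copy of the sample data from the question.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "Assets.txt")
    out_path = os.path.join(tmp, "Assets_out.txt")  # hypothetical output name
    with open(in_path, "w") as f:
        f.write("Part Number\tSerial Number\n"
                "PART1\tSERIAL1\n"
                ",PART2\tSERIAL2\n"
                ", PART3\tSERIAL3\n")
    add_asset_numbers(in_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result)
```

Unlike the desired output shown in the question, this sketch also writes the cleaned part number in the first column; keep the original value there if the leading commas must be preserved.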

Answer 2: (score: 1)

You can try the code below; it works for the sample input shown.

  input.txt
Part Number    Serial Number
PART1          SERIAL1
,PART2         SERIAL2
, PART3        SERIAL3
  split_text_add_combine.py
import re

def split_and_combine(in_path, out_path, new_column_name):
    format_string =  "{0:20s}{1:20s}{2:20s}"
    new_lines = [] # To store new lines

    # Reading input file to process
    with open(in_path) as f:
        lines = f.readlines()

        for index, line in enumerate(lines):
            line = line.strip()
            arr = re.split(r"\s{2,}", line)

            if index == 0:
                # Important to split words in case if words have more than single space
                new_line = format_string.format(arr[0], arr[1], new_column_name) + '\n'
            else:
                # arr = line.split()
                comma_removed_string = (arr[0] + "-" + arr[1]).lstrip(",").lstrip() 
                new_line = format_string.format(arr[0], arr[1], comma_removed_string) + '\n'

            new_lines.append(new_line)

    print(new_lines)

    # Writing new lines to: output.txt
    with open(out_path, "w") as f:
        f.writelines(new_lines)


if __name__ == "__main__":
    in_path = "input.txt"
    out_path = "output.txt"
    new_column_name = "Asset Number"

    split_and_combine(in_path, out_path, new_column_name)
  output.txt
Part Number         Serial Number       Asset Number        
PART1               SERIAL1             PART1-SERIAL1       
,PART2              SERIAL2             PART2-SERIAL2       
, PART3             SERIAL3             PART3-SERIAL3       
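
One caveat, sketched below with illustrative strings: the pattern \s{2,} only splits on runs of two or more whitespace characters, so it handles the space-aligned sample but would not split fields separated by a single tab. For a genuinely tab-delimited file, splitting on "\t" is one alternative.

```python
import re

# Illustrative rows: one aligned with spaces, one separated by a single tab.
space_aligned = ", PART3        SERIAL3"
tab_separated = ", PART3\tSERIAL3"

# Two or more whitespace characters: splits the space-aligned row.
aligned_parts = re.split(r"\s{2,}", space_aligned)

# A single tab is only one whitespace character, so nothing is split.
unsplit_parts = re.split(r"\s{2,}", tab_separated)

# Splitting on the tab character handles the tab-delimited row.
tab_parts = tab_separated.split("\t")

print(aligned_parts)
print(unsplit_parts)
print(tab_parts)
```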