Python: How do I replace different strings across multiple lines of a text file?

Time: 2017-02-16 05:22:27

Tags: string python-3.x replace

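Basic task: convert the response of a URL request to text and dump it into a text file (which is almost a usable CSV).

Goal: a clean CSV. Across multiple lines, I am trying to replace several (different) characters:

brackets, tildes (~), and the extra comma at the end of each line.

I could not find any relatively simple example that does this. I am looking for something that loops line by line and does the replacements.

Please note: I expect this file to grow over time, so loading all of it into memory is not an option.

Here is the code that creates the file: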

import urllib.request
with urllib.request.urlopen(URL1) as response:
    data = response.read()
decoded_data = data.decode(encoding='UTF-8')

str_data = str(decoded_data)
saveFile = open("test.txt",'w')
saveFile.write(str_data)
saveFile.close()

Below is a simplified sample of the file. The first line contains the field names; the second and third lines represent records.

[["F1","F2","F3","F4","F5","F6"],

["string11","string12","string13","S~ring14","string15","string16"],

["string21","string22","S~ring23","string24","string25","string26"]]

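3 Answers:

Answer 0: (score: 2)

If the characters you want to remove are at the beginning or end of the string, use strip. If they can appear at arbitrary positions, use replace instead, like this: line.replace("~",""). Note that, unlike strip, you cannot specify multiple characters in a single replace call, but you can chain the calls, like this: line.replace("~","").replace(",","").replace("[","")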
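A minimal sketch of that difference, using a made-up line shaped like the sample in the question (the variable names are only illustrative):

```python
# Illustrative input shaped like one line of the question's file.
line = '["string11","string12","S~ring14","string16"],\n'

# strip() removes any of the listed characters, but only from the two ends;
# it stops at the first character that is not in the set (here the quote).
stripped = line.strip(" [],\n")

# replace() removes a substring wherever it occurs; calls can be chained.
cleaned = stripped.replace("~", "")

print(cleaned)
```

The commas between fields survive because strip never looks past the first and last quote characters.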

A quick mock-up of something that might work for you:

with open("text.txt", 'r') as f:
    with open("result.txt", 'w') as new_f:
        for line in f:
            new_line = line.strip(" [],\n\t\r").replace("~","")
            print(new_line)
            new_f.write(new_line+"\n")

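I did it this way because, from what I can see, the tilde can be anywhere, while the brackets and commas usually appear at the ends. I also added "\n", "\t", "\r" and the space to strip, because these characters may appear at the end of each line (at least "\n" certainly will).

Answer 1: (score: 0)

You can use a simple for loop to iterate over the file. Then you can replace the characters in each line.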
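One way such a loop might look, as a sketch rather than a definitive implementation. The helper names clean_line and clean_file are hypothetical, and the set of characters to remove is taken from the question:

```python
def clean_line(line):
    """Drop tildes anywhere, then brackets, commas and whitespace at the ends."""
    return line.replace("~", "").strip(" [],\r\n\t")


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path one line at a time, cleaning each line."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # the file object yields one line at a time
            cleaned = clean_line(line)
            if cleaned:  # skip lines that are empty after cleaning
                dst.write(cleaned + "\n")
```

Calling clean_file("test.txt", "clean.csv") would then process the file from the question. Because the file object is iterated lazily, only one line is held in memory at a time, which matters if the file keeps growing.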

Answer 2: (score: 0)

#!/usr/bin/env python3

# Note, I used the print function as a way to visually confirm the code worked.
# the URL_call will yield a byte that has serialized data for a basic table (columns and rows, where first row are column names -- just like Excel or SQL)

URL_call = ("http://www.zzz.com/blabla.html")

# URLIB module & function: the request has to be first decoded from UTF-8
import urllib.request
with urllib.request.urlopen(URL_call) as response:
    URL_data = response.read()

URL_data_decoded = URL_data.decode(encoding='UTF-8')

# use json to convert decoded response into a python structure (from a JSON structure)
import json
URL_data_JSON = json.loads(URL_data_decoded)

# pandas will transition the python data structure from a "list-like" array to a table.
import pandas as pd
URL_data_panda = pd.DataFrame(URL_data_JSON)

# this will create the text (in this case a CSV) file
URL_data_panda.to_csv("test.csv")

# The file will need the first row removed (columns are indexed coming out of the panda)

# Position zero holds the first row of text, so writing from the second row (index 1) onward removes it.
with open("test.csv") as src:
    lines = src.readlines()
print(len(lines))  # line count, printed as a visual check
with open("test2.csv", "w") as dst:
    dst.writelines(lines[1:])


# Changes the name of the first column from zero to a normalized name.

import fileinput

# Note, below you could set up a back-up file, in the file input, by adding an extra argument in the parens ("test2.csv", inplace=True, backup='.bak')
with fileinput.FileInput("test2.csv", inplace=True) as file:
    for line in file:
        # Rename only the leading "0," of the first line; replacing "0," on every line would also alter data such as "10,".
        if file.isfirstline() and line.startswith("0,"):
            line = "REC_NUM," + line[2:]
        print(line, end='')
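If the response really is JSON shaped like the sample in the question (a list of rows, the first one holding the field names), the standard csv module can probably write the rows directly, so no index row or index column has to be patched afterwards. A rough sketch under that assumption; rows_to_csv is a made-up helper name, and the tilde removal is taken from the question rather than from this answer:

```python
import csv
import json


def rows_to_csv(json_text, csv_path):
    """Write a JSON list of rows to csv_path, one CSV record per row."""
    # Assumed shape: [[field names], [record], [record], ...]
    rows = json.loads(json_text)
    # Remove tildes from every cell, as the question asks.
    cleaned = [[str(cell).replace("~", "") for cell in row] for row in rows]
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(cleaned)
    return len(cleaned)
```

With the names used above, rows_to_csv(URL_data_decoded, "test.csv") would replace the pandas step and the two clean-up passes. Like the pandas version, it parses the whole response in memory before writing.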