Basic task: convert a URL request to text and dump it to a text file (an almost-usable CSV).
Goal: a clean CSV. Across multiple lines, I am trying to replace several (different) characters:
brackets, tildes (~), and an extra comma at the end of each line.
I could not find any reasonably simple example of how to do this. I am looking for something that can loop through the file line by line and do the replacements.
Please note: I expect this file to grow over time, so holding it all in memory is not an option.
Here is the code that creates the file:
import urllib.request

with urllib.request.urlopen(URL1) as response:
    data = response.read()

decoded_data = data.decode(encoding='UTF-8')
str_data = str(decoded_data)

saveFile = open("test.txt", 'w')
saveFile.write(str_data)
saveFile.close()
Here is a simplified example of the file; the first line contains the field names, and the second and third lines represent records.
[["F1", "F2", "F3", "F4", "F5", "F6"],
["string11", "string12", "string13", "S~ring14", "string15", "string16"],
["string21", "string22", "S~ring23", "string24", "string25", "string26"]]
Answer 0 (score: 2)
If you want to remove characters from the beginning or end of a string, use strip. If the characters you want to remove can appear in arbitrary positions, use replace instead, like this: line.replace("~", ""). Note that, unlike strip, you cannot specify multiple characters in a single replace call, but you can chain the calls, like this: line.replace("~", "").replace(",", "").replace("[", "")
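To see the difference concretely, here is a tiny demo (the sample string is made up for illustration):

# strip takes a set of characters and trims them from both ends only;
# replace removes one substring at a time, anywhere in the string.
s = "[~hello~],"
print(s.strip("[],"))  # '~hello~'  (the inner tildes survive)
print(s.replace("~", "").replace("[", "").replace("]", "").replace(",", ""))  # 'hello'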
Here is a quick mock-up of something that might work for you:
with open("text.txt", 'r') as f:
    with open("result.txt", 'w') as new_f:
        for line in f:
            new_line = line.strip(" [],\n\t\r").replace("~", "")
            print(new_line)
            new_f.write(new_line + "\n")
This is because, from what I can see, the tildes can appear anywhere, while the brackets and commas generally appear at the ends of lines. I also added "\n", "\t", "\r", and the space character to the strip call, because those characters may appear at the end of each line (and "\n" at least certainly will). Since the loop reads the file one line at a time, the whole file never has to fit in memory.
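Applied to one of the sample records from the question (an illustration that assumes the lines look exactly like the simplified example above):

# One record line from the simplified example, cleaned with the same two calls:
sample = '["string11", "string12", "string13", "S~ring14", "string15", "string16"],\n'
print(sample.strip(" [],\n\t\r").replace("~", ""))
# -> "string11", "string12", "string13", "Sring14", "string15", "string16"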
Answer 1 (score: 0)
You can use a simple for loop to iterate over the file, and then replace the characters in each line, as in the sketch below.
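A minimal sketch of that approach (the file names here are placeholders, and str.translate is used as one convenient way to drop several different characters in a single pass over each line):

# str.maketrans with two empty strings and a third string builds a
# translation table that deletes every character listed in the third string.
drop = str.maketrans("", "", "~[]")

with open("test.txt") as src, open("clean.csv", "w") as dst:
    for line in src:  # streams one line at a time, so memory use stays flat
        cleaned = line.translate(drop).rstrip(",\n")  # also trims the trailing comma
        dst.write(cleaned + "\n")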
Answer 2 (score: 0)
#!/usr/bin/env python3
# Note: I used the print function as a way to visually confirm the code worked.

# The URL call yields bytes holding serialized data for a basic table (columns and
# rows, where the first row contains the column names -- just like Excel or SQL).
URL_call = ("http://www.zzz.com/blabla.html")

# urllib module & function: the response has to be decoded from UTF-8 first.
import urllib.request
with urllib.request.urlopen(URL_call) as response:
    URL_data = response.read()
URL_data_decoded = URL_data.decode(encoding='UTF-8')

# Use json to convert the decoded response (a JSON structure) into a Python structure.
import json
URL_data_JSON = json.loads(URL_data_decoded)

# pandas turns the "list-like" Python structure into a table.
import pandas as pd
URL_data_panda = pd.DataFrame(URL_data_JSON)

# This creates the text file (in this case, a CSV).
URL_data_panda.to_csv("test.csv")

# The file needs its first row removed (pandas writes its own numeric column header).
# Determine the line count:
with open("test.csv") as f:
    num_lines = sum(1 for line in f)
print(num_lines)

# Row zero holds the header pandas added; writing from the second row (index 1)
# gets the removal done.
with open("test.csv") as f:
    lines = f.readlines()
with open("test2.csv", "w") as f:
    f.writelines(lines[1:])

# Change the name of the first column from "0" to a normalized name.
import fileinput
# Note: you could keep a backup file by adding an extra argument in the parens:
# fileinput.FileInput("test2.csv", inplace=True, backup='.bak')
with fileinput.FileInput("test2.csv", inplace=True) as file:
    for line in file:
        print(line.replace("0,", "REC_NUM,"), end='')
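A quick way to confirm the result (a check that assumes test2.csv came out of the script above with the field names on its first line):

import pandas as pd

# Read the cleaned CSV back in; the renamed header row should now act as the header.
check = pd.read_csv("test2.csv")
print(check.columns.tolist())  # expected: ['REC_NUM', 'F1', 'F2', ...]
print(check.head())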