Basic task: convert a URL request to text and dump it to a text file (an almost-usable CSV).
Goal: a clean CSV. Across multiple lines, I am trying to replace several (different) characters:
brackets, tildes (~), and an extra comma at the end of each line.
I could not find any reasonably simple example of how to do this. I am looking for something that can loop through the file line by line and do the replacements.
Please note: I expect this file to grow over time, so holding it all in memory is not an option.
Here is the code that creates the file:
import urllib.request

with urllib.request.urlopen(URL1) as response:
    data = response.read()

decoded_data = data.decode(encoding='UTF-8')
str_data = str(decoded_data)

saveFile = open("test.txt", 'w')
saveFile.write(str_data)
saveFile.close()
Here is a simplified example of the file; the first line contains the field names, and the second and third lines represent records.
[["F1", "F2", "F3", "F4", "F5", "F6"],
["string11", "string12", "string13", "S~ring14", "string15", "string16"],
["string21", "string22", "S~ring23", "string24", "string25", "string26"]]
Answer 0 (score: 2)
If you want to remove characters from the beginning or end of a string, use strip. If the characters you want to remove can appear in arbitrary positions, use replace instead, like this: line.replace("~", ""). Note that, unlike strip, you cannot specify multiple characters in a single replace call, but you can chain the calls, like this: line.replace("~", "").replace(",", "").replace("[", "")
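To see the difference concretely, here is a tiny demo (the sample string is made up for illustration):

# strip takes a set of characters and trims them from both ends only;
# replace removes one substring at a time, anywhere in the string.
s = "[~hello~],"
print(s.strip("[],"))  # '~hello~'  (the inner tildes survive)
print(s.replace("~", "").replace("[", "").replace("]", "").replace(",", ""))  # 'hello'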
Here is a quick mock-up of something that might work for you:
with open("text.txt", 'r') as f:
    with open("result.txt", 'w') as new_f:
        for line in f:
            new_line = line.strip(" [],\n\t\r").replace("~", "")
            print(new_line)
            new_f.write(new_line + "\n")
This is because, from what I can see, the tildes can appear anywhere, while the brackets and commas generally appear at the ends of lines. I also added "\n", "\t", "\r", and the space character to the strip call, because those characters may appear at the end of each line (and "\n" at least certainly will). Since the loop reads the file one line at a time, the whole file never has to fit in memory.
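Applied to one of the sample records from the question (an illustration that assumes the lines look exactly like the simplified example above):

# One record line from the simplified example, cleaned with the same two calls:
sample = '["string11", "string12", "string13", "S~ring14", "string15", "string16"],\n'
print(sample.strip(" [],\n\t\r").replace("~", ""))
# -> "string11", "string12", "string13", "Sring14", "string15", "string16"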
Answer 1 (score: 0)
You can use a simple for loop to iterate over the file, and then replace the characters in each line, as in the sketch below.
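A minimal sketch of that approach (the file names here are placeholders, and str.translate is used as one convenient way to drop several different characters in a single pass over each line):

# str.maketrans with two empty strings and a third string builds a
# translation table that deletes every character listed in the third string.
drop = str.maketrans("", "", "~[]")

with open("test.txt") as src, open("clean.csv", "w") as dst:
    for line in src:  # streams one line at a time, so memory use stays flat
        cleaned = line.translate(drop).rstrip(",\n")  # also trims the trailing comma
        dst.write(cleaned + "\n")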
Answer 2 (score: 0)
#!/usr/bin/env python3
# Note: I used the print function as a way to visually confirm the code worked.

# The URL call yields bytes holding serialized data for a basic table (columns and
# rows, where the first row contains the column names -- just like Excel or SQL).
URL_call = ("http://www.zzz.com/blabla.html")

# urllib module & function: the response has to be decoded from UTF-8 first.
import urllib.request
with urllib.request.urlopen(URL_call) as response:
    URL_data = response.read()
URL_data_decoded = URL_data.decode(encoding='UTF-8')

# Use json to convert the decoded response (a JSON structure) into a Python structure.
import json
URL_data_JSON = json.loads(URL_data_decoded)

# pandas turns the "list-like" Python structure into a table.
import pandas as pd
URL_data_panda = pd.DataFrame(URL_data_JSON)

# This creates the text file (in this case, a CSV).
URL_data_panda.to_csv("test.csv")

# The file needs its first row removed (pandas writes its own numeric column header).
# Determine the line count:
with open("test.csv") as f:
    num_lines = sum(1 for line in f)
print(num_lines)

# Row zero holds the header pandas added; writing from the second row (index 1)
# gets the removal done.
with open("test.csv") as f:
    lines = f.readlines()
with open("test2.csv", "w") as f:
    f.writelines(lines[1:])

# Change the name of the first column from "0" to a normalized name.
import fileinput
# Note: you could keep a backup file by adding an extra argument in the parens:
# fileinput.FileInput("test2.csv", inplace=True, backup='.bak')
with fileinput.FileInput("test2.csv", inplace=True) as file:
    for line in file:
        print(line.replace("0,", "REC_NUM,"), end='')
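A quick way to confirm the result (a check that assumes test2.csv came out of the script above with the field names on its first line):

import pandas as pd

# Read the cleaned CSV back in; the renamed header row should now act as the header.
check = pd.read_csv("test2.csv")
print(check.columns.tolist())  # expected: ['REC_NUM', 'F1', 'F2', ...]
print(check.head())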