How do I replace characters in Python using a loop?

Time: 2017-07-11 06:35:22

Tags: python loops replace web-scraping beautifulsoup

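First, I have to change every '/' character to 'x / x', but nothing happens. After that, I have to stop the loop when it finds "modul"; this part works. What is wrong with my code, and how can I fix it so that it changes these characters? Python 2.7.13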

12/12/2017
12/18/2017
12 January 2017
12/18/2017
12/12/2017
12 Jan 17

Part of my code:

import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb
import re

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
 
f = open('0910000511.txt', 'r')
x = f.read()

soup = BeautifulSoup(x, 'lxml')

datatable=[]
stop = 0
for ctable in soup.find_all('table',  "ctable" ):
    for record in ctable.find_all('tr'):
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
            if '/' in data.text:
                record2 = str(record).replace('/', ' / ')
                final_format = ' {} '.format(record2)
            if 'modul' in data.text:
                stop = 1
                break
        datatable.append(temp_data)
        if stop == 1:
            break
    if stop == 1:
        break
output.writerows(datatable)

print record2
tab6col = soup.find('table', { "class" : "tab6col" })
datatable2=[]
for record in tab6col.find_all('tr'):
    temp_data2 = []
    for data in record.find_all('td'):
        temp_data2.append(data.text.encode('latin-1'))
    datatable.append(temp_data2)

output.writerows(datatable)

resultcsv.close()

So I want to change every '/' character, for example: '3/1' -> '3 / 1'

2 answers:

Answer 0: (score: 0)

Try this:

Replace record2 = str(record).replace('/', ' / ') with record2 = str(record).replace('/', 'x/x')
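
One likely reason "nothing happens" in the question's code is that str.replace returns a new string, and record2 is never written into temp_data, so the rows sent to the CSV still hold the original text. The sketch below shows the difference on plain strings; the cells list is made-up sample data, not taken from the question's file.

```python
# Hypothetical sample cell values, used only for illustration.
cells = ['3/1', 'NAME', '12/18/2017']

# Calling replace() and discarding the result leaves the data untouched,
# because Python strings are immutable.
for cell in cells:
    cell.replace('/', ' / ')
unchanged = list(cells)

# Keeping the returned string is what actually changes the stored values.
spaced = [cell.replace('/', ' / ') for cell in cells]

print(unchanged)
print(spaced)
```

In the question's loop this would mean appending the replaced text to temp_data instead of the raw data.text.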

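Answer 1: (score: 0)

I looked at your code and found a problem. You are trying to find td and tr tags in your script, while the HTML has TR and TD tags. Here is the code I tried.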

a = """<tr><td>&nbsp;</td><TD class="contentsub" WIDTH="80">3/1</tr><td class="contentword_valid">NAME<BR>
Változás időpontja: 2013.12.30.<BR>
Bejegyzés kelte: 2013.12.19."""
from bs4 import BeautifulSoup
datatable=[]
stop = 0
soup = BeautifulSoup(a, 'html.parser')
for record in soup.find_all('tr'):
    temp_data = []
    for data in record.find_all('td'):
        temp_data.append(data.text.encode('latin-1'))
        record2 = str(record).replace('/', ' / ')
        print(record2)
        final_format = ' {} '.format(record2)
        if 'modul' in data.text:
            stop = 1
            break
    datatable.append(temp_data)
    print(datatable)
    if stop == 1:
        break

Output:

<tr><td> < / td><td class="contentsub" width="80">3 / 1< / td>< / tr>
<tr><td> < / td><td class="contentsub" width="80">3 / 1< / td>< / tr>
[[b'\xa0', b'3/1']]
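
Note that in the output above the replacement also hits the closing tags (`< / td>`), because it runs on str(record), that is, on the markup of the whole row, while the collected cell still reads '3/1'. A minimal sketch of an alternative, assuming the goal is to space out the slashes in the cell text only: apply replace to each cell's text before appending it. The sample markup and the helper name extract_rows are hypothetical, and unlike the question's loop this version keeps every cell of the row that contains "modul" before stopping.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real input file is not shown in the question.
html = """<table class="ctable">
<tr><td>&nbsp;</td><td class="contentsub" width="80">3/1</td></tr>
<tr><td>modul</td><td>4/2</td></tr>
<tr><td>after</td><td>5/3</td></tr>
</table>"""


def extract_rows(markup, stop_word='modul'):
    """Collect cell texts row by row, spacing out '/' and stopping at stop_word."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    for record in soup.find_all('tr'):
        # Replace on the cell text, not on str(record), so tags stay intact.
        cells = [td.get_text().replace('/', ' / ') for td in record.find_all('td')]
        rows.append(cells)
        # Stop after the first row in which any cell contains the stop word.
        if any(stop_word in cell for cell in cells):
            break
    return rows


rows = extract_rows(html)
print(rows)
```

The returned rows could then be passed to output.writerows, encoding each cell first if the CSV writer in use requires bytes.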