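First, I have to change every '/' character to 'x / x' (a slash with a space on each side), but nothing happens. After that, I have to stop the loop when "modul" is found; that part works. What is wrong with my code, and how can I fix it so that these characters are changed? Python 2.7.13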
12/12/2017
12/18/2017
12 January 2017
12/18/2017
12/12/2017
12 Jan 17
My code for processing the HTML:
import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb
import re

filename=r'output.csv'
resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

f = open('0910000511.txt', 'r')
x = f.read()
soup = BeautifulSoup(x, 'lxml')

datatable=[]
stop = 0
for ctable in soup.find_all('table', "ctable" ):
    for record in ctable.find_all('tr'):
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
            if '/' in data.text:
                record2 = str(record).replace('/', ' / ')
                final_format = ' {} '.format(record2)
            if 'modul' in data.text:
                stop = 1
                break
        datatable.append(temp_data)
        if stop == 1:
            break
    if stop == 1:
        break
output.writerows(datatable)
print record2

tab6col = soup.find('table', { "class" : "tab6col" })
datatable2=[]
for record in tab6col.find_all('tr'):
    temp_data2 = []
    for data in record.find_all('td'):
        temp_data2.append(data.text.encode('latin-1'))
    datatable.append(temp_data2)
output.writerows(datatable)
resultcsv.close()
So I want to change every '/' character, for example: '3/1' to -> '3 / 1'
Answer 0 (score: 0)
Try this:
Replace record2 = str(record).replace('/', ' / ')
with record2 = str(record).replace('/', 'x/x')
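For reference, here is a quick check of what each replacement string produces. This is only a sketch: the sample value '3/1' comes from the question, and the variable names are made up for illustration.

```python
# Sample cell value taken from the question.
sample = '3/1'

# Replacement used in the question: a slash with a space on each side.
spaced = sample.replace('/', ' / ')

# Replacement suggested in this answer: the literal string 'x/x'.
literal = sample.replace('/', 'x/x')

print(spaced)   # 3 / 1
print(literal)  # 3x/x1
```

If the 'x' in 'x / x' stands for the neighbouring characters rather than a literal letter, the first form is the one that gives '3 / 1'.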
Answer 1 (score: 0)
I looked at your code and found a problem. Your script searches for td and tr tags, whereas the HTML contains TR and TD tags. Here is the code I tried.
a = """<tr><td> </td><TD class="contentsub" WIDTH="80">3/1</tr><td class="contentword_valid">NAME<BR>
Változás időpontja: 2013.12.30.<BR>
Bejegyzés kelte: 2013.12.19."""
from bs4 import BeautifulSoup

datatable=[]
stop = 0
soup = BeautifulSoup(a, 'html.parser')
for record in soup.find_all('tr'):
    temp_data = []
    for data in record.find_all('td'):
        temp_data.append(data.text.encode('latin-1'))
        record2 = str(record).replace('/', ' / ')
        print(record2)
        final_format = ' {} '.format(record2)
        if 'modul' in data.text:
            stop = 1
            break
    datatable.append(temp_data)
    print(datatable)
    if stop == 1:
        break
Output:
<tr><td> < / td><td class="contentsub" width="80">3 / 1< / td>< / tr>
<tr><td> < / td><td class="contentsub" width="80">3 / 1< / td>< / tr>
[[b'\xa0', b'3/1']]
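Note that the last line of the output still contains '3/1'. The replaced string record2 is printed but never stored, and the replacement is applied to the whole row's markup, so the closing tags are altered too. One possible fix is to apply the replacement to each cell's text before appending it. The sketch below assumes the cell texts have already been extracted (for example with data.text); collect_rows and stop_word are hypothetical names, not part of the original script.

```python
def collect_rows(rows, stop_word='modul'):
    """Return rows with ' / ' in place of '/', stopping at stop_word.

    rows is a list of rows, each row a list of cell strings.
    The cell that contains stop_word is kept, as in the original loop.
    """
    table = []
    for cells in rows:
        out = []
        stop = False
        for text in cells:
            # Store the replaced text, not the untouched original.
            out.append(text.replace('/', ' / '))
            if stop_word in text:
                stop = True
                break
        table.append(out)
        if stop:
            break
    return table


rows = [['1', '3/1'], ['modul', '5/2'], ['7/8']]
result = collect_rows(rows)
print(result)  # [['1', '3 / 1'], ['modul']]
```

In the original script, the equivalent change would be to append data.text.replace('/', ' / ').encode('latin-1') to temp_data instead of building record2.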