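I have a small scraper (Python, bs4). If the text I want to scrape contains two or more consecutive line breaks (newlines), the content gets written into multiple cells.
For example: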
AAA
BBB
CCC
The result in the CSV cell is "AAA BBB CCC".
The bad case:
AAA
BBB

CCC
The result is:
Cell 1: AAA BBB
Cell 2 (second row): CCC
The code is:
...
    beschreibung_container = container.find_all("pre", {"class":"is24qa-objektbeschreibung text-content short-text"}) or ""
    beschreibung = beschreibung_container[0].get_text().strip() if beschreibung_container else ""
    ausstattung_container = container.find_all("pre", {"class":"is24qa-ausstattung text-content short-text"}) or ""
    ausstattung = ausstattung_container[0].get_text().strip() if ausstattung_container else ""
    lage_container = container.find_all("pre", {"class":"is24qa-lage text-content short-text"}) or ""
    lage = lage_container[0].get_text().strip() if lage_container else ""
except:
    print("Es gab einen Fehler")
f.write(objektid + "##" + titel + "##" + adresse + "##" + criteria.replace(" ", ";") + "##" + preis.replace(" ", ";") + "##" + energie.replace(" ", ";") + "##" + beschreibung.replace("\n", " ") + "##" + ausstattung.replace("\n", " ") + "##" + lage.replace("\n", " ") + "\n")
...
Is it possible to replace all line breaks?
Answer 0 (score: 1)
You can use re.sub to replace anything that matches one or more newline characters (\n) with a space in the target string:
re.sub(r'\n+', ' ', str)
If you need to replace carriage returns (\r) as well as newlines, you can use:
re.sub(r'[\r\n]+', ' ', str)
Here is how your code would change:
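As a minimal sketch of that change (not the answerer's original code), the three free-text fields could be passed through one small helper before being written. The helper name flatten_newlines and the sample values are assumptions made for illustration; only re.sub and the "##" separator come from the thread.

```python
import re

def flatten_newlines(text):
    """Collapse each run of carriage returns and/or line feeds into one space."""
    return re.sub(r'[\r\n]+', ' ', text).strip()

# Hypothetical sample values standing in for the scraped fields.
beschreibung = "AAA\nBBB\r\n\r\nCCC"
ausstattung = "Balkon\n\nKeller"
lage = ""

# Build one output record using the "##" separator from the question.
fields = (beschreibung, ausstattung, lage)
line = "##".join(flatten_newlines(field) for field in fields) + "\n"
print(repr(line))
```

In the original f.write call, this would amount to replacing each .replace("\n", " ") with a call to the helper, so that the only newline left in a record is the one that terminates it.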