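On http://web.mta.info/status/serviceStatus.txt, some of the markup is entity-encoded; for example, <br> tags show up as &lt;br&gt;. I would like to know how to decode these tags back to their normal form so that I can parse and read them. Here is my current code: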
    String address = "http://web.mta.info/status/serviceStatus.txt";
    XmlPullParserFactory pullParserFactory;
    XmlPullParser parser;
    HttpClient httpclient;
    HttpGet httpget;
    URI website;
    HttpResponse response;
    HttpEntity httpEntity;
    InputStream xmlFile;

    //code that just initializes some other variables

    private void updater() {
        // try catch to catch any exceptions thrown
        try {
            // Fetch the status feed over HTTP
            httpclient = new DefaultHttpClient();
            httpget = new HttpGet(address);
            response = httpclient.execute(httpget);
            httpEntity = response.getEntity();
            xmlFile = httpEntity.getContent();

            // Hand the response stream to a pull parser
            pullParserFactory = XmlPullParserFactory.newInstance();
            parser = pullParserFactory.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(xmlFile, null);
            parseXML(parser);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (XmlPullParserException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
parseXML basically walks through the file and picks out the information I need.
Answer 0 (score: 0)
Replace
&lt; with <, and &gt; with >.
&lt; stands for <, and &gt; stands for >.
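These HTML entities are left in the code segments of the site you mentioned, most likely because of a bug; that is simply how the site escapes its code segments.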
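The replacement described above can be sketched roughly as follows. This is only a minimal illustration, not code from the question: the class name EntityDecoder and its decode method are hypothetical, and it assumes the feed uses only the five predefined XML entities. The idea would be to run each text value through it after pulling it out of the parser.

```java
// Hypothetical helper (not part of the original post): decodes the five
// predefined XML entities with plain string replacement.
final class EntityDecoder {
    private EntityDecoder() {}

    static String decode(String text) {
        return text
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                // "&amp;" is handled last so that "&amp;lt;" becomes "&lt;"
                // rather than being decoded twice into "<".
                .replace("&amp;", "&");
    }
}
```

Numeric character references such as &#160; and named HTML entities such as &nbsp; are deliberately not handled here; if the feed turns out to contain them, a proper HTML-unescaping routine would probably be the safer choice.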