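I have HTML code as a string. I need to find all img tags in that string, read the value of each src attribute and pass it to a function that returns a whole img tag, which must then replace the img tag that was read.
It needs to go through the whole string and apply the same logic to every img tag.
For example, suppose my html string looks like this: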
string htmlBody= "<p>Hi everyone</p><img src=\"..." <p>I am here </p> <img src=\"..." />"
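I have the following code, which finds the first img tag, gets the src value (a base64 string) and converts it to a byte array to create a stream, so that I can then create a new src value that links to that stream.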
//Remove from all src attributes "data:image/png;base64"
string res = Regex.Replace(htmlBody, "data:image\\/\\w+\\;base64\\,", "");
//Match the img tag and get the base64 string value
string matchString = Regex.Match(res, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
var imageData = Convert.FromBase64String(matchString);
var contentId = Guid.NewGuid().ToString();
LinkedResource inline = new LinkedResource(new MemoryStream(imageData), "image/jpeg");
inline.ContentId = contentId;
inline.TransferEncoding = TransferEncoding.Base64;
//Replace all img tags with the new img tag
htmlBody = Regex.Replace(htmlBody, "<img.+?src=[\"'](.+?)[\"'].*?>", @"<img src='cid:" + inline.ContentId + @"'/>");
As you can see, I have a new img tag to use as the replacement:
<img src='cid:" + inline.ContentId + @"'/>
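But the code replaces all img tags with the same content. I need to be able to get an img tag, run the logic, replace it, and then move on to the next img tag.
I hope you can tell me how to do this. Thanks in advance.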
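Answer 0 (score: 8)
If I understand what you need, you can use HtmlAgilityPack for this. Parsing HTML with regular expressions can lead to unexpected behavior. Could you try the following code?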
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Mail;
using System.Net.Mime;
using HtmlAgilityPack;

public static string DoIt()
{
    string htmlString = "";
    using (WebClient client = new WebClient())
        htmlString = client.DownloadString("http://dean.edwards.name/my/base64-ie.html"); //This is an example source for base64 img src, you can change this directly to your source.
    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(htmlString);
    document.DocumentNode.Descendants("img")
        .Where(e =>
        {
            // Only keep img tags whose src is an inline data URI
            string src = e.GetAttributeValue("src", null) ?? "";
            return !string.IsNullOrEmpty(src) && src.StartsWith("data:image");
        })
        .ToList()
        .ForEach(x =>
        {
            string currentSrcValue = x.GetAttributeValue("src", null);
            currentSrcValue = currentSrcValue.Split(',')[1];//Base64 part of string
            byte[] imageData = Convert.FromBase64String(currentSrcValue);
            string contentId = Guid.NewGuid().ToString();
            // NOTE: `inline` is not stored anywhere here; keep a reference to it
            // (e.g. in a list) if you need to attach it to the mail later.
            LinkedResource inline = new LinkedResource(new MemoryStream(imageData), "image/jpeg");
            inline.ContentId = contentId;
            inline.TransferEncoding = TransferEncoding.Base64;
            x.SetAttributeValue("src", "cid:" + inline.ContentId);
        });
    // Return the rewritten HTML (the original snippet had no return statement)
    string result = document.DocumentNode.OuterHtml;
    return result;
}
You can get HtmlAgilityPack from https://www.nuget.org/packages/HtmlAgilityPack
Hope this helps.
Answer 1 (score: 5)
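I think you need to run your logic once for each img tag fetched from the string. The FetchImgsFromSource function in this answer gives you a list of the src values of all img tags.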
Loop over the list this function returns and apply your own logic to each entry:
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns the src value of every img tag found in htmlSource.
public static List<string> FetchImgsFromSource(string htmlSource)
{
    List<string> listOfImgdata = new List<string>();
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in matchesImgSrc)
    {
        string src = m.Groups[1].Value; // captured src attribute value
        listOfImgdata.Add(src);
    }
    return listOfImgdata;
}
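If you would rather replace each tag as it is matched instead of collecting a list first, one option is the Regex.Replace overload that takes a MatchEvaluator: the callback runs once per match and its return value replaces only that match. The following is a minimal sketch, not a definitive implementation. The class name InlineImageRewriter, the method ReplaceDataUriImages and the images dictionary are hypothetical names made up for this example, and it assumes every src of interest is a quoted base64 data URI. Invalid base64 will make Convert.FromBase64String throw a FormatException.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class InlineImageRewriter
{
    // Matches an <img> tag whose src is a quoted base64 data URI.
    // Group 1 is the opening quote, "data" is the base64 payload.
    private static readonly Regex DataUriImg = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*([""'])data:(?<mime>image/[\w.+-]+);base64,(?<data>[^""']+)\1[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each matching img tag with its own cid: reference and
    // stores the decoded bytes in `images`, keyed by the new content id.
    public static string ReplaceDataUriImages(string html, IDictionary<string, byte[]> images)
    {
        return DataUriImg.Replace(html, match =>
        {
            byte[] bytes = Convert.FromBase64String(match.Groups["data"].Value);
            string contentId = Guid.NewGuid().ToString();
            images[contentId] = bytes;
            // The returned string replaces this one match only.
            return "<img src='cid:" + contentId + "'/>";
        });
    }
}
```

Each entry in the dictionary can then be turned into a LinkedResource with the matching ContentId, in the same way the question's code builds one from imageData.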
Hope it works for you.
The best way to parse an HTML DOM is to use HtmlAgilityPack, as others have mentioned.