2011-03-31

Multi-line HTML Tag Parsing with Regular Expressions and Java

I was looking at some regular-expression posts earlier today, and I noticed many of them showing something along the lines of:

<a\\b[^>]*href=\"[^>]*>(.*?)

The issue is in the
.*?

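By default, the dot in a Java regular expression does not match line terminators, so this pattern fails when the text inside the tag spans more than one line. If you use
[\\W\\w\\s]*

in place of that, the match can cross line terminators, because a character class that combines \\W and \\w matches any character, newlines included (the \\s is redundant but harmless). Note that this replacement drops the ? and is therefore greedy; append ? if you want to keep the lazy matching of the original. The exception is when you are specifically looking for a tag that contains no line terminator, in which case the original pattern is the one you want.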
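
To make the difference concrete, here is a minimal Java sketch. It is only an illustration: the class name MultiLineAnchor, the helper firstGroup, and the sample HTML string are made up for this example, and the closing </a> in the patterns is an assumption, since the pattern quoted above ends at the capture group. The last call uses Pattern.DOTALL, a standard java.util.regex flag that makes the dot match line terminators, as an alternative to the character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from any of the posts mentioned above.
class MultiLineAnchor {

    // Returns the first capture group of the first match, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample input (assumed): link text split across two lines.
        String html = "<a href=\"http://example.com\">first\nsecond</a>";

        // Pattern from the post, with an assumed closing </a>.
        String dot = "<a\\b[^>]*href=\"[^>]*>(.*?)</a>";
        // Same pattern with the dot swapped for a character class, kept lazy.
        String charClass = "<a\\b[^>]*href=\"[^>]*>([\\W\\w]*?)</a>";

        // The dot cannot cross the newline, so there is no match.
        System.out.println(firstGroup(dot, 0, html));
        // The character class matches any character, so the group spans both lines.
        System.out.println(firstGroup(charClass, 0, html));
        // DOTALL lets the original dot pattern cross the newline as well.
        System.out.println(firstGroup(dot, Pattern.DOTALL, html));
    }
}
```

With this input, the first call finds no match, while the other two capture the link text across both lines. If you control the Pattern.compile call, DOTALL (or the inline (?s) flag) is usually the tidier choice; the character class is handy when you can only change the pattern string itself.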