Useful regular expressions
Clean paragraph tags:
Replace
<p[^>]+>
with
<p>
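A minimal sketch of this substitution using Python's `re` module. The function name `clean_paragraph_tags` and the sample markup are assumptions made for illustration only.

```python
import re


def clean_paragraph_tags(html: str) -> str:
    """Replace every opening <p ...> tag that carries attributes with a bare <p>."""
    return re.sub(r"<p[^>]+>", "<p>", html)


sample = '<p class="MsoNormal" style="margin:0">Text</p>'
cleaned = clean_paragraph_tags(sample)
print(cleaned)
```

Because the pattern only requires `<p` followed by at least one character, it also matches other tags that begin with `p`, such as `<pre>`, so it is worth checking the matches before replacing everything.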
Remove span tags:
Replace
<[/]?span[^>]*>
with nothing
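The same replacement can be tried in Python as a quick check. The helper name `remove_span_tags` and the sample string below are hypothetical.

```python
import re


def remove_span_tags(html: str) -> str:
    """Delete opening and closing span tags while keeping the text inside them."""
    return re.sub(r"<[/]?span[^>]*>", "", html)


sample = '<p><span lang="EN-GB">Some <span style="color:red">red</span> text</span></p>'
result = remove_span_tags(sample)
print(result)
```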
Remove empty paragraph tags:
Replace
<p[^>]*> </p>
with nothing
Replace
<p[^>]*>[ ]?</p>
with nothing
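A short sketch of the empty-paragraph cleanup in Python, using the second pattern, which also covers the single-space case. The function name and sample input are assumptions.

```python
import re


def remove_empty_paragraphs(html: str) -> str:
    """Drop paragraphs that contain nothing, or only a single space."""
    return re.sub(r"<p[^>]*>[ ]?</p>", "", html)


sample = '<p>Keep</p><p class="x"> </p><p></p><p>Also keep</p>'
result = remove_empty_paragraphs(sample)
print(result)
```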
Bold paragraphs to h4:
Replace
<p[^>]*><strong>([^<]+)</strong></p>
or
<p[^>]*><b>([^<]+)</b></p>
with
<h4>\1</h4>
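The heading conversion might be sketched in Python as follows, applying the `<strong>` and `<b>` variants one after the other. The names `BOLD_PARAGRAPH_PATTERNS` and `bold_paragraphs_to_h4` are made up for this example.

```python
import re

# One pattern per bold tag variant; group 1 captures the heading text.
BOLD_PARAGRAPH_PATTERNS = [
    r"<p[^>]*><strong>([^<]+)</strong></p>",
    r"<p[^>]*><b>([^<]+)</b></p>",
]


def bold_paragraphs_to_h4(html: str) -> str:
    """Turn paragraphs that consist only of bold text into h4 headings."""
    for pattern in BOLD_PARAGRAPH_PATTERNS:
        html = re.sub(pattern, r"<h4>\1</h4>", html)
    return html


sample = (
    '<p class="a"><strong>Overview</strong></p>'
    "<p><b>Details</b></p>"
    "<p>Body with <b>bold</b> word</p>"
)
result = bold_paragraphs_to_h4(sample)
print(result)
```

Paragraphs that merely contain a bold word are left alone, because the pattern requires the bold tag to start immediately after the opening `<p>` and end immediately before `</p>`.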
Handling footnotes:
Replace
<a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\])
or
<a (href="#_ftnref[0-9]") (name="_ftn[0-9]") title=""></a>(\[[0-9]\])
with
<a \2 /><a \1>\3</a>
Target markup when only the footnote number is captured as \1:
<sup><a name="_ftnref\1" /><a href="#_ftn\1">\1</a></sup>
<a name="_ftn\1" /><a href="#_ftnref\1">\1</a>
Replace
<sup><a class="sdendnoteanc" (name="sdendnote[0-9]anc") (href="#sdendnote[0-9]sym")></a><sup>([a-z]*)</sup></sup>
with
<sup><a \1 /><a \2>\3</a></sup>
Replace
([\.”!])[ ]*([0-9]{1,2})([ <])
with
\1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3
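As a rough illustration, two of the footnote substitutions can be run in Python like this. The constant names, function names, and sample sentences are all assumptions. The sketch covers only the `_ftn`/`_ftnref` anchor pattern and the bare-number pattern.

```python
import re

# Empty anchor followed by the bracketed footnote number.
ANCHOR_PATTERN = r'<a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\])'
ANCHOR_REPLACEMENT = r'<a \2 /><a \1>\3</a>'

# Bare one- or two-digit number directly after sentence-ending punctuation.
BARE_NUMBER_PATTERN = r'([\.”!])[ ]*([0-9]{1,2})([ <])'
BARE_NUMBER_REPLACEMENT = r'\1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3'


def fix_footnote_anchor(html: str) -> str:
    """Move the bracketed number inside the link and split off the named anchor."""
    return re.sub(ANCHOR_PATTERN, ANCHOR_REPLACEMENT, html)


def link_bare_footnote_numbers(html: str) -> str:
    """Wrap a bare number after punctuation in a superscript footnote link."""
    return re.sub(BARE_NUMBER_PATTERN, BARE_NUMBER_REPLACEMENT, html)


anchor_sample = 'See the report<a href="#_ftn1" name="_ftnref1" title=""></a>[1] for details.'
anchor_result = fix_footnote_anchor(anchor_sample)
print(anchor_result)

bare_sample = "It ended.3 Then it began."
bare_result = link_bare_footnote_numbers(bare_sample)
print(bare_result)
```

Two caveats follow from the patterns themselves. `[0-9]` matches a single digit, so the anchor pattern only covers single-digit footnote numbers. The bare-number pattern also matches decimals such as `1.5 `, so its matches are best reviewed one at a time.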
Finding and replacing straight quotation marks:
Replace
(?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)
with
“\1”\3
Replace
<p>"([^"\n]+)</p>
with
<p>“\1</p>
Replace
(?<!\=)'((?!')[^'\n>]*)(')(?!>)(\W)
with
‘\1’\3
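The three quotation substitutions could be chained in Python roughly as below. The constant names, the function name `curl_quotes`, the order of application, and the sample text are assumptions. The lookbehind `(?<!\=)` is what keeps attribute values such as `class="x"` untouched.

```python
import re

# Paired straight double quotes (or a closing '') outside of tag attributes.
DOUBLE_QUOTES = r"""(?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)"""
# Paragraph that opens with a double quote which is never closed.
OPENING_ONLY = r"""<p>"([^"\n]+)</p>"""
# Paired straight single quotes outside of tag attributes.
SINGLE_QUOTES = r"""(?<!\=)'((?!')[^'\n>]*)(')(?!>)(\W)"""


def curl_quotes(html: str) -> str:
    """Replace straight quotation marks with typographic ones."""
    html = re.sub(DOUBLE_QUOTES, r"“\1”\3", html)
    html = re.sub(OPENING_ONLY, r"<p>“\1</p>", html)
    html = re.sub(SINGLE_QUOTES, r"‘\1’\3", html)
    return html


sample = (
    '<p class="x">He said "hello" and \'bye\' to me.</p>\n'
    '<p>"An unclosed quote.</p>'
)
result = curl_quotes(sample)
print(result)
```

The patterns refuse to cross `\n`, so a quotation that spans a line break is not converted.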
Removing line breaks in code:
This is useful if the expressions above do not match because of line breaks.
Replace
\n</p>
with
</p>
Replace
\r\n</p>
with
</p>
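In Python, both line-ending variants can be handled in one pass by making the carriage return optional. Folding the two patterns into `\r?\n</p>` is a choice made for this sketch and not part of the original recipe. The function name is hypothetical.

```python
import re


def join_closing_paragraph_tags(html: str) -> str:
    """Remove a line break (LF or CRLF) that sits directly before </p>."""
    return re.sub(r"\r?\n</p>", "</p>", html)


sample = "<p>First line\n</p>\r\n<p>Second line\r\n</p>"
result = join_closing_paragraph_tags(sample)
print(repr(result))
```

If the two patterns are run separately, applying `\r\n</p>` first avoids leaving a stray `\r` before `</p>` in files with Windows line endings.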