Useful regular expressions

Clean paragraph tags (strip their attributes):

Replace

<p\s[^>]*>

with

<p>

Remove span tags:

Replace

<[/]?span[^>]*>

with nothing
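
The two replacements above can be tried outside an editor with a short script. This is a minimal sketch that assumes Python's re module (the page does not name a tool); the function name and sample markup are made up for illustration. It requires whitespace after the "p", an assumption added here so that tags such as <pre> are left alone.

```python
import re


def clean_tags(html: str) -> str:
    """Strip attributes from <p> tags and remove <span> wrappers."""
    # Whitespace after "p" is required so <pre>, <param> etc. do not match.
    html = re.sub(r'<p\s[^>]*>', '<p>', html)
    # Remove opening and closing span tags but keep the text between them.
    html = re.sub(r'<[/]?span[^>]*>', '', html)
    return html


# Hypothetical sample of word-processor output.
sample = '<p class="MsoNormal"><span lang="EN-GB">Hello</span> world</p><pre>code</pre>'
print(clean_tags(sample))  # <p>Hello world</p><pre>code</pre>
```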

Remove empty paragraph tags:

Replace

<p[^>]*>&nbsp;</p>

with nothing

Replace

<p[^>]*>[ ]?</p>

with nothing
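
As a rough check of the two empty-paragraph patterns, the sketch below applies them one after the other. Python's re module and the function name are assumptions for illustration only.

```python
import re


def remove_empty_paragraphs(html: str) -> str:
    """Delete paragraphs that contain only &nbsp;, one space, or nothing."""
    # Paragraphs holding a single non-breaking-space entity.
    html = re.sub(r'<p[^>]*>&nbsp;</p>', '', html)
    # Paragraphs that are empty or hold a single ordinary space.
    html = re.sub(r'<p[^>]*>[ ]?</p>', '', html)
    return html


sample = '<p>One</p><p class="x">&nbsp;</p><p> </p><p></p><p>Two</p>'
print(remove_empty_paragraphs(sample))  # <p>One</p><p>Two</p>
```

Paragraphs containing several spaces or a mix of spaces and &nbsp; are not matched by either pattern and would need a broader expression.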

Bold paragraphs to h4:

Replace either of

<p[^>]*><strong>([^<]+)</strong></p>
<p[^>]*><b>([^<]+)</b></p>

with

<h4>\1</h4>
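
A minimal sketch of the heading conversion, assuming Python's re module, where \1 in the replacement refers to the captured heading text just as it does above. The function name and sample are hypothetical.

```python
import re


def bold_paragraphs_to_h4(html: str) -> str:
    """Turn paragraphs that consist only of bold text into h4 headings."""
    html = re.sub(r'<p[^>]*><strong>([^<]+)</strong></p>', r'<h4>\1</h4>', html)
    html = re.sub(r'<p[^>]*><b>([^<]+)</b></p>', r'<h4>\1</h4>', html)
    return html


sample = ('<p><strong>Intro</strong></p>'
          '<p>Text with <b>bold</b> inside.</p>'
          '<p class="x"><b>Methods</b></p>')
print(bold_paragraphs_to_h4(sample))
# <h4>Intro</h4><p>Text with <b>bold</b> inside.</p><h4>Methods</h4>
```

Because the bold element must start right after the opening tag and end right before the closing one, paragraphs that only contain some bold text are left untouched.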

Handling footnotes:

Replace either of

<a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\])
<a (href="#_ftnref[0-9]") (name="_ftn[0-9]") title=""></a>(\[[0-9]\])

with

<a \2 /><a \1>\3</a>

Replacement templates for use when the pattern captures only the footnote number as \1 (the first for the reference in the text, the second for the note itself):

<sup><a name="_ftnref\1" /><a href="#_ftn\1">\1</a></sup>
<a name="_ftn\1" /><a href="#_ftnref\1">\1</a>

Replace

<sup><a class="sdendnoteanc" (name="sdendnote[0-9]anc") (href="#sdendnote[0-9]sym")></a><sup>([a-z]*)</sup></sup>

with

<sup><a \1 /><a \2>\3</a></sup>

Replace

([\.”!])[ ]*([0-9]{1,2})([ <])

with

\1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3
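
To see how the capture groups are rearranged, here is a small sketch of two of the footnote replacements, assuming Python's re module. The function names and sample strings are invented for illustration.

```python
import re


def fix_footnote_reference(html: str) -> str:
    """Split an empty anchor plus bracketed number into a named anchor and a link."""
    pattern = r'<a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\])'
    # \2 is the name attribute, \1 the href attribute, \3 the visible "[n]".
    return re.sub(pattern, r'<a \2 /><a \1>\3</a>', html)


def link_bare_numbers(html: str) -> str:
    """Wrap a bare number that follows sentence-ending punctuation in footnote markup."""
    pattern = r'([\.”!])[ ]*([0-9]{1,2})([ <])'
    replacement = r'\1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3'
    return re.sub(pattern, replacement, html)


print(fix_footnote_reference('claim<a href="#_ftn1" name="_ftnref1" title=""></a>[1]'))
# claim<a name="_ftnref1" /><a href="#_ftn1">[1]</a>
print(link_bare_numbers('as shown.2 Next'))
# as shown.<sup><a name="_ftnref2" /><a href="#_ftn2">2</a></sup> Next
```

Two caveats follow from the patterns themselves: [0-9] matches a single digit, so the anchor patterns stop working at footnote 10, and the bare-number pattern also matches ordinary numbers that start a sentence, so its results are worth reviewing by hand.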

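Finding and replacing straight quotation marks:

Replace

(?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)

with

“\1”\3

For a paragraph that opens with a quotation mark and contains no closing one, replace

<p>"([^"\n]+)</p>

with

<p>“\1</p>

For single quotation marks, replace

(?<!\=)'((?!')[^'\n>]*)(')(?!>)(\W)

with

‘\1’\3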
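
The quotation patterns rely on lookarounds to stay out of HTML attributes: (?<!\=) skips a quote that directly follows an equals sign, and (?!>) skips one that closes a tag. The sketch below applies the three replacements in order, assuming Python's re module; the function name and samples are hypothetical.

```python
import re


def curl_quotes(text: str) -> str:
    """Convert straight quotation marks in running text to curly ones."""
    # Paired double quotes (the closing mark may also be two apostrophes).
    text = re.sub(r"""(?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)""", r'“\1”\3', text)
    # A paragraph that opens with a double quote and never closes it.
    text = re.sub(r'<p>"([^"\n]+)</p>', r'<p>“\1</p>', text)
    # Paired single quotes.
    text = re.sub(r"(?<!\=)'((?!')[^'\n>]*)(')(?!>)(\W)", r'‘\1’\3', text)
    return text


print(curl_quotes('<p class="x">He said "hello" to her.</p>'))
# <p class="x">He said “hello” to her.</p>
print(curl_quotes('<p>"This goes on</p>'))
# <p>“This goes on</p>
print(curl_quotes("He said 'hi' to her."))
# He said ‘hi’ to her.
```

These are heuristics: text with apostrophes or with quotation marks spanning several lines can still be matched wrongly or missed, so a manual pass over the result is advisable.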

Removing line breaks in code:

This is useful if the patterns above fail to match because a line break precedes a closing paragraph tag. Replace

\n</p>

with

</p>

For files with Windows (CRLF) line endings, replace

\r\n</p>

with

</p>
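
Both line-ending variants can be handled in one pass by making the carriage return optional. A minimal sketch, assuming Python's re module and a hypothetical function name:

```python
import re


def join_closing_tags(html: str) -> str:
    """Pull a closing </p> up onto the line it belongs to."""
    # \r? makes the carriage return optional, so the same pattern covers
    # Unix (\n) and Windows (\r\n) line endings.
    return re.sub(r'\r?\n</p>', '</p>', html)


sample = '<p>one\n</p>\r\n<p>two\r\n</p>'
print(repr(join_closing_tags(sample)))  # '<p>one</p>\r\n<p>two</p>'
```

Line breaks between paragraphs are kept; only the break directly in front of a closing tag is removed.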