Differences

This shows you the differences between two versions of the page.

regex [2021/03/19 08:47] niklas regex [2024/02/14 12:20] (current) – external edit 127.0.0.1
Line 1: Line 1:
-**Clean paragraph tags**+====== Useful regular expressions ====== 
 + 
 +Regular expressions are a relatively advanced way to semi-automate a few processes. They are essentially a more powerful form of “find and replace”. Used properly, they can clean up exported code and correctly format things like footnotes in very little time, or convert square-bracket<sup>''[1]''</sup> footnote markers to bracketless<sup>''1''</sup> ones for print, and vice versa ((Concretely, you would look for something like ''class="noteref">(.*?)</a></sup>'' and replace this with something like ''>[\1]</a></sup>'')). For example, going from the final Wellred InDesign book to [[publishing:digital:ebook|EPUB export]], we can now often achieve a 100% finished ebook in less than an hour thanks to a series of regular expressions executed at once. 
 + 
 +The easiest way to understand the concept is to start with a simple example. A fairly typical export from InDesign marks headings up as paragraph tags, which isn't much use for HTML/EPUB: 
 + 
 +Rather than 
 + 
 +''<p class="Headings_Introduction-Title">Chapter title</p>'' 
 + 
 +we would like 
 + 
 +''<h1>Chapter title</h1>'' 
 + 
 +Using regex, the search pattern (the "find" part of the find and replace) becomes: 
 + 
 +''<p class="Headings_Introduction-Title">(.*?)</p>'' 
 + 
 +''(.*?)'' is a capture group: it matches, as lazily as possible, everything between the opening paragraph tag with class ''"Headings_Introduction-Title"'' and the closing ''</p>''.
 + 
 +To convert this to a proper <h1> tag we would replace this with: 
 + 
 +''<h1>\1</h1>'' 
 + 
 +(here ''\1'' refers back to whatever the first pair of parentheses in the search pattern captured; this is the "replace" part) 
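The same find and replace can be tried outside a text editor. The following is a minimal sketch in Python using the standard ''re'' module; the function name ''paragraph_headings_to_h1'' is hypothetical and only serves the illustration, while the pattern and the replacement are the ones given above.

```python
import re

# Search pattern from above: the parentheses capture the heading text.
# The lazy quantifier (.*?) stops at the first closing </p> it finds.
HEADING = re.compile(r'<p class="Headings_Introduction-Title">(.*?)</p>')


def paragraph_headings_to_h1(html: str) -> str:
    """Replace each matching paragraph with an <h1> holding the captured text."""
    # \1 in the replacement inserts whatever group 1 captured.
    return HEADING.sub(r'<h1>\1</h1>', html)


if __name__ == "__main__":
    sample = '<p class="Headings_Introduction-Title">Chapter title</p>'
    print(paragraph_headings_to_h1(sample))
```

Because the quantifier is lazy, two headings on the same line are converted separately rather than being swallowed into one match.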
 + 
 +https://regex101.com/ is a useful resource to try out certain patterns. 
 + 
 +MV's plain text file {{ ::regex-mv.pdf |}} used for producing ebooks may be useful to get a better idea. 
 + 
 +Below are some further examples. 
 + 
 + 
 +=====Clean paragraph tags=====
 Replace Replace
 <code><p[^>]+></code> <code><p[^>]+></code>
 with  with 
 <code><p></code> <code><p></code>
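As a rough illustration, the replacement above could be scripted in Python as follows. The function name ''clean_paragraph_tags'' and the sample markup are assumptions made for this sketch; only the pattern and the replacement come from this page.

```python
import re

# Pattern from above: "<p", then one or more characters that are not ">", then ">".
PARAGRAPH_WITH_ATTRIBUTES = re.compile(r'<p[^>]+>')


def clean_paragraph_tags(html: str) -> str:
    """Strip all attributes from opening paragraph tags."""
    return PARAGRAPH_WITH_ATTRIBUTES.sub('<p>', html)


if __name__ == "__main__":
    print(clean_paragraph_tags('<p class="Body-Text" lang="en-GB">Hello</p>'))
```

One caveat worth checking before running this on a whole book: the pattern also matches any other opening tag whose name starts with "p", such as ''<pre>'', because ''[^>]+'' happily consumes the rest of the tag name. A stricter variant such as ''<p\s[^>]*>'' (a suggestion, not taken from this page) only matches paragraph tags that carry attributes.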
-**remove span tags:**+=====Remove span tags:=====
 Replace Replace
 <code><[/]?span[^>]*></code> <code><[/]?span[^>]*></code>
 with nothing with nothing
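A minimal Python sketch of this step, assuming the markup is available as a string; the function name ''remove_span_tags'' is hypothetical.

```python
import re

# Pattern from above: an optional "/" lets one expression match
# both opening and closing span tags.
SPAN_TAG = re.compile(r'<[/]?span[^>]*>')


def remove_span_tags(html: str) -> str:
    """Delete span tags while keeping the text they wrap."""
    return SPAN_TAG.sub('', html)


if __name__ == "__main__":
    print(remove_span_tags('<p><span class="CharOverride-1">word</span> rest</p>'))
```

Only the tags are removed, so any styling the spans carried (italics set through a class, for instance) is lost along with them.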
  
-**Remove empty paragraph tags:**+=====Remove empty paragraph tags:=====
 Replace Replace
 <code><p[^>]*>&nbsp;</p></code> <code><p[^>]*>&nbsp;</p></code>
Line 18: Line 51:
 with nothing with nothing
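Sketched in Python under the same assumptions as before (markup held in a string, hypothetical function name ''remove_empty_paragraphs''):

```python
import re

# Pattern from above: a paragraph, with or without attributes,
# whose only content is the literal entity &nbsp;
EMPTY_PARAGRAPH = re.compile(r'<p[^>]*>&nbsp;</p>')


def remove_empty_paragraphs(html: str) -> str:
    """Drop paragraphs that contain nothing but a non-breaking space entity."""
    return EMPTY_PARAGRAPH.sub('', html)


if __name__ == "__main__":
    sample = '<p>Text</p><p class="Body">&nbsp;</p><p>More</p>'
    print(remove_empty_paragraphs(sample))
```

The pattern looks for the entity ''&nbsp;'' as written in the source. A paragraph holding an actual non-breaking space character, or one that is completely empty, is left untouched.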
  
-**Bold paragraphs to h4:**+=====Bold paragraphs to h4:=====
 Replace Replace
 <code> <code>
Line 27: Line 60:
 <h4>\1</h4> <h4>\1</h4>
 </code> </code>
-**Handling footnotes:**+=====Handling footnotes:=====
 <code> <code>
 <a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\]) <a (href="#_ftn[0-9]") (name="_ftnref[0-9]") title=""></a>(\[[0-9]\])
Line 48: Line 81:
 \1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3 \1<sup><a name="_ftnref\2" /><a href="#_ftn\2">\2</a></sup>\3
 </code> </code>
-**Finding and replacing double quotation:**+=====Finding and replacing double quotation:=====
 <code> <code>
 (?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W) (?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)
Line 59: Line 92:
 <code>‘\1’\3</code> <code>‘\1’\3</code>
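To see what the double-quotation search pattern above does and does not match, it can be tried in Python. This is only a sketch: the replacement for double quotes is not visible in this comparison view, so ''“\1”\3'' is assumed here by analogy with the ''‘\1’\3'' replacement shown above, and the function name ''curl_double_quotes'' is hypothetical.

```python
import re

# Search pattern copied from above. The lookbehind (?<!\=) skips quotes that
# follow "=", and (?!>) skips a closing quote directly before ">", so that
# HTML attribute values are left alone.
DOUBLE_QUOTED = re.compile(r'''(?<!\=)"((?!"|'')[^"\n>]*)("|'')(?!>)(\W)''')


def curl_double_quotes(text: str) -> str:
    """Turn straight double quotes around a phrase into curly ones."""
    # Assumed replacement: group 1 is the quoted text, group 3 is the
    # non-word character that followed the closing quote.
    return DOUBLE_QUOTED.sub(r'“\1”\3', text)


if __name__ == "__main__":
    print(curl_double_quotes('<p class="Body">He said "hello" to me.</p>'))
```

The pattern requires a non-word character after the closing quote, so a quotation that ends exactly at the end of the text is not converted.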
  
-**Removing line breaks in code:**+=====Removing line breaks in code:=====
 This is useful if the patterns above fail to match because of line breaks. This is useful if the patterns above fail to match because of line breaks.
 Replace  Replace