{"id":3936,"date":"2022-03-28T16:11:25","date_gmt":"2022-03-29T00:11:25","guid":{"rendered":"https:\/\/wonghoi.humgar.com\/blog\/?p=3936"},"modified":"2022-03-28T17:29:26","modified_gmt":"2022-03-29T01:29:26","slug":"regex-notes","status":"publish","type":"post","link":"https:\/\/wonghoi.humgar.com\/blog\/2022\/03\/28\/regex-notes\/","title":{"rendered":"Regex Notes"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.rexegg.com\/regex-quickstart.html\" target=\"_blank\" rel=\"noreferrer noopener\">Concepts<\/a><\/h2>\n\n\n\n<p>Mechanics<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>.<\/code> any single character (except newline, by default)<\/li><li><code>\\<\/code> escapes special characters<\/li><li><strong>shorthand character classes<\/strong> (<code>\\d<\/code> digits, <code>\\w<\/code> word (i.e. letter\/digit\/underscore), <code>\\s<\/code> whitespace).<\/li><li><code>[]<\/code> <strong>character classes<\/strong> (restrict which characters are accepted, unlike the <code>.<\/code> wildcard)<br>a hyphen inside the <code>[]<\/code> brackets specifies a range: <code>[3-7]<\/code> means the same as <code>[34567]<\/code><br><code>[^...]<\/code> negates the class: it matches any character except the ones listed<\/li><li><code>|<\/code> <strong>choices<\/strong> (think of it as OR)<\/li><li>Complement (i.e. 
everything but) versions are capitalized: <code>\\D<\/code> matches everything that <code>\\d<\/code> does not<\/li><li>whitespace characters (<code>\\n<\/code> newline, <code>\\t<\/code> tab)<\/li><\/ul>\n\n\n\n<p>Modifiers<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>repetition <strong style=\"font-size: 1.0625rem;\">quantifiers<\/strong><span style=\"font-size: 1.0625rem;\"> (<\/span><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">?<\/code><span style=\"font-size: 1.0625rem;\"> 0 or 1 times, <\/span><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">+<\/code><span style=\"font-size: 1.0625rem;\"> at least once, <\/span><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">*<\/code><span style=\"font-size: 1.0625rem;\"> any number of times (including zero), <\/span><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">{match how many times}<\/code><span style=\"font-size: 1.0625rem;\">)<\/span><\/li><li><code>(? ...)<\/code> <strong>inline modifiers<\/strong> alter how the pattern is interpreted: how newlines are handled, case sensitivity, whether <code>(...)<\/code> captures or just groups, and whether comments are allowed within the pattern<\/li><\/ul>\n\n\n\n<p>Positioning rules<\/p>\n\n\n\n<ul class=\"wp-block-list\" id=\"block-06421975-d5fc-4234-8573-93b52ddafa1c\"><li><strong>anchors<\/strong> (<code>^<\/code> begins with, <code>$<\/code> ends with)<\/li><li><code>\\b<\/code> word <strong>boundary<\/strong><\/li><\/ul>\n\n\n\n<p>Output behavior<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>(...)<\/code> <strong>capturing<\/strong> group, <code>(?: ...)<\/code> <strong>non-capturing<\/strong> group<\/li><li><code>\\(index)<\/code> <strong>content of previously matched<\/strong> groups\/chunks, referred to by index (a backreference). <br>This feature can generate derived new content instead of just extracting.<\/li><li><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">(?( = | &lt;= | ! | &lt;! 
) ...assertions...)<\/code><span style=\"font-size: 1.0625rem;\"> <\/span><strong style=\"font-size: 1.0625rem;\">lookarounds<\/strong><span style=\"font-size: 1.0625rem;\"> <\/span><span style=\"font-size: 1.0625rem; text-decoration-line: underline;\">skip<\/span><span style=\"font-size: 1.0625rem;\"> the text matched by <\/span><code style=\"font-size: 1.0625rem; background-color: rgb(255, 255, 255);\">...assertions...<\/code><span style=\"font-size: 1.0625rem;\"> before\/after the pattern, so you can toss out the matched assertion from your capture results. <\/span><\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><code>(?s)<\/code> Also match newline characters (&#8216;<span style=\"text-decoration: underline;\"><strong><em>s<\/em><\/strong><\/span>ingle-line&#8217; or DOTALL mode)<\/h2>\n\n\n\n<p>Starting the pattern with the <code>(?s)<\/code> <em>flag<\/em> (also called an <em>inline modifier<\/em>) makes the <code>.<\/code> (dot) single-<em>character <\/em>pattern ALSO match newline characters, which it does not by default, so a match can span multiple lines. <\/p>\n\n\n\n<p>Useful for extracting the contents of HTML blocks blindly and post-processing them elsewhere.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>(?m)<\/code> Pattern starts over as a new string for each line (&#8216;<em><span style=\"text-decoration: underline;\">m<\/span><\/em>ulti-line&#8217; mode)<\/h2>\n\n\n\n<p>Starting the pattern with the <code>(?m)<\/code> flag tells the anchors <code>^<\/code> (begins with) and <code>$<\/code> (ends with) to match at the start and end of each line, rather than only at the start and end of the whole string.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Assertions: use lookarounds to skip (not capture) patterns <br><code>(?( = | &lt;= | ! | &lt;! ) assertion pattern)<\/code><\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li><code>&lt;<\/code> means lookbehind, no prefix character means lookahead. <br><code>-ahead<\/code>\/<code>-behind<\/code> says WHERE the assertion pattern sits relative to the text you want TO CAPTURE: a lookahead checks what follows it, a lookbehind checks what precedes it. <br>The asserted text is what you want to <em><strong>assert<\/strong><\/em> (match and throw) away (it is whatever goes inside the <code>(? 
...)<\/code>)<\/li><li><code>=<\/code> (positive) asserts that the pattern inside the lookaround bracket MUST MATCH,<br><code>!<\/code> (negative) asserts that the pattern inside the lookaround bracket MUST NOT MATCH.<\/li><\/ul>\n\n\n\n<p><em>Assertions<\/em> are very useful for getting to the meat you really want to capture, rather than sifting through patterns introduced solely for making assertions that you intend to throw away.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/regexland.com\/all-between-specified-characters\/\" target=\"_blank\" rel=\"noreferrer noopener\">Extract HTML block<\/a><\/h2>\n\n\n\n<p><code>(?ms)(?&lt;=  starting tag pattern)  body pattern (?= terminating tag pattern)<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Concepts Mechanics . any character \\ escapes special characters characters (\\d digits,\\w word (i.e. letter\/digit\/underscore), \\s whitespace). [] character classes (define rules over what characters are accepted, unlike the . wildcard) [3-7] hyphen inside [] bracket can specify ranges to mean &hellip; <a href=\"https:\/\/wonghoi.humgar.com\/blog\/2022\/03\/28\/regex-notes\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[71],"tags":[],"class_list":["post-3936","post","type-post","status-publish","format-standard","hentry","category-regex"],"_links":{"self":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/3936","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/comments?post=3936"}],"version-history":[{"count":13,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/3936\/revisions"}],"predecessor-version":[{"id":3953,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/posts\/3936\/revisions\/3953"}],"wp:attachment":[{"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/media?parent=3936"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/categories?post=3936"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wonghoi.humgar.com\/blog\/wp-json\/wp\/v2\/tags?post=3936"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}