Regular expression

From WikiLICC

Revision as of 12:43, 24 May 2015

Brief summary of regular expressions for Kate.

See www.kate-editor.org/doc/regular-expressions.html

Patterns

Character Classes and abbreviations

[abc]; a, b or c
[a-c]; a, b or c
[0123456789]; any digit
[0-9]; any digit
[^abc]; any character except a, b or c
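The classes listed above can be tried outside Kate. The following is a minimal sketch using Python's re module, which is an assumption made for illustration: Kate's own engine may differ in details, and the sample strings are made up.

```python
import re

# Each custom class matches exactly one character from the listed set.
text = "cab 2015 xyz"

letters_abc = re.findall(r"[abc]", text)    # explicit list of characters
range_a_to_c = re.findall(r"[a-c]", text)   # the same set written as a range
digits = re.findall(r"[0-9]", text)         # any digit
not_abc = re.findall(r"[^abc]", "abcd")     # anything except a, b or c

print(letters_abc, range_a_to_c, digits, not_abc)
```

The list form and the range form return the same matches, which is why [0123456789] and [0-9] are interchangeable.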


\a

   This matches the ASCII bell character (BEL, 0x07).

\f

   This matches the ASCII form feed character (FF, 0x0C).

\n

   This matches the ASCII line feed character (LF, 0x0A, Unix newline).

\r

   This matches the ASCII carriage return character (CR, 0x0D).

\t

   This matches the ASCII horizontal tab character (HT, 0x09).

\v

   This matches the ASCII vertical tab character (VT, 0x0B).

\xhhhh

   This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF).

\0ooo (i.e., \zero ooo)

   This matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).

. (dot)

   This matches any character (including newline).

\d

   This matches a digit. Equal to [0-9]

\D

   This matches a non-digit. Equal to [^0-9] or [^\d]

\s

   This matches a whitespace character. Practically equal to [ \t\n\r]

\S

   This matches a non-whitespace. Practically equal to [^ \t\r\n], and equal to [^\s]

\w

   This matches any “word character”, in this case any letter or digit. Note that the underscore (_) is not matched, unlike in Perl regular expressions. Equal to [a-zA-Z0-9]

\W

   This matches any non-word character, that is, anything but letters or digits. Equal to [^a-zA-Z0-9] or [^\w]
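The abbreviated classes can be compared with their explicit equivalents in a short sketch. It uses Python's re module as a stand-in, which is an assumption: in Python, \w also matches the underscore, unlike the behaviour described above, so the explicit class [a-zA-Z0-9] is shown next to it. The sample string is hypothetical.

```python
import re

sample = "id_42:\tok 7"   # contains an underscore, a tab and a space

# re.ASCII restricts \d, \s and \w to their ASCII definitions in Python.
digits = re.findall(r"\d", sample, re.ASCII)       # same as [0-9]
whitespace = re.findall(r"\s", sample, re.ASCII)   # the tab and the space
words = re.findall(r"\w+", sample, re.ASCII)       # Python's \w includes "_"

# The explicit class from the text excludes the underscore.
letters_digits = re.findall(r"[a-zA-Z0-9]+", sample)

print(digits, whitespace, words, letters_digits)
```

The difference between words and letters_digits shows why the exact definition of \w should be checked for each engine.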

The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write [\w \.]

Note

The POSIX notation of classes, [:<class name>:] is currently not supported.

Characters with special meanings inside character classes

The following characters have a special meaning inside the “[]” character class construct, and must be escaped to be literally included in a class:

]

   Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)

^ (caret)

   Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.

- (dash)

   Denotes a logical range. Must always be escaped within a character class.

\ (backslash)

   The escape character. Must always be escaped.
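As a sketch of the escaping rules above, the following builds one class that matches any of the four special characters literally. Python's re module is assumed here for illustration, and the sample string is made up.

```python
import re

# One class containing ], ^, - and \ with each of them escaped.
special = re.compile(r"[\]\^\-\\]")

found = special.findall(r"a]b^c-d\e")

# An unescaped dash between two characters is read as a range instead.
in_range = re.findall(r"[a-c]", "a-c b")

print(found, in_range)
```

In the second pattern the dash is never matched as a character, because it only marks the range from a to c.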

Alternatives: matching “one of”

If you want to match one of a set of alternative patterns, you can separate those with | (vertical bar character).

For example, to find either “John” or “Harry” you would use the expression John|Harry.

Sub Patterns

Sub patterns are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.

Specifying alternatives

You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character “|” (vertical bar).
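The effect of grouping can be sketched with Python's re module (an assumption for illustration; the words "cats" and "dogs" are hypothetical sample data). Without parentheses the vertical bar splits the whole pattern; inside a sub pattern it only splits what the parentheses enclose.

```python
import re

text = "cats dogs"

# Ungrouped: the alternatives are "cat" and "dogs".
ungrouped = [m.group(0) for m in re.finditer(r"cat|dogs", text)]

# Grouped: the alternatives are "cat" and "dog", each followed by "s".
grouped = [m.group(0) for m in re.finditer(r"(cat|dog)s", text)]

print(ungrouped, grouped)
```

m.group(0) is the whole match, so the comparison is not affected by what the parentheses capture.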

For example, to match any of the words “int”, “float” or “double”, you could use the pattern int|float|double. If you only want to find one when it is followed by some whitespace and then some letters, put the alternatives inside a sub pattern: (int|float|double)\s+\w+.

Capturing matching text (back references)

If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.
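A first illustration, sketched with Python's re module (an assumption; Kate's engine may differ): a back reference can catch an accidentally doubled word. The \b word-boundary assertion and the sample sentences are additions for this sketch and are not described in this section.

```python
import re

# (\w+) remembers a word; \1 requires exactly the same text to appear again.
doubled = re.compile(r"\b(\w+) \1\b")

hit = doubled.search("this is is a test")
miss = doubled.search("this is a test")

print(hit.group(0), hit.group(1), miss)
```

hit.group(1) holds the remembered text, while hit.group(0) is the whole match including the repetition.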

For example, if you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write (\w+),\s*\1. The sub pattern \w+ would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string \1 references the first sub pattern enclosed in parentheses.)

Lookahead Assertions

A lookahead assertion is a sub pattern, starting with either ?= (positive: the enclosed pattern must follow) or ?! (negative: the enclosed pattern must not follow).

For example to match the literal string “Bill” but only if not followed by “ Gates”, you could use this expression: Bill(?! Gates). (This would find “Bill Clinton” as well as “Billy the kid”, but silently ignore the other matches.)

Sub patterns used for assertions are not captured.
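The “Bill” example can be checked with a short sketch. Python's re module is assumed as a stand-in for Kate's engine, and the sample sentence is made up from the names used above.

```python
import re

text = "Bill Clinton, Billy the kid, Bill Gates"

# Negative lookahead: "Bill" only where " Gates" does not follow.
negative = re.compile(r"Bill(?! Gates)")
starts = [m.start() for m in negative.finditer(text)]

# Positive lookahead: "Bill" only where " Gates" does follow.
positive = re.findall(r"Bill(?= Gates)", text)

# The assertion is not a capturing group, so the pattern has none.
group_count = negative.groups

print(starts, positive, group_count)
```

Both forms match only the four characters of “Bill”; the text inside the assertion is checked but never becomes part of the match.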

See also Assertions

Characters with a special meaning inside patterns

The following characters have meaning inside a pattern, and must be escaped if you want to literally match them:

\ (backslash)

   The escape character.

^ (caret)

   Asserts the beginning of the string.

$

   Asserts the end of string.

() (left and right parentheses)

   Denotes sub patterns.

{} (left and right curly braces)

   Denotes numeric quantifiers.

[] (left and right square brackets)

   Denotes character classes.

| (vertical bar)

   Logical OR. Separates alternatives.

+ (plus sign)

   Quantifier, 1 or more.

* (asterisk)

   Quantifier, 0 or more.

? (question mark)

   An optional character. Can be interpreted as a quantifier, 0 or 1.
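The need for escaping can be seen in a small sketch, again assuming Python's re module for illustration; the price string is hypothetical. re.escape is a Python helper, not a Kate feature, that adds the backslashes for an arbitrary literal string.

```python
import re

price = "cost: $5.00 (approx.)"

# Escaped by hand: $, ., ( and ) are matched literally.
literal = re.search(r"\$5\.00 \(approx\.\)", price)

# The same escaping produced by Python's re.escape helper.
same = re.search(re.escape("$5.00 (approx.)"), price)

# Unescaped, the dot is "any character", so it also accepts "5x00".
loose = re.search(r"5.00", "5x00")

print(literal.group(0), same.group(0), loose is not None)
```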
