Differences between revisions of "Regular expression"

From WikiLICC
m (Patterns)
m (Patterns)
Line 6: Line 6:
 
Character classes and patterns
 
Character classes and patterns
  
;[abc]: a, b, or c
+
;[abc] :a, b, or c
;[a-c]: a, b, or c
+
;[a-c] :a, b, or c
;[0123456789]: any digit
+
;[0123456789] :any digit
;[0-9]:       any digit
+
;[0-9] :any digit
;[^abc]:       any character except a, b, or c
+
;[^abc] :any character except a, b, or c
 +
;\a    :bell character (BEL, 0x07)
 +
;\f    :form feed character (FF, 0x0C)
 +
;\n    :line feed character (LF, 0x0A)
 +
;\r    :carriage return character (CR, 0x0D)
 +
;\t    :horizontal tab character (HT, 0x09)
 +
;\v    :vertical tab character (VT, 0x0B)
 +
;\xhhhh :Unicode character with hexadecimal code hhhh
 +
;. (dot):any character (including newline)
 +
;\d    :any digit; same as [0-9]
 +
;\D    :any non-digit; same as [^0-9] or [^\d]
 +
;\s    :whitespace character; practically the same as [ \t\n\r]
 +
;\S    :non-whitespace character; same as [^ \t\r\n] or [^\s]
 +
;\w    :word character (letter or digit); same as [a-zA-Z0-9]
 +
;\W    :non-word character; same as [^a-zA-Z0-9] or [^\w]
  
  
\a
 
  
    This matches the ASCII bell character (BEL, 0x07).
+
Characters that must be escaped inside a character class
\f
 
  
    This matches the ASCII form feed character (FF, 0x0C).
+
;]    :ends a character class
\n
+
;^ (caret) :negates a class when it is the first character
 +
;- (dash) :denotes a range
 +
;\ (backslash) :the escape character
  
    This matches the ASCII line feed character (LF, 0x0A, Unix newline).
+
Alternatives: matching "one of"
\r
 
  
    This matches the ASCII carriage return character (CR, 0x0D).
+
;a|b|1|2 :exactly one of a, b, 1, or 2 (as a character class, write [ab12]; inside brackets, | is a literal character)
\t
 
  
    This matches the ASCII horizontal tab character (HT, 0x09).
 
\v
 
  
    This matches the ASCII vertical tab character (VT, 0x0B).
+
Sub patterns (in parentheses)
\xhhhh
 
  
    This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (i.e., \zero ooo) matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).
+
;(int|float|double)\s+\w+ :one of int, float, or double, followed by whitespace and then one or more word characters
. (dot)
 
  
    This matches any character (including newline).
+
Back references
\d
 
  
    This matches a digit. Equal to [0-9]
+
;(\w+),\1 :matches the same word twice, separated by a comma. Note that \1 refers back to the text matched by the first sub pattern
\D
 
  
    This matches a non-digit. Equal to [^0-9] or [^\d]
+
Lookahead
\s
 
 
 
    This matches a whitespace character. Practically equal to [ \t\n\r]
 
\S
 
 
 
    This matches a non-whitespace. Practically equal to [^ \t\r\n], and equal to [^\s]
 
\w
 
 
 
    Matches any “word character”, in this case any letter or digit. Note that the underscore (_) is not matched, unlike in Perl regular expressions. Equal to [a-zA-Z0-9]
 
\W
 
 
 
    Matches any non-word character - anything but letters or numbers. Equal to [^a-zA-Z0-9] or [^\w]
 
 
 
The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write [\w \.]
 
Note
 
 
 
The POSIX notation of classes, [:<class name>:] is currently not supported.
 
Characters with special meanings inside character classes
 
 
 
The following characters have a special meaning inside the “[]” character class construct, and must be escaped to be literally included in a class:
 
 
 
]
 
 
 
    Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret)
 
^ (caret)
 
 
 
    Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.
 
- (dash)
 
 
 
    Denotes a logical range. Must always be escaped within a character class.
 
\ (backslash)
 
 
 
    The escape character. Must always be escaped.
 
 
 
Alternatives: matching “one of”
 
 
 
If you want to match one of a set of alternative patterns, you can separate those with | (vertical bar character).
 
 
 
For example to find either “John” or “Harry” you would use an expression John|Harry.
 
Sub Patterns
 
 
 
Sub patterns are patterns enclosed in parentheses, and they have several uses in the world of regular expressions.
 
Specifying alternatives
 
 
 
You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character “|” (vertical bar).
 
 
 
For example to match either of the words “int”, “float” or “double”, you could use the pattern int|float|double. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: (int|float|double)\s+\w+.
 
Capturing matching text (back references)
 
 
 
If you want to use a back reference, use a sub pattern to have the desired part of the pattern remembered.
 
 
 
For example, if you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write (\w+),\s*\1. The sub pattern \w+ would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace and then an equal chunk of word characters. (The string \1 references the first sub pattern enclosed in parentheses)
 
Lookahead Assertions
 
  
 
A lookahead assertion is a sub pattern, starting with either ?= or ?!.
 
A lookahead assertion is a sub pattern, starting with either ?= or ?!.
  
For example to match the literal string “Bill” but only if not followed by “ Gates”, you could use this expression: Bill(?! Gates). (This would find “Bill Clinton” as well as “Billy the kid”, but silently ignore the other matches.)
+
;Bill(?! Gates) :matches the Bill in "Bill Clinton" and "Billy the kid", but not in "Bill Gates"
 
 
Sub patterns used for assertions are not captured.
 
 
 
See also Assertions
 
Characters with a special meaning inside patterns
 
 
 
The following characters have meaning inside a pattern, and must be escaped if you want to literally match them:
 
 
 
\ (backslash)
 
 
 
    The escape character.
 
^ (caret)
 
 
 
    Asserts the beginning of the string.
 
$
 
 
 
    Asserts the end of string.
 
() (left and right parentheses)
 
 
 
    Denotes sub patterns.
 
{} (left and right curly braces)
 
 
 
    Denotes numeric quantifiers.
 
[] (left and right square brackets)
 
 
 
    Denotes character classes.
 
| (vertical bar)
 
 
 
    logical OR. Separates alternatives.
 
+ (plus sign)
 
 
 
    Quantifier, 1 or more.
 
* (asterisk)
 
 
 
    Quantifier, 0 or more.
 
? (question mark)
 
  
    An optional character. Can be interpreted as a quantifier, 0 or 1.
 
  
+
Characters with a special meaning inside patterns
+
;\ (backslash) :the escape character
+
;^ (caret) :beginning of the string
+
;$    :end of the string
 +
;() :denotes sub patterns
 +
;{} :numeric quantifiers
 +
;[] :delimits character classes
 +
;|  :logical OR
 +
;+  :quantifier, 1 or more
 +
;*  :quantifier, 0 or more
 +
;? :quantifier, 0 or 1

Revision as of 13:10, 24 May 2015

A brief summary of regular expressions for Kate.

See www.kate-editor.org/doc/regular-expressions.html

Patterns

Character classes and patterns

[abc] 
a, b, or c
[a-c] 
a, b, or c
[0123456789] 
any digit
[0-9]  
any digit
[^abc] 
any character except a, b, or c
\a  
bell character (BEL, 0x07)
\f  
form feed character (FF, 0x0C)
\n  
line feed character (LF, 0x0A)
\r  
carriage return character (CR, 0x0D)
\t  
horizontal tab character (HT, 0x09)
\v  
vertical tab character (VT, 0x0B)
\xhhhh 
Unicode character with hexadecimal code hhhh
. (dot)
any character (including newline)
\d  
any digit; same as [0-9]
\D  
any non-digit; same as [^0-9] or [^\d]
\s  
whitespace character; practically the same as [ \t\n\r]
\S  
non-whitespace character; same as [^ \t\r\n] or [^\s]
\w  
word character (letter or digit); same as [a-zA-Z0-9]
\W  
non-word character; same as [^a-zA-Z0-9] or [^\w]
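
The classes above can be tried outside Kate. The following is only a sketch using Python's re module, which is an assumption of this example and not Kate's engine: in Python, \w also matches the underscore, \d and \w match non-ASCII characters unless re.ASCII is passed, and the dot does not match a newline by default. The sample strings are made up for illustration.

```python
import re

text = "Room 42, floor 7\tEast"  # illustrative sample text

# Explicit class and its shorthand (re.ASCII restricts \d to [0-9]).
digits = re.findall(r"[0-9]", text)
shorthand_digits = re.findall(r"\d", text, flags=re.ASCII)

# Negated class: everything except a, b, or c.
not_abc = re.findall(r"[^abc]", "aXbYc")

# Whitespace shorthand: three spaces and one tab in the sample.
whitespace = re.findall(r"\s", text)

# The page defines \w as [a-zA-Z0-9]; written out explicitly so the
# underscore is excluded regardless of the engine.
words = re.findall(r"[a-zA-Z0-9]+", "snake_case 42")

print(digits, not_abc, whitespace, words)
```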


Characters that must be escaped inside a character class

]  
ends a character class
^ (caret) 
negates a class when it is the first character
- (dash)  
denotes a range
\ (backslash) 
the escape character
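
As a rough illustration of escaping inside a class, the sketch below builds a class that matches each of the four special characters literally. It uses Python's re module as a stand-in (an assumption; Kate's engine may be more or less strict about which escapes are required).

```python
import re

# A class containing the literal characters ], ^, - and \,
# each escaped with a backslash.
special = re.compile(r"[\]\^\-\\]")

sample = r"a]b^c-d\e"  # raw string: contains one real backslash
found = special.findall(sample)

print(found)
```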

Alternatives: matching "one of"

a|b|1|2 
exactly one of a, b, 1, or 2 (as a character class, write [ab12]; inside brackets, | is a literal character)
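
The difference between alternation and a character class can be checked with a small sketch. Python's re module is assumed here as a stand-in for Kate's engine. Note that inside a character class the vertical bar is an ordinary character, so it is matched too.

```python
import re

sample = "a3b|12"  # illustrative input containing a literal |

# Alternation: one of a, b, 1, or 2.
alternation = re.findall(r"a|b|1|2", sample)

# Equivalent character class without any vertical bars.
char_class = re.findall(r"[ab12]", sample)

# With bars inside the brackets, the | itself also matches.
class_with_bars = re.findall(r"[a|b|1|2]", sample)

print(alternation, char_class, class_with_bars)
```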


Sub patterns (in parentheses)

(int|float|double)\s+\w+ 
one of int, float, or double, followed by whitespace and then one or more word characters
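
A minimal sketch of grouping alternatives in a sub pattern, again assuming Python's re module in place of Kate. The C-like input lines are invented for the example.

```python
import re

decl = re.compile(r"(int|float|double)\s+\w+")

# The group limits the alternation; \s+\w+ applies to all three types.
hit = decl.search("static double ratio = 0.5;")
whole_match = hit.group(0)   # the full matched text
type_name = hit.group(1)     # the alternative that matched

# "int" inside "printf" is not followed by whitespace, so no match.
miss = decl.search("printf(x)")

print(whole_match, type_name, miss)
```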

Back references

(\w+),\1 
matches the same word twice, separated by a comma. Note that \1 refers back to the text matched by the first sub pattern
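
The back reference can be sketched as follows, assuming Python's re module. The variant with \s* (optional whitespace after the comma) and the \b word-boundary assertions are additions for this illustration, not part of the pattern listed above.

```python
import re

# \1 must repeat exactly what the first group captured.
repeated = re.compile(r"(\w+),\s*\1")

first = repeated.search("yes, yes, no").group(0)
none_found = repeated.search("yes, no")

# Without boundaries, \1 may match only a prefix of the next word.
prefix_hit = repeated.search("cat, catalog").group(0)

# Word boundaries (an assumption of this sketch) require whole words.
whole_words = re.compile(r"\b(\w+),\s*\1\b")
strict_miss = whole_words.search("cat, catalog")

print(first, none_found, prefix_hit, strict_miss)
```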

Lookahead

A lookahead assertion is a sub pattern, starting with either ?= or ?!.

Bill(?! Gates) 
matches the Bill in "Bill Clinton" and "Billy the kid", but not in "Bill Gates"
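
Both lookahead forms can be sketched with Python's re module (assumed here as a stand-in for Kate). The negative form ?! rejects a match when the assertion follows; the positive form ?= requires it. In both cases the asserted text is not part of the match.

```python
import re

names = ["Bill Clinton", "Billy the kid", "Bill Gates"]

# Negative lookahead: "Bill" not followed by " Gates".
not_gates = re.compile(r"Bill(?! Gates)")
kept = [n for n in names if not_gates.search(n)]

# Positive lookahead: "Bill" only when followed by " Gates".
only_gates = re.compile(r"Bill(?= Gates)")
gates = [n for n in names if only_gates.search(n)]

# The assertion consumes nothing: the match itself is just "Bill".
matched_text = only_gates.search("Bill Gates").group(0)

print(kept, gates, matched_text)
```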


Characters with a special meaning inside patterns

\ (backslash) 
the escape character
^ (caret)  
beginning of the string
$  
end of the string
() 
denotes sub patterns
{} 
numeric quantifiers
[] 
delimits character classes
|  
logical OR
+  
quantifier, 1 or more
*  
quantifier, 0 or more
?  
quantifier, 0 or 1
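
To see the special characters in action, the sketch below exercises the anchors and quantifiers and shows how escaping turns a special character into a literal one. It assumes Python's re module. The re.escape helper is a Python convenience with no counterpart mentioned on this page.

```python
import re

sample = "a ab abb"

zero_or_more = re.findall(r"ab*", sample)    # * : 0 or more b
one_or_more = re.findall(r"ab+", sample)     # + : 1 or more b
zero_or_one = re.findall(r"ab?", sample)     # ? : 0 or 1 b
exactly_two = re.findall(r"ab{2}", sample)   # {2} : exactly 2 b

# ^ and $ anchor the pattern to the start and end of the string.
all_digits = re.search(r"^\d+$", "2015") is not None
not_all_digits = re.search(r"^\d+$", "year 2015") is None

# Escaped $ and . are matched literally.
price = re.search(r"\$\d+\.\d\d", "Total: $3.50").group(0)

# re.escape escapes every special character in a literal string.
literal = "1+1=2?"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None

print(zero_or_more, one_or_more, zero_or_one, exactly_two, price)
```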