kmy.regex.compiler
Class RMachine

java.lang.Object
  |
  +--kmy.regex.compiler.RMachine
Direct Known Subclasses:
RDebugMachine, RInterpMachine, RJavaClassMachine

public abstract class RMachine
extends java.lang.Object

This class represents a "regex matching machine" that can be loaded with "RMachine instructions" and used later for regex matching. Subclasses of this class are used together with RCompiler (which compiles a regex parse tree into RMachine instructions). This class is used in the following fashion:

A single RMachine object is expected to be used to compile only a single regular expression.

An RMachine can have a number of variables and several character buffers, which are "pre-set" at runtime by a method external to RMachine (see the Regex class documentation). One of these buffers (the main buffer) holds the string that the regex is being matched against. The machine also has an implicit character position that points into the main buffer, and an implicit instruction pointer that points to the RMachine instruction about to be executed.

There is also a backtracing mechanism. When a fork instruction is executed, the machine stores the current character position and the instruction pointer corresponding to the label given to the fork instruction on a special internal backtracing stack as a fork record. In addition, for most modifications of RMachine variables (all except the hardAssign instruction), the original value (before assignment) is stored on the backtracing stack. When a fail instruction is executed, backtracing is performed: the information on the backtracing stack is used to reverse assignments, reset the character position, and jump to the label given to the last fork instruction. The backtracing stack is cleared up to and including the last fork record. If no fork record is found on the stack, regex matching fails.
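
The backtracing behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the kmy.regex implementation: the class BacktraceSketch, its Entry record type, the use of plain int indices for variables and labels, and the assign method (standing in for any reversible variable modification) are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a backtracing stack (hypothetical; not the kmy.regex code). */
final class BacktraceSketch {
    /** One stack entry: either a fork record or an assignment-reversion record. */
    private static final class Entry {
        final boolean isFork;
        final int a; // fork: saved character position; undo: variable index
        final int b; // fork: label (instruction index); undo: previous value
        Entry(boolean isFork, int a, int b) {
            this.isFork = isFork;
            this.a = a;
            this.b = b;
        }
    }

    final int[] vars; // machine variables, addressed by index (assumption)
    int pos;          // implicit character position in the main buffer
    int ip;           // implicit instruction pointer
    private final Deque<Entry> stack = new ArrayDeque<>();

    BacktraceSketch(int nVars) {
        vars = new int[nVars];
    }

    /** fork: remember the current position and the label to resume at. */
    void fork(int label) {
        stack.push(new Entry(true, pos, label));
    }

    /** Reversible assignment: the old value is saved before it is overwritten. */
    void assign(int var, int value) {
        stack.push(new Entry(false, var, vars[var]));
        vars[var] = value;
    }

    /** hardAssign: writes the variable without leaving an undo record. */
    void hardAssign(int var, int value) {
        vars[var] = value;
    }

    /**
     * fail: undo assignments until the last fork record, then restore the
     * position and jump to its label. Returns false if no fork record is
     * left, meaning the whole match fails.
     */
    boolean fail() {
        while (!stack.isEmpty()) {
            Entry e = stack.pop();
            if (e.isFork) {
                pos = e.a;
                ip = e.b;
                return true;
            }
            vars[e.a] = e.b; // reverse one assignment
        }
        return false;
    }
}
```

In this model, a fork followed by a reversible assignment and then a fail restores both the variable and the character position, while a value written with hardAssign survives the fail.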


Field Summary
static int EXT_CONDJUMP
           
static int EXT_HINT
           
static int EXT_MULTIFORK
           
static int EXT_SHIFTTBL
           
private  int extensions
           
static int FLAG_DOT_IS_ANY
           
static int HINT_CHAR_STAR_HEAD
           
static int HINT_END_ANCHORED
           
static int HINT_START_ANCHORED
           
 
Constructor Summary
RMachine()
           
 
Method Summary
abstract  void assert(char[] constStr)
           
abstract  void assert(int charClass, char[] ranges)
          Make sure that current character belongs to the given character class.
abstract  void assert(java.lang.String varName, boolean picked)
           
abstract  void boundary(int boundaryClass)
          Check if the current position is on the type of boundary given by boundaryClass.
 void condJump(char[] ranges, RLabel label)
          Jump if char is NOT in range.
 void condJump(char c, RLabel label)
          Jump if char is NOT the one that is given.
 void condJump(int atLeastCharLeft, int atMostCharLeft, RLabel label)
          Jump if less than atLeast or more than atMost chars left.
abstract  void decfail(RVariable var)
           
abstract  void decjump(RVariable var, RLabel label)
           
abstract  void fail()
           
 void finish()
           
abstract  void forget(RVariable var)
           
abstract  void fork(RLabel forkLabel)
          Add a fork record to backtracing stack.
 int getExtensions()
           
 int getNVars()
           
abstract  void hardAssign(RVariable v, int value)
           
 void hint(int flags, int minLength, int maxLength)
           
 void init()
           
abstract  void jump(RLabel label)
           
 Regex makeRegex()
           
abstract  void mark(RLabel label)
          Makes the given label refer to the next RMachine instruction.
 void mfEnd(int maxCount)
          Note that maxCount has already had minCount subtracted from it.
 void mfStart(int headDecrement, int minCount)
           
abstract  RLabel newLabel()
          Creates a new RMachine label.
abstract  RVariable newTmpVar(int init)
          Creates a new RMachine temporary variable.
abstract  RVariable newVar(java.lang.String name, boolean begin)
          Creates a new RMachine named variable.
abstract  void pick(RVariable v)
          Store current character position (in the main buffer) into a given variable.
 void setExtensions(int ext)
           
 void setNoRefiller(boolean norefiller)
           
 void shiftTable(boolean beginning, int charsAhead, char[] chars, int[] shifts)
           
abstract  void skip()
          Increment current position by 1 (skip a character).
 void tellName(java.lang.String name)
          Provides a string representation of this regular expression.
 void tellPosition(int pos)
          Informs RMachine about current character position in the regex.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

extensions

private int extensions

HINT_START_ANCHORED

public static final int HINT_START_ANCHORED

HINT_END_ANCHORED

public static final int HINT_END_ANCHORED

HINT_CHAR_STAR_HEAD

public static final int HINT_CHAR_STAR_HEAD

FLAG_DOT_IS_ANY

public static final int FLAG_DOT_IS_ANY

EXT_HINT

public static final int EXT_HINT

EXT_MULTIFORK

public static final int EXT_MULTIFORK

EXT_CONDJUMP

public static final int EXT_CONDJUMP

EXT_SHIFTTBL

public static final int EXT_SHIFTTBL
Constructor Detail

RMachine

public RMachine()
Method Detail

makeRegex

public Regex makeRegex()

getNVars

public int getNVars()

init

public void init()

finish

public void finish()

setNoRefiller

public void setNoRefiller(boolean norefiller)

tellName

public void tellName(java.lang.String name)
Provides a string representation of this regular expression. It does not alter regex functionality. This string can, for example, be returned by the toString() method of the resulting Regex.

tellPosition

public void tellPosition(int pos)
Informs RMachine about current character position in the regex. It does not alter regex functionality. Can be used for debugging.

newVar

public abstract RVariable newVar(java.lang.String name,
                                 boolean begin)
Creates a new RMachine named variable. Such a variable can be used to hold a position in the string that the regex is being matched against. Every variable name actually corresponds to a substring, so two positions are needed: one for the beginning of the substring and one for the end (which points to the first character after the substring). The parameter begin tells which of the two variables is needed.
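
As a small illustration of the begin/end convention, the sketch below recovers a substring from two stored positions, with the end position pointing at the first character after the substring. The class GroupSpanSketch and the method captured are hypothetical names, and plain int positions stand in for RVariable values; this is not how the library itself exposes matches.

```java
/** Hypothetical helper showing the begin/end position convention. */
final class GroupSpanSketch {
    /**
     * Returns the substring delimited by a "begin" position and an "end"
     * position, where end is exclusive (first character after the substring).
     */
    static String captured(String input, int begin, int end) {
        return input.substring(begin, end);
    }
}
```

With the input "key=value", a begin position of 4 and an end position of 9 describe the substring "value".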

newLabel

public abstract RLabel newLabel()
Creates a new RMachine label. A label can be used to mark an RMachine instruction and can then be jumped to.

newTmpVar

public abstract RVariable newTmpVar(int init)
Creates a new RMachine temporary variable. Such a variable can be used to hold a loop counter or any other integer. Adds an assignment-reversion record to the backtracing stack.
Parameters:
init - initial value for the variable

mark

public abstract void mark(RLabel label)
Makes the given label refer to the next RMachine instruction. Once a label is marked, it cannot be marked again.

pick

public abstract void pick(RVariable v)
Store current character position (in the main buffer) into a given variable. Adds an assignment-reversion record to backtracing stack.

fork

public abstract void fork(RLabel forkLabel)
Add a fork record to backtracing stack. If subsequent fail transfers control to this record, instruction pointer will be set to the given label.

skip

public abstract void skip()
Increment current position by 1 (skip a character).

boundary

public abstract void boundary(int boundaryClass)
Check if the current position is on the type of boundary given by boundaryClass. Boundary types:
  • '^' or 'A' - beginning of the string being matched.
  • '$' or 'Z' - end of the string being matched.
  • '<' - word beginning.
  • '>' - word end.
  • 'b' - word beginning or end.
  • 'B' - neither word beginning nor end.
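
The boundary types listed above could be checked roughly as follows. This is a sketch, not the library's code: the class BoundarySketch and its methods are hypothetical, a "word character" is assumed to be a letter, digit, or underscore, and the string and position are passed explicitly instead of being implicit machine state.

```java
/** Hypothetical stand-alone check for the boundary types listed above. */
final class BoundarySketch {
    /** Assumption: word characters are letters, digits, and underscore. */
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Returns true if pos in s lies on the boundary named by boundaryClass. */
    static boolean atBoundary(int boundaryClass, CharSequence s, int pos) {
        // Is there a word character immediately before / at the position?
        boolean before = pos > 0 && isWordChar(s.charAt(pos - 1));
        boolean after = pos < s.length() && isWordChar(s.charAt(pos));
        switch (boundaryClass) {
            case '^':
            case 'A':
                return pos == 0;            // beginning of the string
            case '$':
            case 'Z':
                return pos == s.length();   // end of the string
            case '<':
                return !before && after;    // word beginning
            case '>':
                return before && !after;    // word end
            case 'b':
                return before != after;     // word beginning or end
            case 'B':
                return before == after;     // neither
            default:
                throw new IllegalArgumentException(
                        "unknown boundary class: " + boundaryClass);
        }
    }
}
```

In the real machine a failed check would presumably trigger backtracing in the same way as a fail instruction; here the result is simply returned as a boolean.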

assert

public abstract void assert(int charClass,
                            char[] ranges)
Make sure that current character belongs to the given character class. If it does, increment current char position by 1, otherwise fail. See kmy.regex.tree.CharSet and kmy.regex.tree.CharClassCodes.
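
The "advance or fail" behaviour of this instruction can be sketched as below. The layout of the ranges array is an assumption made for this example only (inclusive low/high pairs, as in {'a','z','0','9'}); the actual encoding is defined by kmy.regex.tree.CharSet and may differ. The charClass argument is ignored here, and RangeAssertSketch and its methods are hypothetical names.

```java
/** Hypothetical model of a character-class assertion over range pairs. */
final class RangeAssertSketch {
    /** Assumption: ranges holds inclusive pairs {lo0, hi0, lo1, hi1, ...}. */
    static boolean inRanges(char c, char[] ranges) {
        for (int i = 0; i + 1 < ranges.length; i += 2) {
            if (ranges[i] <= c && c <= ranges[i + 1]) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the new character position (pos + 1) if the current character
     * is in one of the ranges, or -1 to signal a fail.
     */
    static int assertInRanges(CharSequence s, int pos, char[] ranges) {
        if (pos < s.length() && inRanges(s.charAt(pos), ranges)) {
            return pos + 1;
        }
        return -1;
    }
}
```

Running off the end of the input is treated as a fail in this sketch, since there is no current character to test.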

assert

public abstract void assert(char[] constStr)

assert

public abstract void assert(java.lang.String varName,
                            boolean picked)

hardAssign

public abstract void hardAssign(RVariable v,
                                int value)

decjump

public abstract void decjump(RVariable var,
                             RLabel label)

decfail

public abstract void decfail(RVariable var)

forget

public abstract void forget(RVariable var)

jump

public abstract void jump(RLabel label)

fail

public abstract void fail()

getExtensions

public int getExtensions()

setExtensions

public void setExtensions(int ext)

hint

public void hint(int flags,
                 int minLength,
                 int maxLength)

mfStart

public void mfStart(int headDecrement,
                    int minCount)

mfEnd

public void mfEnd(int maxCount)
Note that maxCount has already had minCount subtracted from it.

condJump

public void condJump(char[] ranges,
                     RLabel label)
Jump if char is NOT in range.

condJump

public void condJump(int atLeastCharLeft,
                     int atMostCharLeft,
                     RLabel label)
Jump if less than atLeast or more than atMost chars left. If it is hard to determine how many chars are left, it is OK not to jump.
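
The jump condition can be written out as a one-line predicate. This is a minimal sketch assuming the total input length is known; CharsLeftSketch and shouldJump are hypothetical names, and the length and position are passed explicitly instead of being read from the machine.

```java
/** Hypothetical predicate for the "chars left" form of condJump. */
final class CharsLeftSketch {
    /**
     * Returns true when the number of characters left (length - pos) is
     * less than atLeastCharLeft or more than atMostCharLeft.
     */
    static boolean shouldJump(int length, int pos,
                              int atLeastCharLeft, int atMostCharLeft) {
        int left = length - pos;
        return left < atLeastCharLeft || left > atMostCharLeft;
    }
}
```

Because not jumping is always permitted, an implementation that cannot cheaply determine the remaining length could simply treat this predicate as false.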

condJump

public void condJump(char c,
                     RLabel label)
Jump if char is NOT the one that is given.

shiftTable

public void shiftTable(boolean beginning,
                       int charsAhead,
                       char[] chars,
                       int[] shifts)