|
Regexps and Parsing in FPGAs |
|||
|
Home Life in an FPGA >> << FPGAs as coprocessors
Usenet Postings |
Newsgroups: comp.arch.fpga
Subject: regular expression matching and parsing in FPGAs (was chess...)
Date: Fri, 24 Dec 1999 12:30:30 -0800
I'm not sure how regular expression matching and parsing relates to chess,
but these can of course be done at tremendous speeds in an FPGA. Here are
some ideas.
REGULAR EXPRESSIONS
A regular expression (including compositions of simpler REs) can be
implemented in hardware through these transformations:
1. convert the RE to an NFA (non-deterministic finite-state automaton)
2. convert the NFA to a DFA (deterministic FA) using the subset construction
3. implement the DFA as a one-hot FSM (finite state machine).
Here each state is a flip-flop. Transitions between states are
combinatorial expressions of the states and the input. If the input is
encoded, e.g. as 8-bit characters, there would probably be additional logic
to decode characters ('.') or character classes ([0123456789]).
This could consume one character per clock at 100-200 MHz.
LR PARSING
A parser for a context-free grammar can be implemented in hardware through
these transformations.
1. derive LR-parser shift-reduce tables (actions, goto, etc.) from the CFG
productions.
2. implement the shift-reduce parsing algorithm approximately as:
// types
enum Token { term1, ..., nTerminals, nonterm1, ..., nTNT};
enum State { s0, s1, ..., nStates };
enum Prods { ..., nProds };
enum Action { accept, error, shift, reduce1, reduce2, ... };
// tables
Action actions[nStates][nTNT];
State goto[nStates][nTNT];
Token lhs[nProds]; // token on lhs of production, e.g. X for X ::= Y1 ... Yn
int rhs_len[nProds]; // length of rhs of production, e.g. n for above
// shift-reduce parser
State stack[]; // a stack of states
int depth= 0; // index of top-of-stack
State s = s0;
stack[depth] = s;
Token input = first token;
repeat {
Action a = actions[s][input];
if (a == accept)
stop, accept;
else if (a == error)
stop, error;
else if (a == shift) {
s = stack[++depth] = goto[s][input];
input = next token;
}
else if (a == reduce p) {
s = stack[depth -= rhs_len[p]];
s = stack[++depth] = goto[s][lhs[p]];
}
}
3. implement *that* in an FPGA:
For example, for parsing C, from off the top of my head,
nStates < 256,
nTNT < 128,
nProds < 100,
Stack depth <128.
So implement 'input' in a 7-bit-wide FIFO; 'stack' in a 128x8 SRAM; 'lhs' in
a 100x7 SRAM; 'rhs_len' in a 100x4 SRAM, and 'actions' and 'goto' in
external SSRAM (for C, they won't fit in a few block rams). 'Stack' is
indexed by depth; 'actions' by s and input; 'goto' by s and mux(input,
lhs[p]). Implement depth in an accumulator (a 7-bit register + an adder of
+1 or - prod_size[p]).
Perhaps you could clock this parser at 100 MHz. A pity that it is not easy
to pipeline the action and goto lookups.
So here's a sketch of an FCCM to syntax check a preprocessed C program.
Characters come in at 200 MHz to the lexical analyzer, which uses the RE
recognizer described above to deposit tokens into a FIFO. The C parser
drains the FIFO, consuming a token approximately every few cycles at 100
MHz. Overall the machine gobbles C code at 200 MB/s, or about 5 million
lines per second, and then either the green Accept LED or the red Error LED
lights up. :-)
(Interesting gedanken experiment but I'd rather do it in software, thanks.)
(I would be surprised if this (shift-reduce parsing in hardware) has not
been studied already. Also, I wonder if there are alternative
implementations of the LR parser algorithm that can better use 1-hot
encodings. The problem is the current state gets pushed and popped, and a
stack of 1-hot states is not very storage efficient; thus you'd have to
encode them into a binary representation and decode them back to a 1-hot
encoding.)
(I suppose there might be real-world applications of work like this in areas
like text classification and DNA sequencing.)
Ref: see e.g. Compilers: Principles, Techniques, and Tools, Aho Sethi and
Ullman, or Crafting a Compiler, Fischer and LeBlanc.
Jan Gray
Copyright © 2000, Gray Research LLC. All rights reserved. |