You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
119 lines
3.1 KiB
Markdown
119 lines
3.1 KiB
Markdown
2 years ago
|
# fastparse
|
||
|
|
||
|
A very simple and stupid parser, based on a statemachine and regular expressions.
|
||
|
|
||
|
It's not intended for complex languages. It's intended to easily write a simple parser for a simple language.
|
||
|
|
||
|
|
||
|
|
||
|
## Usage
|
||
|
|
||
|
Pass a description of statemachine to the constructor. The description must be in this form:
|
||
|
|
||
|
``` javascript
|
||
|
new Parser(description)
|
||
|
|
||
|
description is {
|
||
|
// The key is the name of the state
|
||
|
// The value is an object containing possible transitions
|
||
|
"state-name": {
|
||
|
// The key is a regular expression
|
||
|
// If the regular expression matches the transition is executed
|
||
|
// The value can be "true", a other state name or a function
|
||
|
|
||
|
"a": true,
|
||
|
// true will make the parser stay in the current state
|
||
|
|
||
|
"b": "other-state-name",
|
||
|
// a string will make the parser transit to a new state
|
||
|
|
||
|
"[cde]": function(match, index, matchLength) {
|
||
|
// "match" will be the matched string
|
||
|
// "index" will be the position in the complete string
|
||
|
// "matchLength" will be "match.length"
|
||
|
|
||
|
// "this" will be the "context" passed to the "parse" method"
|
||
|
|
||
|
// A new state name (string) can be returned
|
||
|
return "other-state-name";
|
||
|
},
|
||
|
|
||
|
"([0-9]+)(\\.[0-9]+)?": function(match, first, second, index, matchLength) {
|
||
|
// groups can be used in the regular expression
|
||
|
// they will match to arguments "first", "second"
|
||
|
},
|
||
|
|
||
|
// the parser stops when it cannot match the string anymore
|
||
|
|
||
|
// order of keys is the order in which regular expressions are matched
|
||
|
// if the javascript runtime preserves the order of keys in an object
|
||
|
// (this is not standardized, but it's a de-facto standard)
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
The statemachine is compiled down to a single regular expression per state. So basically the parsing work is delegated to the (native) regular expression logic of the javascript runtime.
|
||
|
|
||
|
|
||
|
``` javascript
|
||
|
Parser.prototype.parse(initialState: String, parsedString: String, context: Object)
|
||
|
```
|
||
|
|
||
|
`initialState`: state where the parser starts to parse.
|
||
|
|
||
|
`parsedString`: the string which should be parsed.
|
||
|
|
||
|
`context`: an object which can be used to save state and results. Available as `this` in transition functions.
|
||
|
|
||
|
returns `context`
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
## Example
|
||
|
|
||
|
``` javascript
|
||
|
var Parser = require("fastparse");
|
||
|
|
||
|
// A simple parser that extracts @licence ... from comments in a JS file
|
||
|
var parser = new Parser({
|
||
|
// The "source" state
|
||
|
"source": {
|
||
|
// matches comment start
|
||
|
"/\\*": "comment",
|
||
|
"//": "linecomment",
|
||
|
|
||
|
// this would be necessary for a complex language like JS
|
||
|
// but omitted here for simplicity
|
||
|
// "\"": "string1",
|
||
|
// "\'": "string2",
|
||
|
// "\/": "regexp"
|
||
|
|
||
|
},
|
||
|
// The "comment" state
|
||
|
"comment": {
|
||
|
"\\*/": "source",
|
||
|
"@licen[cs]e\\s((?:[^*\n]|\\*+[^*/\n])*)": function(match, licenseText) {
|
||
|
this.licences.push(licenseText.trim());
|
||
|
}
|
||
|
},
|
||
|
// The "linecomment" state
|
||
|
"linecomment": {
|
||
|
"\n": "source",
|
||
|
"@licen[cs]e\\s(.*)": function(match, licenseText) {
|
||
|
this.licences.push(licenseText.trim());
|
||
|
}
|
||
|
}
|
||
|
});
|
||
|
|
||
|
var licences = parser.parse("source", sourceCode, { licences: [] }).licences;
|
||
|
|
||
|
console.log(licences);
|
||
|
```
|
||
|
|
||
|
|
||
|
|
||
|
## License
|
||
|
|
||
|
MIT (http://www.opensource.org/licenses/mit-license.php)
|