A Human-Friendly Data Serialization Format
This directory contains formal grammar definitions for the UP (Unified Properties) language in multiple parser generator formats.
UP is a line-oriented, hierarchical data format with these key features:
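- Typed key-value pairs: key!type value
- Blocks: key { ... }
- Lists: key [item1, item2] or multiline
- Multiline text delimited by triple backticks
- Comments: # comment text

Best for: C/C++ parsers, mature tooling, LALR(1) parsing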
# Generate parser
bison -d up.y
flex up.l
gcc up.tab.c lex.yy.c -lfl -o up-parser
# Test
echo "name Alice" | ./up-parser
Files: up.y (Bison grammar), up.l (Flex lexer)
Features:
Best for: Cross-language parsers, adaptive LL(*) (ALL(*)) parsing, rich tooling
# Generate parser (Java example)
antlr4 up.g4
javac up*.java
# For Python
antlr4 -Dlanguage=Python3 up.g4
# For JavaScript
antlr4 -Dlanguage=JavaScript up.g4
Target Languages: Java, Python, JavaScript, C#, C++, Go, Swift, PHP, Dart
File: up.g4 - Combined grammar (lexer + parser in one file)
Features:
Best for: Simple integration, unambiguous parsing, packrat parsing
# Using PEG.js
pegjs up.peg
node
> const parser = require('./up.js');
> parser.parse('name Alice');
# Using Peggy (modern PEG.js)
npx peggy up.peg
File: up.peg - Parsing Expression Grammar
Features:
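Compatible with: PEG.js and Peggy. Other PEG tools (python-peg, pest for Rust, and others) generally use their own grammar syntax, so up.peg may need porting before use with them.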
Best for: Editor integration, syntax highlighting, incremental parsing
# Install tree-sitter CLI
npm install -g tree-sitter-cli
# Generate parser
tree-sitter generate
# Test
tree-sitter parse ../examples/01-basic-scalars.up
# Preview syntax highlighting (requires highlight queries, e.g. queries/highlights.scm)
tree-sitter highlight ../examples/01-basic-scalars.up
File: grammar.js - Tree-sitter grammar (JavaScript DSL)
Features:
Integration: Used for UP syntax highlighting in editors and on GitHub
All grammars define these core concepts:
IDENTIFIER [A-Za-z_][A-Za-z0-9_-]*
INTEGER [0-9]+
BANG !
LBRACE {
RBRACE }
LBRACKET [
RBRACKET ]
BACKTICKS ```
HASH #
COMMA ,
NEWLINE \n | \r\n
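The token table above maps fairly directly onto a regex-based lexer. The following is a minimal Python sketch under stated assumptions, not the lexer shipped in any of the grammar files: the `tokenize` function, the `SKIP` and `TEXT` patterns, and the `COMMA` entry (used by the grammar's List rule) are illustrative additions.

```python
import re

# Patterns copied from the token table; order matters because the first
# alternative that matches at a position wins.
TOKEN_SPEC = [
    ("BACKTICKS",  r"```"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_-]*"),
    ("INTEGER",    r"[0-9]+"),
    ("BANG",       r"!"),
    ("LBRACE",     r"\{"),
    ("RBRACE",     r"\}"),
    ("LBRACKET",   r"\["),
    ("RBRACKET",   r"\]"),
    ("HASH",       r"#"),
    ("COMMA",      r","),        # assumption: needed by the List rule
    ("NEWLINE",    r"\r\n|\n"),
    ("SKIP",       r"[ \t]+"),   # assumption: spaces and tabs separate tokens
    ("TEXT",       r"[^\s]"),    # assumption: catch-all for any other character
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, lexeme) pairs, dropping inter-token whitespace."""
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
```

Because IDENTIFIER allows hyphens after the first character, a key such as `max-size` comes out as a single token in this sketch.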
Document ::= Statement*
Statement ::= Key TypeAnnotation? Value? NEWLINE
            | Comment
Key ::= IDENTIFIER
TypeAnnotation ::= BANG TypeName
Value ::= Scalar | Block | List | Multiline | Table
Block ::= LBRACE Statement* RBRACE
List ::= LBRACKET (Value (COMMA Value)*)? RBRACKET
Multiline ::= BACKTICKS Content BACKTICKS
Comment ::= HASH RestOfLine
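To illustrate how one of these rules turns into code, here is a hedged recursive-descent sketch of the List rule in Python. It is a simplification rather than a reference implementation: it handles only inline lists, treats every non-list item as a raw scalar string, and the function names are hypothetical.

```python
def _skip_spaces(text, pos):
    """Advance past spaces and tabs."""
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos


def _parse_value(text, pos):
    """Parse a nested list or a raw scalar (assumption: scalars end at ',', '[' or ']')."""
    if pos < len(text) and text[pos] == "[":
        return parse_list(text, pos)
    start = pos
    while pos < len(text) and text[pos] not in ",[]":
        pos += 1
    scalar = text[start:pos].strip()
    if not scalar:
        raise ValueError(f"Expected a value at offset {start}")
    return scalar, pos


def parse_list(text, pos=0):
    """List ::= LBRACKET (Value (COMMA Value)*)? RBRACKET

    Returns (items, position just after the closing bracket).
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"Expected '[' at offset {pos}")
    pos = _skip_spaces(text, pos + 1)
    items = []
    if pos < len(text) and text[pos] == "]":
        return items, pos + 1          # empty list
    while True:
        item, pos = _parse_value(text, pos)
        items.append(item)
        pos = _skip_spaces(text, pos)
        if pos < len(text) and text[pos] == ",":
            pos = _skip_spaces(text, pos + 1)
        elif pos < len(text) and text[pos] == "]":
            return items, pos + 1
        else:
            raise ValueError(f"Expected ',' or ']' at offset {pos}")
```

For example, `parse_list("[a, b, [c, d]]")` yields the nested Python list `["a", "b", ["c", "d"]]` together with the offset after the final bracket.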
Statements are separated by newlines, not semicolons:
name Alice # Statement 1
age!int 30 # Statement 2
active!bool true # Statement 3
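Since each statement ends at a newline, scalar statements can be handled one line at a time. The sketch below is a Python illustration under assumptions: `parse_line` is a hypothetical helper, the shape of a type name is guessed (the grammar does not define TypeName), and a `#` preceded by whitespace is treated as the start of a trailing comment, as the sample lines above suggest.

```python
import re

STATEMENT_RE = re.compile(
    r"(?P<key>[A-Za-z_][A-Za-z0-9_-]*)"   # Key ::= IDENTIFIER
    r"(?:!(?P<type>[A-Za-z0-9_-]+))?"      # TypeAnnotation (TypeName shape assumed)
    r"(?:[ \t]+(?P<value>.*))?"            # optional scalar value
)


def parse_line(line):
    """Return (key, type, value) for a scalar statement, or None for blank/comment lines."""
    # Assumption: '#' at line start or after whitespace begins a comment.
    code = re.split(r"(?:^|[ \t])#", line, maxsplit=1)[0].strip()
    if not code:
        return None
    match = STATEMENT_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"Not a valid statement: {line!r}")
    return match.group("key"), match.group("type"), match.group("value")


def parse_scalars(text):
    """Parse newline-separated scalar statements, skipping blanks and comments."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [statement for statement in parsed if statement is not None]
```

Values stay as raw strings here; converting `30` to an integer based on the `int` annotation would be a separate step.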
Some constructs require state tracking:
All identifiers are valid keys:
# These are all valid
if true
for loop
class Person
return value
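In practice this means a key check needs only the IDENTIFIER pattern and no keyword table. A small Python sketch, where `is_valid_key` is a hypothetical helper name:

```python
import re

# Same pattern as the IDENTIFIER token; there is no reserved-word list to consult.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_valid_key(word):
    """True if the whole word matches the IDENTIFIER pattern."""
    return IDENTIFIER_RE.fullmatch(word) is not None
```

Words such as `if`, `for`, `class`, and `return` pass this check like any other identifier, while a word starting with a digit does not.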
Each grammar format has specific testing commands:
# Bison/Yacc - Check for conflicts
bison --warnings=all up.y
# ANTLR4 - Validate grammar
antlr4 -Werror up.g4
# PEG.js - Test with trace
pegjs --trace up.peg
# Tree-sitter - Run tests
tree-sitter test
tree-sitter parse ../examples/*.up
| Format | Best For | Pros | Cons |
|---|---|---|---|
| Bison+Flex | C/C++ integration | Mature, fast, standard | C-specific, setup complexity |
| ANTLR4 | Multi-language support | 10+ targets, great tooling | Larger runtime, Java-based |
| PEG | JavaScript/simple parsers | Unambiguous, easy to read | No left recursion, backtracking |
| Tree-sitter | Editor integration | Incremental, error recovery | Specific use case |
When implementing a parser using these grammars:
Handle these special cases:
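- Multiline text: enter multiline mode on opening triple backticks, exit on closing triple backticks
- Comments: begin with #, include in AST for doc generation
- Opening delimiters ({, [, `)

Key implementation points: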
- Type annotations: parse !type after key, store with node
- Lists: support inline [a, b] and multiline formats
- Dedent: !N removes N spaces from each line of multiline value

Recommended AST nodes:
Document
├─ Node
│ ├─ key: string
│ ├─ type: string?
│ └─ value: Value
└─ ...
Value = Scalar | Block | List | Table | Multiline
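The node shapes above could be modeled as follows. This is a Python sketch with assumptions called out: scalars are kept as plain strings, the `Table` layout (`rows`) is a guess because the table format is not specified here, and `value` is optional because the grammar marks Value as optional. A `dedent` helper for the !N rule is included, assuming lines with fewer than N leading spaces lose only the spaces they have.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Block:
    statements: list[Node] = field(default_factory=list)


@dataclass
class UPList:
    items: list[Value] = field(default_factory=list)


@dataclass
class Table:
    rows: list[list[str]] = field(default_factory=list)  # assumption: layout not specified


@dataclass
class Multiline:
    text: str


@dataclass
class Node:
    key: str
    type: Optional[str] = None
    value: Optional[Value] = None   # the grammar allows a statement without a value


@dataclass
class Document:
    nodes: list[Node] = field(default_factory=list)


# Scalars are represented as plain strings in this sketch.
Value = Union[str, Block, UPList, Table, Multiline]


def dedent(text: str, n: int) -> str:
    """Remove up to n leading spaces from each line of a multiline value."""
    result = []
    for line in text.split("\n"):
        strip = 0
        while strip < n and strip < len(line) and line[strip] == " ":
            strip += 1
        result.append(line[strip:])
    return "\n".join(result)
```

A multiline node annotated with `!4` might then be built as `Node("body", "4", Multiline(dedent(raw_text, 4)))`.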
Provide helpful error messages:
Parse error at line 15: Expected '}' to close block
Block opened at line 10: server {
Got: unexpected EOF
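One way to produce messages of this shape is to remember where each block was opened. The Python sketch below is illustrative only: `UPParseError` and `check_blocks` are hypothetical names, and the brace tracking assumes a block opens on a line ending in `{` and closes on a line containing only `}`, ignoring braces inside multiline text.

```python
class UPParseError(Exception):
    """Parse error carrying the failing line and, optionally, where the block began."""

    def __init__(self, line, message, opened_line=None, opened_text=None, got=None):
        parts = [f"Parse error at line {line}: {message}"]
        if opened_line is not None:
            parts.append(f"Block opened at line {opened_line}: {opened_text}")
        if got is not None:
            parts.append(f"Got: {got}")
        super().__init__("\n".join(parts))
        self.line = line


def check_blocks(text):
    """Raise UPParseError if braces are unbalanced (simplified line-based check)."""
    stack = []  # (line number, source line) for each block still open
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.endswith("{"):
            stack.append((number, stripped))
        elif stripped == "}":
            if not stack:
                raise UPParseError(number, "Unexpected '}' with no open block", got="}")
            stack.pop()
    if stack:
        opened_line, opened_text = stack[-1]
        raise UPParseError(
            len(lines),
            "Expected '}' to close block",
            opened_line,
            opened_text,
            got="unexpected EOF",
        )
```

Reporting the innermost unclosed block points the user at the most likely place the closing brace is missing.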
Test all grammars against example files:
../examples/01-basic-scalars.up # Simple key-value pairs
../examples/02-blocks.up # Nested blocks
../examples/03-lists.up # List structures
../examples/04-multiline.up # Multiline strings
../examples/07-tables.up # Table format
../examples/08-mixed-complex.up # Complex nested structures
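A small driver can run any generated parser over these files. The Python sketch below assumes the parser reads UP source on standard input (as the Bison example `echo "name Alice" | ./up-parser` does) and signals failure with a non-zero exit code; the exit-code convention and the `run_examples` name are assumptions.

```python
import subprocess
from pathlib import Path


def run_examples(parser_cmd, examples_dir="../examples"):
    """Feed every .up file to parser_cmd on stdin; return [(file name, stderr)] for failures."""
    failures = []
    for path in sorted(Path(examples_dir).glob("*.up")):
        with path.open("rb") as source:
            result = subprocess.run(parser_cmd, stdin=source, capture_output=True)
        # Assumption: a non-zero exit status means the parser rejected the file.
        if result.returncode != 0:
            failures.append((path.name, result.stderr.decode(errors="replace")))
    return failures
```

Calling `run_examples(["./up-parser"])` from this directory would check the Bison-built parser; an empty result means every example file was accepted.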
When modifying grammars:
- Ensure the parser implementations in ../go/, ../java/, etc. stay compatible

All grammar definitions are licensed under GNU GPLv3; see ../LICENSE for details.