UP (Unified Properties)

A Human-Friendly Data Serialization Format

This directory contains formal grammar definitions for the UP (Unified Properties) language in multiple parser generator formats.

Overview

UP is a line-oriented, hierarchical data format with these key features:

Available Grammar Formats

1. Bison/Yacc + Flex (up.y + up.l)

Best for: C/C++ parsers, mature tooling, LALR(1) parsing

# Generate parser
bison -d up.y
flex up.l
gcc up.tab.c lex.yy.c -lfl -o up-parser

# Test
echo "name Alice" | ./up-parser

Files:

Features:

2. ANTLR4 (up.g4)

Best for: Cross-language parsers, LL(*) parsing, rich tooling

# Generate parser (Java example)
antlr4 up.g4
javac up*.java

# For Python
antlr4 -Dlanguage=Python3 up.g4

# For JavaScript
antlr4 -Dlanguage=JavaScript up.g4

Target Languages: Java, Python, JavaScript, C#, C++, Go, Swift, PHP, Dart

File: up.g4 - Combined grammar (lexer + parser in one file)

Features:

3. PEG (up.peg)

Best for: Simple integration, unambiguous parsing, packrat parsing

# Using PEG.js
pegjs up.peg
node
> const parser = require('./up.js');
> parser.parse('name Alice');

# Using Peggy (modern PEG.js)
npx peggy up.peg

File: up.peg - Parsing Expression Grammar

Features:

Compatible with: PEG.js and Peggy as-is; other PEG tools such as python-peg and pest (Rust) use their own grammar syntax, so the rules would need to be ported

4. Tree-sitter (grammar.js)

Best for: Editor integration, syntax highlighting, incremental parsing

# Install tree-sitter CLI
npm install -g tree-sitter-cli

# Generate parser
tree-sitter generate

# Test
tree-sitter parse ../examples/01-basic-scalars.up

# Preview syntax highlighting (requires highlight queries, e.g. queries/highlights.scm)
tree-sitter highlight ../examples/01-basic-scalars.up

File: grammar.js - Tree-sitter grammar (JavaScript DSL)

Features:

Integration: Used for UP syntax highlighting in editors and on GitHub

Grammar Structure

All grammars define these core concepts:

Tokens (Lexer)

IDENTIFIER      [A-Za-z_][A-Za-z0-9_-]*
INTEGER         [0-9]+
BANG            !
LBRACE          {
RBRACE          }
LBRACKET        [
RBRACKET        ]
COMMA           ,
BACKTICKS       ```
HASH            #
NEWLINE         \n | \r\n
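
As a rough sketch of how the token table could be turned into a lexer, the Python snippet below combines the patterns into a single regular expression. The function name `tokenize` and the `WS` and `MISMATCH` entries are assumptions made for this example and do not come from the grammar files; `COMMA` is included only because the List rule refers to it. Comment bodies and multiline content are not treated specially here.

```python
import re

# Patterns taken from the token table. WS and MISMATCH are assumptions added
# so the sketch can skip blanks and report unknown characters.
TOKEN_SPEC = [
    ("BACKTICKS", r"```"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_-]*"),
    ("INTEGER", r"[0-9]+"),
    ("BANG", r"!"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("COMMA", r","),  # referenced by the List rule
    ("HASH", r"#"),
    ("NEWLINE", r"\r\n|\n"),
    ("WS", r"[ \t]+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping spaces and tabs."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "WS":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        yield kind, match.group()
```

For example, `list(tokenize("age!int 30\n"))` yields IDENTIFIER, BANG, IDENTIFIER, INTEGER and NEWLINE tokens, in that order.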

Syntax Rules (Parser)

Document       ::= Statement*
Statement      ::= Key TypeAnnotation? Value? Newline
                 | Comment
Key            ::= IDENTIFIER
TypeAnnotation ::= BANG TypeName
Value          ::= Scalar | Block | List | Multiline | Table
Block          ::= LBRACE Statement* RBRACE
List           ::= LBRACKET (Value (COMMA Value)*)? RBRACKET
Multiline      ::= BACKTICKS Content BACKTICKS
Comment        ::= HASH RestOfLine
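
The rules above map fairly directly onto a recursive-descent parser. The following Python sketch covers only a subset (scalar values, blocks, and full-line comments) and makes several assumptions that the grammar does not state: a block opens with `{` at the end of the key line and closes with `}` on its own line, and a scalar is the rest of the line kept as a string. The names `parse_document` and `parse_statements` are hypothetical.

```python
import re

# Key, optional "!TypeName", optional rest-of-line value (a scalar or "{").
STATEMENT_RE = re.compile(
    r"^(?P<key>[A-Za-z_][A-Za-z0-9_-]*)"
    r"(?:!(?P<type>\S+))?"
    r"(?:[ \t]+(?P<value>.*))?$"
)


def parse_statements(lines, pos=0, in_block=False):
    """Parse Statement* starting at lines[pos]; return (nodes, next_pos)."""
    nodes = []
    while pos < len(lines):
        line = lines[pos].strip()
        pos += 1  # pos now equals the 1-based number of the line just read
        if not line or line.startswith("#"):  # blank line or Comment
            continue
        if line == "}":
            if not in_block:
                raise SyntaxError(f"line {pos}: unexpected '}}'")
            return nodes, pos
        match = STATEMENT_RE.match(line)
        if match is None:
            raise SyntaxError(f"line {pos}: expected a key")
        key, type_name, value = match.group("key", "type", "value")
        if value == "{":  # Block ::= LBRACE Statement* RBRACE
            value, pos = parse_statements(lines, pos, in_block=True)
        nodes.append((key, type_name, value))
    if in_block:
        raise SyntaxError("unexpected end of input: missing '}'")
    return nodes, pos


def parse_document(text):
    """Document ::= Statement*"""
    nodes, _ = parse_statements(text.splitlines())
    return nodes
```

Each statement becomes a `(key, type, value)` tuple, and a block's value is the nested list of its statements. Lists, multiline strings and tables are deliberately left out of this sketch.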

Key Grammar Characteristics

Line-Oriented

Statements are separated by newlines, not semicolons:

name Alice          # Statement 1
age!int 30          # Statement 2
active!bool true    # Statement 3

Context-Sensitive

Some constructs require state tracking:

Whitespace Rules

No Reserved Keywords

All identifiers are valid keys:

# These are all valid
if true
for loop
class Person
return value

Testing Grammars

Each grammar format has specific testing commands:

# Bison/Yacc - Check for conflicts
bison --warnings=all up.y

# ANTLR4 - Validate grammar
antlr4 -Werror up.g4

# PEG.js - Generate a parser with tracing enabled
pegjs --trace up.peg

# Tree-sitter - Run tests
tree-sitter test
tree-sitter parse ../examples/*.up

Choosing a Grammar Format

Format       Best For                   Pros                         Cons
Bison+Flex   C/C++ integration          Mature, fast, standard       C-specific, setup complexity
ANTLR4       Multi-language support     10+ targets, great tooling   Larger runtime, Java-based
PEG          JavaScript/simple parsers  Unambiguous, easy to read    No left recursion, backtracking
Tree-sitter  Editor integration         Incremental, error recovery  Specific use case

Implementation Guidelines

When implementing a parser using these grammars:

1. Tokenization

Handle these special cases:

2. Parsing

Key implementation points:

3. AST Structure

Recommended AST nodes:

Document
  ├─ Node
  │   ├─ key: string
  │   ├─ type: string?
  │   └─ value: Value
  └─ ...

Value = Scalar | Block | List | Table | Multiline
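
One possible way to express these nodes in code is with Python dataclasses, as in the sketch below. The class and field names are assumptions chosen to mirror the tree above. Scalars are kept as plain strings, and `Table` is an opaque placeholder because this README does not define the table layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Block:
    statements: List[Node] = field(default_factory=list)


@dataclass
class ListValue:
    items: List[Value] = field(default_factory=list)


@dataclass
class Multiline:
    content: str


@dataclass
class Table:
    raw: str  # placeholder: the table structure is not specified here


# Value = Scalar | Block | List | Table | Multiline (Scalar is a str here).
Value = Union[str, Block, ListValue, Table, Multiline]


@dataclass
class Node:
    key: str
    type: Optional[str] = None     # TypeName after "!", if present
    value: Optional[Value] = None  # the grammar makes Value optional


@dataclass
class Document:
    nodes: List[Node] = field(default_factory=list)


# Usage: the AST for "name Alice", "age!int 30" and a one-entry server block.
doc = Document([
    Node("name", value="Alice"),
    Node("age", type="int", value="30"),
    Node("server", value=Block([Node("port", type="int", value="8080")])),
])
```

Keeping `type` as an optional string leaves interpretation of the annotation (for example converting "30" to an integer) to a later pass, which is one design option among several.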

4. Error Handling

Provide helpful error messages:

Parse error at line 15: Expected '}' to close block
  Block opened at line 10: server {
  Got: unexpected EOF
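
A message like the one above needs the parser to remember where each block was opened. The sketch below shows one way to do that with a stack. `UPParseError` and `check_blocks` are hypothetical names, the check assumes that a block opens with `{` at the end of a line and closes with `}` on its own line, and it reports the last line of the input as the position of an unexpected EOF.

```python
class UPParseError(Exception):
    """Hypothetical error type carrying a multi-line, readable message."""


def check_blocks(text):
    """Raise UPParseError for an unclosed '{' block or an unmatched '}'."""
    open_blocks = []  # stack of (line_number, source_line) per unclosed '{'
    lines = text.splitlines()
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith("#"):  # full-line comment: ignore braces in it
            continue
        if line.endswith("{"):
            open_blocks.append((number, line))
        elif line == "}":
            if not open_blocks:
                raise UPParseError(
                    f"Parse error at line {number}: Unexpected '}}'\n"
                    "  No block is open at this point"
                )
            open_blocks.pop()
    if open_blocks:
        # Report the innermost block that is still open at end of input.
        opened_at, source = open_blocks[-1]
        raise UPParseError(
            f"Parse error at line {len(lines)}: Expected '}}' to close block\n"
            f"  Block opened at line {opened_at}: {source}\n"
            "  Got: unexpected EOF"
        )
```

This sketch does not look inside multiline strings, so a real implementation would need to skip their content before counting braces.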

Examples

Test all grammars against example files:

../examples/01-basic-scalars.up     # Simple key-value pairs
../examples/02-blocks.up            # Nested blocks
../examples/03-lists.up             # List structures
../examples/04-multiline.up         # Multiline strings
../examples/07-tables.up            # Table format
../examples/08-mixed-complex.up     # Complex nested structures

Contributing

When modifying grammars:

  1. Update all formats - Keep grammars in sync across formats
  2. Test thoroughly - Use example files to validate changes
  3. Document changes - Update this README with new syntax
  4. Verify implementations - Ensure parser implementations in ../go/, ../java/, etc. stay compatible

License

All grammar definitions are licensed under the GNU GPLv3; see ../LICENSE for details.