- uniformity
- as good as Lisp
- generality
- as good as Lisp
- extensibility
- decentralization
- only decentralized binary format? on par with plain text formats using decentralized IDs (like URI based on decentralized DNS)
- compactness
- better than all general binary formats because they lack abstraction, worse than any ad hoc format (like ASN1)
- processing speed
- extremely simple to parse
- easy to evaluate
- possible to forbid evaluation for extremely restricted environments (embedded)
marker | shape | notes |
---|---|---|
00 | nil | |
01 | ( | |
02 | ) | |
03 | # Int {content} | |
04–0F | reserved | |
10–17 | BULK | |
18–7F | refs | |
80–BF | w6[value] | value = (marker && 3F) |
C0–FF | #[size] {content} | size = (marker && 3F) |
NSID: 0x20
- 0x00
- ( version Int Int )
- 0x01
- true
- 0x02
- false
- 0x03
- ( stringenc {enc}:Encoding )
- 0x04
- ( iana-charset Nat ):Encoding
- 0x05
- ( codepage Nat ):Encoding
- 0x06
- ( ns {marker}:Ref {id}:UniqueID )
- 0x07
- ( package {id}:UniqueID {namespaces} )
- 0x08
- ( import {base}:Int {count}:Int {uuid} )
- 0x09
- ( define Ref Expr )
- 0x0A
- ( mnemonic/def {name}:Expr {mnemonic}:String {doc}:Expr {value} )
- 0x0B
- ( ns-mnemonic {ns}:Expr {mnemonic}:String {doc} )
- 0x0C
- ( verifiable-ns {marker}:Int {id}:UniqueID
{mnemonic}:Expr {doc}:Expr {definitions} )
- ( mnemonic/def nil {…} ) defines the next unattributed ref (starting at 0)
- 0x10
- ( concat {a1} {a2} )
- 0x11
- ( subst {code} )
- 0x12
- ( arg Int )
- 0x13
- ( rest Int )
- 0x20
- ( unsigned-int Bytes )
- 0x21
- ( signed-int Bytes )
- 0x22
- ( fraction Int Int )
- 0x23
- ( binary-float Bytes )
- 0x24
- ( decimal-float Bytes )
- 0x25
- ( binary-fixed Nat Bytes )
- 0x26
- ( decimal-fixed Nat Bytes )
- 0x27
- decimal2
- 0x30
- ( prefix-bytecode {bytecode} )
- 0x31
- ( prefix-bytecode* ( {arities} ) {bytecode} )
- {arities} is a sequence of: ( Int {refs} )
- 0x32
- ( postfix-bytecode {bytecode} )
- 0x33
- ( postfix-bytecode* ( {arities} ) {bytecode} )
- 0x34
- ( arity Int {refs} )
- needed at least for hashing: access the bytes of an expression (API to get positions to read the bytes independently or API to get the bytes directly)
When needed, metadata can be any expression (nil
SHOULD be used to indicate no metadata).
When reading expressions as entries, ignore nil
and process description
forms. Entries and
descriptions both are numbered (not nil
).
- 0x00
- ( bark {content} )
- can also be used as a reference after imports, to signal that the stream is a BARK stream
- 0x01
- ( entry {gbc-tag}:Expr {metadata}:Expr {content}:Expr )
- {content} can be an array (e.g. a file’s content) or BULK expression
- 0x03
- ( description {metadata}:Expr )
- can be inserted in many places in a BULK stream to annotate virtually anything
- 0x04
- ( metadata {data} )
- 0x05
- ( count {num} )
- 0x06
- ( about {what} )
- {what} is a sequence of expressions, each identifying the entry
- 0x25
- ( entry {num} )
- 0x26
- ( previous {skip} )
- within a metadata form, designates the expression before that metadata form (possibly after skipping {skip} expressions)
- 0x27
- ( next {skip} )
- within a metadata form, designates the expression after that metadata form (possibly after skipping {skip} expressions)
- 0x28
- everything-before
- within a metadata form, designates the whole sequence of expressions before that metadata form
- 0x29
- ( before {marker}:Ref {skip} )
- within a metadata form, designates the expression in the outer
context of the metadata form that is before the occurrence of
{marker} (possibly after skipping {skip} expressions)
- undefined if multiple occurrences
- within a metadata form, designates the expression in the outer
context of the metadata form that is before the occurrence of
{marker} (possibly after skipping {skip} expressions)
- 0x2A
- ( after {marker}:Ref {skip} )
- like before, but after…
- 0x10
- gbc|
- opaque GBC form
- 0x11
- gbc>
- GBC form must not be preserved if payload is modified
- 0x12
- gbc*>
- preservable GBC form
- 0x13
- gbc*~>
- preservable GBC form whose payload was modified
- 0x03
- ( bulk-stream {stream} )
- 0x04
- ( bulk-bytes {bulk}:Bytes )
-
- 0x30
- ( compressed gbc| {method}:MeTOD Bytes )
- 0x31
- deflate
- 0x32
- deflate64
- 0x33
- lzma
- 0x34
- lzma2
- Ox35
- bz2
- 0x36
- lzw
- 0x37
- lzo
- 0x3D
- ( hash {signature}:Expr )
- 0x3E
- ( hashed gbc> {signature}:Expr Expr )
- 0x3F
- ( encrypted gbc| {method} Bytes )
- tar semantics
- metadata
- Ox40
- ( path {components} )
- metadata
- by design, there is no way to express an absolute FS path
- an application is free to define insecure forms to express absolute paths and links
- TODO: what if a component contain “/”?
- implementation should not resolve the name but look it up
in the directory entries (that takes care of “/” but not
of a “..” entry, this still needs checking, shame on Unix)
- 0x41
- ( user {name} )
- implementation should not resolve the name but look it up
in the directory entries (that takes care of “/” but not
of a “..” entry, this still needs checking, shame on Unix)
- {name} can be anything, incl. string and Nat
- multiple entries (e.g. “pierre”/1000)
- 0x42
- ( group {name} )
- 0x43
- contiguous
- 0x44
- ( access {time} )
- 0x45
- ( modification {time} )
- 0x46
- ( change {time} )
- 0x47
- ( mode {mode} )
- 0x48
- ( posix-acl {acls} )
- multiple entries (e.g. “pierre”/1000)
- 0x49
- ( user {id} {mode} {default?} )
- 0x4A
- ( group {id} {mode} {defaults?} )
- 0x4B
- ( other {mode} {defaults?} )
- 0x4C
- ( mask {mode} {defaults?} )
- 0x4D
- ( xattr {xattr} )
- {xattrs} = ( {name} {value} )+
- Ox4E
- ( offsets Nat+ )
- meant for forms not containing individual entries’ metadata
- TODO: base?
- 0x4F
- ( offset Nat )
- meant for forms grouping an entry with its metadata
- TODO: base?
- entry
- an array as an entry (possibly within GBC forms) is presumed to be a regular file
- 0x50
- ( hard-link Path )
- 0x51
- ( sym-link Path )
- 0x52
- ( char-dev {major}:Int {minor}:Int )
- 0x53
- ( block-dev {major}:Int {minor}:Int )
- 0x54
- directory
- 0x55
- fifo
- 0x56
- ( sparse-file {segments} )
- entry
- Bytes
- 0x57
- ( hole {size}:Nat )
- {size} in bytes
- gzip semantics
- 0x60
- ( binary Boolean )
- 0x61
- ( comment Expr )
- 0x62
- ( os Expr )
- vocabularies may provide additional expressions for OSes
- 0x70
- FAT file system (DOS, OS/2, NT) + PKZIPW 2.50 VFAT, NTFS
- 0x71
- Amiga
- 0x72
- VMS (VAX or Alpha AXP)
- 0x73
- Unix
- 0x74
- VM/CMS
- 0x75
- Atari
- 0x76
- HPFS file system (OS/2, NT 3.x)
- 0x77
- Macintosh
- 0x78
- Z-System
- 0x79
- CP/M
- 0x7A
- TOPS-20
- 0x7B
- NTFS file system (NT)
- 0x7C
- SMS/QDOS
- 0x7D
- Acorn RISC OS
- 0x7E
- VFAT file system (Win95, NT)
- 0x7F
- MVS (code also taken for PRIMOS)
- 0x80
- BeOS (BeBox or PowerMac)
- 0x81
- Tandem/NSK
- 0x82
- THEOS
- 0x63
- maximum-compression
- 0x64
- fastest-comœpression
- 0x83
- ( acorn-bbc-mos-file-type-info Bytes )
- 0x84
- ( apollo-file-type-info Bytes )
- 0x85
- ( cpio-compressed Bytes )
- gzsig extra field should be created from a compatible cryptographic signature
- 0x86
- ( keynote-assertion Bytes )
- 0x88
- ( macintosh-info Bytes )
- 0x89
- ( acorn-file-type-info Bytes )
- dar semantics
- split archives
- advertised in container metadata
- split archives
- 0x90
- ( split-archive {archive-id} {member-id} {members} )
- members are description forms that MAY contain filename
or hash
- FS-specific attributes
- incremental backup?
- fast member extract? (how does DAR does that?)
- members are description forms that MAY contain filename
or hash
One could define a whole namespace of compact versions, like
about-num ⇔ ( lambda n ( about ( entry n ) ) ) about-previous ⇔ ( about ( previous ) ) about-previous* ⇔ ( lambda n ( about ( previous n ) ) ) about-num[3] ⇔ ( about ( entry 3 ) ) about-previous[2] ⇔ ( about ( previous 2 ) )
A foobar.buz
archive with multiple files:
( pack ( metadata ( count 2 ) )
( described gbc*> ( metadata ( path "foo.txt" ) )
( compressed gbc| lzma {foo.txt}:Bytes ) )
( described ( metadata ( path "bar.jpg" ) )
{bar.jpg}:Bytes ) )
A foo.txt.buz
archive with a single file and hash for integrity:
( described ( metadata ( path "foo.txt" ) )
( hashed gbc> ( sha3 {hash}:Bytes )
( compressed gbc| lzma {foo.txt} ) ) )
A manifest.buz
manifest about other files:
( description ( metadata ( ( about ( path "foo.txt" ) )
( hash ( md5 {hash}:Bytes ) ) ) ) )
( description ( metadata ( ( about ( path "bar.txt" ) )
( hash ( md5 {hash}:Bytes ) ) ) ) )
( description ( metadata ( ( about ( path "baz.iso" ) )
( hash ( sha3 {hash}:Bytes ) ) ) ) )
$ bark list <file> $ bark check <file> $ bark extract <file> [<members>]
$ barf convert --to gzip <file> $ barf convert --from dar <file>
This command convert from and to BULK. Converting to and then from BULK should produce a file at least semantically identical, (it may be bytewise identical, and it might be an implementation goal to achieve that, but no metadata is stored to that end by default).
- mode of operation
- lossless
- refuse conversion if semantic information would be lost (i.e. if a string is not encodable in the target format, but not if random padding is present)
- lossy
- not lossless (i.e. a one-member tar archive converted to BULK might then be converted to gzip, at the price of losing ACLs)
- transform
- change data representation to fit target format (i.e. if target is gzip, LZMA data would be recompressed to deflate, a UTF-8 string encoded in ISO-8859-1)
- maintain
- refuse conversion if data representation in the source format doesn’t fit target format
- should never need to refuse if BULK is target?
- default is lossless transform
Targets:
- manifests
- SFV
- compression formats
- access to metadata
- consolidated metadata when forms overwrite each other?
- API for history?
- access to entries
- across manifests/packs/stacks within a common context
- ability to add entries/metadata while not breaking hashes
- when hash is recomputable:
- access to entries
- app knows algo/has all data to hash (key, etc…)
- modify/delete/append in place
- rehash
- when hash is not recomputable:
- app doesn’t know algo/lacks some data
- modify/delete raise error
- append after original data
- tar
- +compression
- zip
- XZ
- has a limited choice of compression/hash
- gzip
- cpio, pax
BULK cannot contain a form like ( bulk/size {size}:Nat {bulk} )
because size could be
erroneous and then parsing the whole stream or skipping {bulk}
would give two different
results (or more if {bulk}
contains other such erroneous forms.
This could represent a security risk, with some parsers not seeing an issue and others triggering it.
( verifiable-ns 24 {id} nil "λ"
"This vocabulary can be used to represent functions that can be evaluated."
( mnemonic/def nil "lambda" "( lambda {var}:Ref {body} )" )
( define 0x18FF "This reference is intended to be used as lambda function variable." )
( mnemonic/def nil "a" 0x18FF )
( mnemonic/def nil "b" 0x18FF )
( mnemonic/def nil "c" 0x18FF )
( mnemonic/def nil "d" 0x18FF )
( mnemonic/def nil "e" 0x18FF )
( mnemonic/def nil "f" 0x18FF )
( mnemonic/def nil "g" 0x18FF )
( mnemonic/def nil "h" 0x18FF )
( mnemonic/def nil "i" 0x18FF )
( mnemonic/def nil "j" 0x18FF )
( mnemonic/def nil "k" 0x18FF )
( mnemonic/def nil "l" 0x18FF )
( mnemonic/def nil "m" 0x18FF )
( mnemonic/def nil "n" 0x18FF )
( mnemonic/def nil "o" 0x18FF )
( mnemonic/def nil "p" 0x18FF )
( mnemonic/def nil "q" 0x18FF )
( mnemonic/def nil "r" 0x18FF )
( mnemonic/def nil "s" 0x18FF )
( mnemonic/def nil "t" 0x18FF )
( mnemonic/def nil "u" 0x18FF )
( mnemonic/def nil "v" 0x18FF )
( mnemonic/def nil "w" 0x18FF )
( mnemonic/def nil "x" 0x18FF )
( mnemonic/def nil "y" 0x18FF )
( mnemonic/def nil "z" 0x18FF )
( mnemonic/def nil "id" "Somestimes a form is needed just to add a semantic aspect to an expression without actually changing its value for most purposes. For these cases, a reference can be given the value of id. Some processing applications will substitute their own evaluation to this one to implement that semantic." ( lambda x x ) )
)
TODO: update to RDF 1.1 or 1.2
- 0x01
- uriref ⇔ λ:id
- 0x02
- ( base Bytes )
- 0x03
- prefix ⇔ ( uri( lambda u ( lambda s ( concat u s ) ) )
- 0x04
- rdf# ⇔ ( uriref ”http://www.w3.org/1999/02/22-rdf-syntax-ns#” )
- 0x05
- blank
- 0x06
- ( plain {lang} {literal} )
- 0x07
- ( datatype {id}:URIRef {literal} )
- 0x08
- xmlliteral ⇔ ( rdf# “XMLLiteral” )
- 0x09
- ( triples {triples} )
- 0x0A
- ( turtle {statements} )
- 0x0B
- type ⇔ ( rdf# “type” )
- 0x0C
- property ⇔ ( rdf# “Property” )
- 0x0D
- statement ⇔ ( rdf# “Statement” )
- 0x0E
- subject ⇔ ( rdf# “subject” )
- 0x0F
- predicate ⇔ ( rdf# “predicate” )
- 0x10
- object ⇔ ( rdf# “object” )
- 0x11
- bag ⇔ ( rdf# “Bag” )
- 0x12
- seq ⇔ ( rdf# “Seq” )
- 0x13
- alt ⇔ ( rdf# “Alt” )
- 0x14
- value ⇔ ( rdf# “value” )
- 0x15
- list ⇔ ( rdf# “List” )
- 0x16
- nil ⇔ ( rdf# “nil” )
- 0x17
- first ⇔ ( rdf# “first” )
- 0x18
- rest ⇔ ( rdf# “rest” )
- 0x19
- plainliteral ⇔ ( rdf# “PlainLiteral” )
-
- 0x20
- this-resource
- 0x21
- uri
In 3s, a single triple cannot cost less than 8 bytes:
(:A:B:C)
For big graphs of mostly known references, this can already be a valuable improvement. {triples} could be a packed sequence without markers around triples, but that would mean that a single missing or superfluous expression would wreck everything that’s after it. The fact that a triple is still a form limits the savings but keeps a level of robustness (but it would be possible to define a packing RDF form…).
Adding another triple cannot cost less than adding 8 bytes:
(:A:B:C)(:A:B:D)
In Tl, a standalone triple cannot cost less than 10 bytes:
(:A(:B:C))
But adding another triple can cost as few as 2 bytes:
(:A(:B:C:D))
XML is pretty complex, but most of it is unused (some even advised not to be used, i.e. unparsed entity). The vocabulary can be split into loosely coupled parts:
- document
- DTD
- schema
- Relax NG
XML content, not notation: no support for entities or CDATA. stringenc
can be used
everywhere.
( define ?rfc ( subst ( pi "rfc" ( rest 0 ) ) ) )
( verifiable-ns 0x1800 {id} nil "xml"
"This vocabulary can be used to represent XML data."
( mnemonic/def nil "xml1.0" "( xml1.0 {content} )"
( mnemonic/def nil "xml1.1" "( xml1.1 {content} )"
( mnemonic/def nil "pi" "( pi {target} {content} )"
( mnemonic/def nil "comment" "( comment {content} )"
( mnemonic/def nil "element" "( element {name} {content} )"
( mnemonic/def nil "attribute" "( attribute {name} {value} )"
( mnemonic/def nil "xml:" ( rdf:prefix "http://www.w3.org/XML/1998/namespace" )
( mnemonic/def nil "xmlns:" "xmlns:" ( rdf:prefix "http://www.w3.org/2000/xmlns/" )
( mnemonic/def nil "preserve" "preserve" ( attribute ( xml: "space" ) "preserve" ) )
RDF + Simple XML ( + XPath )
( verifiable-ns 0x1800 {id} nil "xpath"
"This vocabulary can be used to represent XPath expressions."
( mnemonic/def nil "xpath" "( xpath {steps} )" )
( mnemonic/def nil "union" "( union {exprs} )" )
( mnemonic/def nil "step" "( step {axis} {test} {preds} )" )
( mnemonic/def nil "ancestor" nil )
( mnemonic/def nil "ancestor-or-self" nil )
( mnemonic/def nil "attribute" nil )
( mnemonic/def nil "child" nil )
( mnemonic/def nil "descendant" nil )
( mnemonic/def nil "descendant-or-self" nil )
( mnemonic/def nil "following" nil )
( mnemonic/def nil "following-sibling" nil )
( mnemonic/def nil "namespace" nil )
( mnemonic/def nil "parent" nil )
( mnemonic/def nil "preceding" nil )
( mnemonic/def nil "preceding-sibling" nil )
( mnemonic/def nil "self" nil )
( mnemonic/def nil "node()" nil )
( mnemonic/def nil "text()" nil )
( mnemonic/def nil "comment()" nil )
( mnemonic/def nil "pi()" "pi() or ( pi() {name}:String )" )
( mnemonic/def nil "pi()" nil )
( mnemonic/def nil "." "" ( step self node() ) )
( mnemonic/def nil ".." "" ( step parent node() ) )
( mnemonic/def nil "//" "" ( step descendant-or-self node() ) )
( mnemonic/def nil "step*" "" ( λ:lambda λ:a ( step λ:a node() ) ) )
)
As a Step, {name}:QName ⇔ ( step child {name} ) ?
To maximize reuse between namespaces, URIRef and URIString expressions also have the type QName. Any Bytes whose content satisfy the NCName production also has.
- type as ref or form
- atomic type
- html5
- jpeg
- composite type
- syntax: ( main-type {params} )
- example: xml
- ( xml xhtml rdf )
- meaning xml with xhtml document root and rdf elements
- xhtml* = ( subst ( xml xhtml ( rest 0 ) ) )
- ( xhtml* mathml svg )
- some MeTOD types may only make sense as sub-types
- e.g. xml NS that doesn’t have a document element
- like dublin-core: ( xml xhtml svg dublin-core )
- e.g. xml NS that doesn’t have a document element
- encoding as first-class type
- ( gzip tar )
- ( base64 zip )
- complex structures?
- ( mime ( alternatives ( qp ( text utf-8 ) ) ( qp html5 ) ) ( base64 zip ) ( signature ( base64 openpgp ) ) )
- accept patterns
- ( xml * )
- ( xml xhtml * )
- semantics dictated by type
- for xml, the first subtype MUST be the type for the document element
- for MIME, order of subtypes is order of parts
- some MeTOD types may only make sense as sub-types
- 0x00
- ( type {type}:Expr )
- metadata form
- 0x02
- *
- 0x03
- bulk / ( bulk {namespaces} )
- 0x10
- ( text {encoding} {subtype} )
- means that it can be shown as-is to a user
- ( text {encoding} ) means plain text
- ( text utf-8 markdown )
- ( text utf-8 ( source-code c++ ) )
- 0x11
- ( source {langs} )
- only as subtype of
text
, means that if available, a code editor should be used to view this format
- only as subtype of
- 0x12
- ( image {subtype} )
- 0x13
- ( audio {subtype} )
- 0x14
- ( video {subtype} )
MeTOD only defines kinds where a default software could be expected to process many or most types of this kind. This is not the case for MIME registries application, text, message, model, multipart and text. But a MIME vocabulary could define them.
- Nat123 := Nat | Nat Nat | Nat Nat Nat
- NatsF := Nat* ( Float | Nat )
- Time = Date | TimeOfDay
- 0x00
- ( calendar Nat123 )
- 0x01
- ( weekdate Nat123 )
- 0x02
- ( ordinal Nat Nat )
- 0x03
- ( time NatsF )
- 0x04
- ( point Date TimeOfDay )
- 0x05
- ( zulu Time )
- 0x06
- ( offset TimeOfDay Time )
- 0x07
- ( years NatsF )
- 0x08
- ( months NatsF )
- 0x09
- ( days NatsF )
- 0x0A
- ( hours NatsF )
- 0x0B
- ( minutes NatsF )
- 0x0C
- ( seconds NatsF )
- 0x0D
- ( weeks Nat )
- 0x0E
- ( interval {exprs} )
- {exprs} = Time Time | Duration Time | Time Duration | Duration
- 0x0F
- ( repeat Nat Interval ) / ( repeat Interval )
- ???
- ( julian Number )
- ???
- ( anno-mundi Nat123 )
- ???
- ( anno-hegirae Nat123 )
- ???
- ( unix SInt )
- ???
- ( ntp Word )
- ???
- ( tai64 Word64 )
- ???
- ( tng-stardate Nat Nat )
( verifiable-ns 0x1800 {id} nil "hash"
"The forms in this vocabulary can be used to represent hashes along with the hashing algorithm instead of using an unmarked byte sequence. When an algorithm has other inputs than the message, they can be provided after the hash itself as a property list.
When an algorithm can produce hashes in different sizes and the size used is a number of bits divisible by 8, the size property should be omitted from the property list and inferred by the processing application from the size of the BULK expression (e.g. `( sha3 #[24] {hash}:24B )` is a 196-bits SHA3 hash).
As a rule, each of these forms can contain `nil` as a first expression to denote not a hash but a choice of configuration in some application context. For example, `( uuid nil prepend {ns}:Bytes )` could mean that subsequent v3 and v5 UUIDs will be produced with {ns} as UUID namespace."
( mnemonic/def nil "bsd" "( bsd Bytes )" )
( mnemonic/def nil "sysv" "( sysv Bytes )" )
( mnemonic/def nil "crc" "( crc Bytes )" )
( mnemonic/def nil "fletcher" "( fletcher Bytes {config} )" )
( mnemonic/def nil "adler32" "( adler32 Bytes )" ( λ:lambda λ:h ( fletcher λ:h key 65521 ) ) )
( mnemonic/def nil "pjwhash" "( pjwhash Bytes )" )
( mnemonic/def nil "elfhash" "( elfhash Bytes )" )
( mnemonic/def nil "murmur1" "( murmur1 Bytes )" )
( mnemonic/def nil "murmur2" "( murmur2 Bytes )" )
( mnemonic/def nil "murmur2a" "( murmur2a Bytes )" )
( mnemonic/def nil "murmur64a" "( murmur64a Bytes )" )
( mnemonic/def nil "murmur64b" "( murmur64b Bytes )" )
( mnemonic/def nil "murmur3" "( murmur3 Bytes )" )
( mnemonic/def nil "umac" "( umac Bytes {config} )" )
( mnemonic/def nil "vmac" "( vmac Bytes {config} )" )
( mnemonic/def nil "uuid" "( uuid Bytes {config} )" )
( mnemonic/def nil "md2" "( md2 Bytes )" )
( mnemonic/def nil "md4" "( md4 Bytes )" )
( mnemonic/def nil "md5" "( md5 Bytes )" )
( mnemonic/def nil "md6" "( md6 Bytes {config} )" )
( mnemonic/def nil "ripemd" "( ripemd Bytes )" )
( mnemonic/def nil "haval" "( haval Bytes )" )
( mnemonic/def nil "gost" "( gost Bytes )" )
( mnemonic/def nil "sha1" "( sha1 Bytes )" )
( mnemonic/def nil "sha2" "( sha2 Bytes )" )
( mnemonic/def nil "sha3" "( sha3 Bytes )" )
( mnemonic/def nil "tiger" "( tiger Bytes )" )
( mnemonic/def nil "tiger2" "( tiger2 Bytes )" )
( mnemonic/def nil "whirlpool" "( whirlpool Bytes )" )
( mnemonic/def nil "blake" "( blake Bytes )" )
( mnemonic/def nil "blake2" "( blake2 Bytes )" )
( mnemonic/def nil "size" )
( mnemonic/def nil "prepend" )
( mnemonic/def nil "append" )
( mnemonic/def nil "key" )
( mnemonic/def nil "salt" )
( mnemonic/def nil "rounds" )
)
- blowfish?
- camellia?
- twofish?
- AES?
- serpent?
- openpgp?
When implementing a processing application that gives semantics beyond the evaluation of expressions, to benefit from all possible evaluations, the application should just replace relevant prior definitions with its own implementation while evaluating the BULK streams.
- the problem is that this vocabulary provides hashes before any way of expressing a hash is possible, so its own hash is expressed with a name inside the vocabulary
- you read
( ns 0x1800 ( 0x182E #[8] {hashID}:8B ) )
- how do you get to the point where you know
0x182E
ishash:sha3
?- you get the list (hopefully with only one element) of
vocabularies identified by a form whose sole element is a
64-bits word
{hashid}
- for each of them, you check if
0x2E
is a name associated with a hashing algorithm- if yes, you check if that hash matches the definition
- you get the list (hopefully with only one element) of
vocabularies identified by a form whose sole element is a
64-bits word
( version 1 0 ) ( ns 24 ( 24:sha3 #[8] 8B ) ) ( ns 25 ( 24:sha3 #[8] 8B ) ) 25:type #[0–63] {content} |<---- 6 ---->| |<----------- 18 ---------->| |<----------- 18 ---------->| |<---- 3 ---->|
( version 1 0 ) ( ns 24 ( 24:sha3 #[8] 8B ) ) ( ns 25 ( 24:sha3 #[8] 8B ) ) 25:type # {size} {content} |<---- 6 ---->| |<----------- 18 ---------->| |<----------- 18 ---------->| |<- 3 ->| 2/3/5/9
When a profile is known (like a specific file extension for typed blobs):
( version 1 0 ) 24:type #… |<---- 8 ---->| |<- 3 ->|
content | BULK overhead | with profile |
---|---|---|
63B | 45B | 11B |
255B | 47B | 13B |
65kB | 48B | 14B |
4GB | 50B | 16B |
18EB | 54B | 20B |
When a protocol makes it possible to express several different data that are related but different in structure, most other formats can only express those relationships in human readable documentation.
Example: an event format that includes information about the person doing the action and the person logging it:
{ "eventType": "WallPainted",
"eventDate": "20210723T235601Z",
"wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
"logged": { "by": "alice", "date": "20210723T090145Z" },
"painted": { "by": "bob", "date": "20210722T193545Z" }
}
When the date for the action or logging is the same as the event, it might be omitted:
{ "eventType": "WallPainted",
"eventDate": "20210723T235601Z",
"wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
"logged": { "by": "alice" },
"painted": { "by": "bob" }
}
And the fomat might include a shortcut for those cases:
{ "eventType": "WallPainted",
"eventDate": "20210723T235601Z",
"wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
"loggedBy": "alice",
"paintedBy": "bob"
}
With almost all existing formats, there is not way to convey the relationship between loggedBy
and logged
.
But in BULK, loggedBy
can be explicitly defined in terms of logged
:
( verifiable-ns 0x1800 {id} nil "wall"
"This vocabulary can be used to represent wall painting events."
( mnemonic/def nil "event" "( event {properties} )"
( mnemonic/def nil "type" "( type Expr )"
( mnemonic/def nil "date" "( date Expr )"
( mnemonic/def nil "wallId" "( comment {content} )"
( mnemonic/def nil "logged" "( logged {properties} )"
( mnemonic/def nil "painted" "( painted {properties} )"
( mnemonic/def nil "by" "( by {person} )"
( mnemonic/def nil "wallPaintedEvent" "( wallPaintedEvent {properties} )" ( subst ( event ( type "WallPainted" ) ( rest 0 ) ) )
( mnemonic/def nil "loggedBy" "( loggedBy {person} )" ( subst ( logged ( by ( arg 0 ) ) ) )
( mnemonic/def nil "paintedBy" "( paintedBy {person} )" ( subst ( painted ( by ( arg 0 ) ) ) )
Which means that the application processing this format doesn’t even need to know about
loggedBy
and paintedBy
because BULK evaluation will transform them away:
( wallPaintedEvent
( date "20210723T235601Z" )
( loggedBy "alice" )
( paintedBy "bob" ) )
will get evaluated into:
( event
( type "WallPainted" )
( date "20210723T235601Z" )
( logged
( by "alice" ) )
( painted
( by "bob" ) ) )
BULK makes it possible to create new versions of vocabularies that encompass previous versions, in a way that minimizes implementation complexity. Whenever a new version of a vocabulary is used, an application can use an alternative definition for the previous version that maps it to the new version.
Example: let’s define an extremely limited BARK with just manifests and SHA-3:
( verifiable-ns 0x1800 {id} nil "bark alpha"
"This vocabulary can be used to represent manifests."
( mnemonic/def nil "description" "( event {properties} )"