BULK

Trade-off

uniformity

as good as Lisp

generality

as good as Lisp

extensibility

decentralization

only decentralized binary format? on par with plain text formats using decentralized IDs (like URI based on decentralized DNS)

compactness

better than all general binary formats because they lack abstraction, worse than any ad hoc format (like ASN1)

processing speed

extremely simple to parse

easy to evaluate
possible to forbid evaluation for extremely restricted environments (embedded)

Syntax

marker	shape	notes
00	nil
01	(
02	)
03	# Int {content}
04–0F	reserved
10–17	BULK
18–7F	refs
80–BF	w6[value]	value = (marker && 3F)
C0–FF	#[size] {content}	size = (marker && 3F)

BULK core namespace

NSID: 0x20

0x00

( version Int Int )

0x01

true

0x02

false

0x03

( stringenc {enc}:Encoding )

0x04

( iana-charset Nat ):Encoding

0x05

( codepage Nat ):Encoding

0x06

( ns {marker}:Ref {id}:UniqueID )

0x07

( package {id}:UniqueID {namespaces} )

0x08

( import {base}:Int {count}:Int {uuid} )

0x09

( define Ref Expr )

0x0A

( mnemonic/def {name}:Expr {mnemonic}:String {doc}:Expr {value} )

0x0B

( ns-mnemonic {ns}:Expr {mnemonic}:String {doc} )

0x0C

( verifiable-ns {marker}:Int {id}:UniqueID {mnemonic}:Expr {doc}:Expr {definitions} )

( mnemonic/def nil {…} ) defines the next unattributed ref (starting at 0)

0x10

( concat {a1} {a2} )

0x11

( subst {code} )

0x12

( arg Int )

0x13

( rest Int )

0x20: ( unsigned-int Bytes )
0x21: ( signed-int Bytes )
0x22: ( fraction Int Int )
0x23: ( binary-float Bytes )
0x24: ( decimal-float Bytes )
0x25: ( binary-fixed Nat Bytes )
0x26: ( decimal-fixed Nat Bytes )
0x27: decimal2

0x30

( prefix-bytecode {bytecode} )

0x31

( prefix-bytecode* ( {arities} ) {bytecode} )

{arities} is a sequence of: ( Int {refs} )

0x32

( postfix-bytecode {bytecode} )

0x33

( postfix-bytecode* ( {arities} ) {bytecode} )

0x34

( arity Int {refs} )

BULK API Model

needed at least for hashing: access the bytes of an expression (API to get positions to read the bytes independently or API to get the bytes directly)

BARK: BULK Archive

When needed, metadata can be any expression (nil SHOULD be used to indicate no metadata).

When reading expressions as entries, ignore nil and process description forms. Entries and descriptions both are numbered (not nil).

0x00

( bark {content} )

can also be used as a reference after imports, to signal that the stream is a BARK stream

0x01

( entry {gbc-tag}:Expr {metadata}:Expr {content}:Expr )

{content} can be an array (e.g. a file’s content) or BULK expression

0x03

( description {metadata}:Expr )

can be inserted in many places in a BULK stream to annotate virtually anything

0x04

( metadata {data} )

0x05

( count {num} )

0x06

( about {what} )

{what} is a sequence of expressions, each identifying the entry

0x25

( entry {num} )

0x26

( previous {skip} )

within a metadata form, designates the expression before that metadata form (possibly after skipping {skip} expressions)

0x27

( next {skip} )

within a metadata form, designates the expression after that metadata form (possibly after skipping {skip} expressions)

0x28

everything-before

within a metadata form, designates the whole sequence of expressions before that metadata form

0x29

( before {marker}:Ref {skip} )

within a metadata form, designates the expression in the outer context of the metadata form that is before the occurrence of {marker} (possibly after skipping {skip} expressions)
- undefined if multiple occurrences

0x2A

( after {marker}:Ref {skip} )

like before, but after…

0x10

gbc|

opaque GBC form

0x11

gbc>

GBC form must not be preserved if payload is modified

0x12

gbc*>

preservable GBC form

0x13

gbc*~>

preservable GBC form whose payload was modified

0x03

( bulk-stream {stream} )

0x04

( bulk-bytes {bulk}:Bytes )

-

0x30: ( compressed gbc| {method}:MeTOD Bytes )
0x31: deflate
0x32: deflate64
0x33: lzma
0x34: lzma2
Ox35: bz2
0x36: lzw
0x37: lzo
0x3D: ( hash {signature}:Expr )
0x3E: ( hashed gbc> {signature}:Expr Expr )
0x3F: ( encrypted gbc| {method} Bytes )

tar semantics
- metadata
  
  Ox40
  ( path {components} )

by design, there is no way to express an absolute FS path
- an application is free to define insecure forms to express absolute paths and links
- TODO: what if a component contain “/”?
  - implementation should not resolve the name but look it up in the directory entries (that takes care of “/” but not of a “..” entry, this still needs checking, shame on Unix)
    
    0x41
    ( user {name} )
{name} can be anything, incl. string and Nat
- multiple entries (e.g. “pierre”/1000)
  
  0x42
  ( group {name} )
  
  0x43
  contiguous
  
  0x44
  ( access {time} )
  
  0x45
  ( modification {time} )
  
  0x46
  ( change {time} )
  
  0x47
  ( mode {mode} )
  
  0x48
  ( posix-acl {acls} )

0x49

( user {id} {mode} {default?} )

0x4A

( group {id} {mode} {defaults?} )

0x4B

( other {mode} {defaults?} )

0x4C

( mask {mode} {defaults?} )

0x4D: ( xattr {xattr} )

{xattrs} = ( {name} {value} )+

Ox4E
( offsets Nat+ )
meant for forms not containing individual entries’ metadata
TODO: base?

0x4F
( offset Nat )
meant for forms grouping an entry with its metadata
TODO: base?
- entry
  - an array as an entry (possibly within GBC forms) is presumed to be a regular file
  0x50
  ( hard-link Path )
  
  0x51
  ( sym-link Path )
  
  0x52
  ( char-dev {major}:Int {minor}:Int )
  
  0x53
  ( block-dev {major}:Int {minor}:Int )
  
  0x54
  directory
  
  0x55
  fifo
  
  0x56
  ( sparse-file {segments} )
Bytes

0x57

( hole {size}:Nat )

{size} in bytes

gzip semantics
0x60
( binary Boolean )

0x61
( comment Expr )

0x62
( os Expr )
- vocabularies may provide additional expressions for OSes
0x70
FAT file system (DOS, OS/2, NT) + PKZIPW 2.50 VFAT, NTFS

0x71
Amiga

0x72
VMS (VAX or Alpha AXP)

0x73
Unix

0x74
VM/CMS

0x75
Atari

0x76
HPFS file system (OS/2, NT 3.x)

0x77
Macintosh

0x78
Z-System

0x79
CP/M

0x7A
TOPS-20

0x7B
NTFS file system (NT)

0x7C
SMS/QDOS

0x7D
Acorn RISC OS

0x7E
VFAT file system (Win95, NT)

0x7F
MVS (code also taken for PRIMOS)

0x80
BeOS (BeBox or PowerMac)

0x81
Tandem/NSK

0x82
THEOS
0x63
maximum-compression

0x64
fastest-comœpression

0x83
( acorn-bbc-mos-file-type-info Bytes )

0x84
( apollo-file-type-info Bytes )

0x85
( cpio-compressed Bytes )
- gzsig extra field should be created from a compatible cryptographic signature
0x86
( keynote-assertion Bytes )

0x88
( macintosh-info Bytes )

0x89
( acorn-file-type-info Bytes )
dar semantics
- split archives
  - advertised in container metadata

0x90

( split-archive {archive-id} {member-id} {members} )

members are description forms that MAY contain filename or hash
- FS-specific attributes
- incremental backup?
- fast member extract? (how does DAR does that?)

One could define a whole namespace of compact versions, like

about-num ⇔ ( lambda n ( about ( entry n ) ) )
about-previous ⇔ ( about ( previous ) )
about-previous* ⇔ ( lambda n ( about ( previous n ) ) )
about-num[3] ⇔ ( about ( entry 3 ) )
about-previous[2] ⇔ ( about ( previous 2 ) )

BUlk possibly-Zipped archive (.buz)

A foobar.buz archive with multiple files:

( pack ( metadata ( count 2 ) )
  ( described gbc*> ( metadata ( path "foo.txt" ) )
    ( compressed gbc| lzma {foo.txt}:Bytes ) )
  ( described ( metadata ( path "bar.jpg" ) )
    {bar.jpg}:Bytes ) )

A foo.txt.buz archive with a single file and hash for integrity:

( described ( metadata ( path "foo.txt" ) )
  ( hashed gbc> ( sha3 {hash}:Bytes )
    ( compressed gbc| lzma {foo.txt} ) ) )

A manifest.buz manifest about other files:

( description ( metadata ( ( about ( path "foo.txt" ) )
                           ( hash ( md5 {hash}:Bytes ) ) ) ) )
( description ( metadata ( ( about ( path "bar.txt" ) )
                           ( hash ( md5 {hash}:Bytes ) ) ) ) )
( description ( metadata ( ( about ( path "baz.iso" ) )
                           ( hash ( sha3 {hash}:Bytes ) ) ) ) )

BARK utility

$ bark list <file>
$ bark check <file>
$ bark extract <file> [<members>]

convert

$ barf convert --to gzip <file>
$ barf convert --from dar <file>

This command convert from and to BULK. Converting to and then from BULK should produce a file at least semantically identical, (it may be bytewise identical, and it might be an implementation goal to achieve that, but no metadata is stored to that end by default).

mode of operation

lossless
refuse conversion if semantic information would be lost (i.e. if a string is not encodable in the target format, but not if random padding is present)

lossy
not lossless (i.e. a one-member tar archive converted to BULK might then be converted to gzip, at the price of losing ACLs)

transform
change data representation to fit target format (i.e. if target is gzip, LZMA data would be recompressed to deflate, a UTF-8 string encoded in ISO-8859-1)

maintain
refuse conversion if data representation in the source format doesn’t fit target format

should never need to refuse if BULK is target?
- default is lossless transform
Targets:
- manifests
  - SFV
- compression formats

BARK Object Model

access to metadata
- consolidated metadata when forms overwrite each other?

API for history?
- access to entries
  - across manifests/packs/stacks within a common context
- ability to add entries/metadata while not breaking hashes
  - when hash is recomputable:
app knows algo/has all data to hash (key, etc…)
modify/delete/append in place
rehash
- when hash is not recomputable:
app doesn’t know algo/lacks some data
modify/delete raise error
append after original data

Comparison

tar
- +compression
zip
XZ
- has a limited choice of compression/hash
gzip
cpio, pax

BULK stream with size

BULK cannot contain a form like ( bulk/size {size}:Nat {bulk} ) because size could be erroneous and then parsing the whole stream or skipping {bulk} would give two different results (or more if {bulk} contains other such erroneous forms.

This could represent a security risk, with some parsers not seeing an issue and others triggering it.

Lambda expressions

( verifiable-ns 24 {id} nil "λ"
"This vocabulary can be used to represent functions that can be evaluated."

( mnemonic/def nil "lambda" "( lambda {var}:Ref {body} )" )

( define 0x18FF "This reference is intended to be used as lambda function variable." )
( mnemonic/def nil "a" 0x18FF )
( mnemonic/def nil "b" 0x18FF )
( mnemonic/def nil "c" 0x18FF )
( mnemonic/def nil "d" 0x18FF )
( mnemonic/def nil "e" 0x18FF )
( mnemonic/def nil "f" 0x18FF )
( mnemonic/def nil "g" 0x18FF )
( mnemonic/def nil "h" 0x18FF )
( mnemonic/def nil "i" 0x18FF )
( mnemonic/def nil "j" 0x18FF )
( mnemonic/def nil "k" 0x18FF )
( mnemonic/def nil "l" 0x18FF )
( mnemonic/def nil "m" 0x18FF )
( mnemonic/def nil "n" 0x18FF )
( mnemonic/def nil "o" 0x18FF )
( mnemonic/def nil "p" 0x18FF )
( mnemonic/def nil "q" 0x18FF )
( mnemonic/def nil "r" 0x18FF )
( mnemonic/def nil "s" 0x18FF )
( mnemonic/def nil "t" 0x18FF )
( mnemonic/def nil "u" 0x18FF )
( mnemonic/def nil "v" 0x18FF )
( mnemonic/def nil "w" 0x18FF )
( mnemonic/def nil "x" 0x18FF )
( mnemonic/def nil "y" 0x18FF )
( mnemonic/def nil "z" 0x18FF )

( mnemonic/def nil "id" "Somestimes a form is needed just to add a semantic aspect to an expression without actually changing its value for most purposes. For these cases, a reference can be given the value of id. Some processing applications will substitute their own evaluation to this one to implement that semantic." ( lambda x x ) )
)

Useful vocabularies

RDF 1.0

TODO: update to RDF 1.1 or 1.2

0x01: uriref ⇔ λ:id
0x02: ( base Bytes )
0x03: prefix ⇔ ( uri( lambda u ( lambda s ( concat u s ) ) )
0x04: rdf# ⇔ ( uriref ”http://www.w3.org/1999/02/22-rdf-syntax-ns#” )
0x05: blank
0x06: ( plain {lang} {literal} )
0x07: ( datatype {id}:URIRef {literal} )
0x08: xmlliteral ⇔ ( rdf# “XMLLiteral” )
0x09: ( triples {triples} )
0x0A: ( turtle {statements} )
0x0B: type ⇔ ( rdf# “type” )
0x0C: property ⇔ ( rdf# “Property” )
0x0D: statement ⇔ ( rdf# “Statement” )
0x0E: subject ⇔ ( rdf# “subject” )
0x0F: predicate ⇔ ( rdf# “predicate” )
0x10: object ⇔ ( rdf# “object” )
0x11: bag ⇔ ( rdf# “Bag” )
0x12: seq ⇔ ( rdf# “Seq” )
0x13: alt ⇔ ( rdf# “Alt” )
0x14: value ⇔ ( rdf# “value” )
0x15: list ⇔ ( rdf# “List” )
0x16: nil ⇔ ( rdf# “nil” )
0x17: first ⇔ ( rdf# “first” )
0x18: rest ⇔ ( rdf# “rest” )
0x19: plainliteral ⇔ ( rdf# “PlainLiteral” )

-

0x20: this-resource
0x21: uri

Differences between complete triples (3s) and turtle-like (Tl)

In 3s, a single triple cannot cost less than 8 bytes:

(:A:B:C)

For big graphs of mostly known references, this can already be a valuable improvement. {triples} could be a packed sequence without markers around triples, but that would mean that a single missing or superfluous expression would wreck everything that’s after it. The fact that a triple is still a form limits the savings but keeps a level of robustness (but it would be possible to define a packing RDF form…).

Adding another triple cannot cost less than adding 8 bytes:

(:A:B:C)(:A:B:D)

In Tl, a standalone triple cannot cost less than 10 bytes:

(:A(:B:C))

But adding another triple can cost as few as 2 bytes:

(:A(:B:C:D))

XML

XML is pretty complex, but most of it is unused (some even advised not to be used, i.e. unparsed entity). The vocabulary can be split into loosely coupled parts:

document
DTD
schema
Relax NG

Document

XML content, not notation: no support for entities or CDATA. stringenc can be used everywhere.

( define ?rfc ( subst ( pi "rfc" ( rest 0 ) ) ) )

( verifiable-ns 0x1800 {id} nil "xml"
"This vocabulary can be used to represent XML data."

( mnemonic/def nil "xml1.0" "( xml1.0 {content} )"
( mnemonic/def nil "xml1.1" "( xml1.1 {content} )"
( mnemonic/def nil "pi" "( pi {target} {content} )"
( mnemonic/def nil "comment" "( comment {content} )"
( mnemonic/def nil "element" "( element {name} {content} )"
( mnemonic/def nil "attribute" "( attribute {name} {value} )"
( mnemonic/def nil "xml:" ( rdf:prefix "http://www.w3.org/XML/1998/namespace" )
( mnemonic/def nil "xmlns:" "xmlns:" ( rdf:prefix "http://www.w3.org/2000/xmlns/" )
( mnemonic/def nil "preserve" "preserve" ( attribute ( xml: "space" ) "preserve" ) )

Default package?

RDF + Simple XML ( + XPath )

XPath namespace

( verifiable-ns 0x1800 {id} nil "xpath"
"This vocabulary can be used to represent XPath expressions."

( mnemonic/def nil "xpath" "( xpath {steps} )" )
( mnemonic/def nil "union" "( union {exprs} )" )
( mnemonic/def nil "step" "( step {axis} {test} {preds} )" )
( mnemonic/def nil "ancestor" nil )
( mnemonic/def nil "ancestor-or-self" nil )
( mnemonic/def nil "attribute" nil )
( mnemonic/def nil "child" nil )
( mnemonic/def nil "descendant" nil )
( mnemonic/def nil "descendant-or-self" nil )
( mnemonic/def nil "following" nil )
( mnemonic/def nil "following-sibling" nil )
( mnemonic/def nil "namespace" nil )
( mnemonic/def nil "parent" nil )
( mnemonic/def nil "preceding" nil )
( mnemonic/def nil "preceding-sibling" nil )
( mnemonic/def nil "self" nil )
( mnemonic/def nil "node()" nil )
( mnemonic/def nil "text()" nil )
( mnemonic/def nil "comment()" nil )
( mnemonic/def nil "pi()" "pi() or ( pi() {name}:String )" )
( mnemonic/def nil "pi()" nil )

( mnemonic/def nil "." "" ( step self node() ) )
( mnemonic/def nil ".." "" ( step parent node() ) )
( mnemonic/def nil "//" "" ( step descendant-or-self node() ) )

( mnemonic/def nil "step*" "" ( λ:lambda λ:a ( step λ:a node() ) ) )


)

As a Step, {name}:QName ⇔ ( step child {name} ) ?

QName

To maximize reuse between namespaces, URIRef and URIString expressions also have the type QName. Any Bytes whose content satisfy the NCName production also has.

MeTOD: Media Type Optimal Description

type as ref or form
atomic type
- html5
- jpeg
composite type
- syntax: ( main-type {params} )
- example: xml
  - ( xml xhtml rdf )

meaning xml with xhtml document root and rdf elements
- xhtml* = ( subst ( xml xhtml ( rest 0 ) ) )
( xhtml* mathml svg )
- some MeTOD types may only make sense as sub-types
  - e.g. xml NS that doesn’t have a document element
    - like dublin-core: ( xml xhtml svg dublin-core )
- encoding as first-class type
  - ( gzip tar )
  - ( base64 zip )
- complex structures?
  - ( mime ( alternatives ( qp ( text utf-8 ) ) ( qp html5 ) ) ( base64 zip ) ( signature ( base64 openpgp ) ) )
- accept patterns
  - ( xml * )
  - ( xml xhtml * )
- semantics dictated by type
  - for xml, the first subtype MUST be the type for the document element
  - for MIME, order of subtypes is order of parts

0x00

( type {type}:Expr )

metadata form

0x02

*

0x03

bulk / ( bulk {namespaces} )

0x10

( text {encoding} {subtype} )

means that it can be shown as-is to a user
( text {encoding} ) means plain text
( text utf-8 markdown )
( text utf-8 ( source-code c++ ) )

0x11

( source {langs} )

only as subtype of text, means that if available, a code editor should be used to view this format

0x12

( image {subtype} )

0x13

( audio {subtype} )

0x14

( video {subtype} )

MeTOD only defines kinds where a default software could be expected to process many or most types of this kind. This is not the case for MIME registries application, text, message, model, multipart and text. But a MIME vocabulary could define them.

Dates namespace

Nat123 := Nat | Nat Nat | Nat Nat Nat
NatsF := Nat* ( Float | Nat )
Time = Date | TimeOfDay

0x00

( calendar Nat123 )

0x01

( weekdate Nat123 )

0x02

( ordinal Nat Nat )

0x03

( time NatsF )

0x04

( point Date TimeOfDay )

0x05

( zulu Time )

0x06

( offset TimeOfDay Time )

0x07

( years NatsF )

0x08

( months NatsF )

0x09

( days NatsF )

0x0A

( hours NatsF )

0x0B

( minutes NatsF )

0x0C

( seconds NatsF )

0x0D

( weeks Nat )

0x0E

( interval {exprs} )

{exprs} = Time Time | Duration Time | Time Duration | Duration

0x0F

( repeat Nat Interval ) / ( repeat Interval )

???

( julian Number )

???

( anno-mundi Nat123 )

???

( anno-hegirae Nat123 )

???

( unix SInt )

???

( ntp Word )

???

( tai64 Word64 )

???

( tng-stardate Nat Nat )

Hash

   ( verifiable-ns 0x1800 {id} nil "hash"
   "The forms in this vocabulary can be used to represent hashes along with the hashing algorithm instead of using an unmarked byte sequence. When an algorithm has other inputs than the message, they can be provided after the hash itself as a property list.

When an algorithm can produce hashes in different sizes and the size used is a number of bits divisible by 8, the size property should be omitted from the property list and inferred by the processing application from the size of the BULK expression (e.g. `( sha3 #[24] {hash}:24B )` is a 196-bits SHA3 hash).

As a rule, each of these forms can contain `nil` as a first expression to denote not a hash but a choice of configuration in some application context. For example, `( uuid nil prepend {ns}:Bytes )` could mean that subsequent v3 and v5 UUIDs will be produced with {ns} as UUID namespace."

   ( mnemonic/def nil "bsd" "( bsd Bytes )" )
   ( mnemonic/def nil "sysv" "( sysv Bytes )" )
   ( mnemonic/def nil "crc" "( crc Bytes )" )
   ( mnemonic/def nil "fletcher" "( fletcher Bytes {config} )" )
   ( mnemonic/def nil "adler32" "( adler32 Bytes )" ( λ:lambda λ:h ( fletcher λ:h key 65521 ) ) )
   ( mnemonic/def nil "pjwhash" "( pjwhash Bytes )" )
   ( mnemonic/def nil "elfhash" "( elfhash Bytes )" )

   ( mnemonic/def nil "murmur1" "( murmur1 Bytes )" )
   ( mnemonic/def nil "murmur2" "( murmur2 Bytes )" )
   ( mnemonic/def nil "murmur2a" "( murmur2a Bytes )" )
   ( mnemonic/def nil "murmur64a" "( murmur64a Bytes )" )
   ( mnemonic/def nil "murmur64b" "( murmur64b Bytes )" )
   ( mnemonic/def nil "murmur3" "( murmur3 Bytes )" )

   ( mnemonic/def nil "umac" "( umac Bytes {config} )" )
   ( mnemonic/def nil "vmac" "( vmac Bytes {config} )" )

   ( mnemonic/def nil "uuid" "( uuid Bytes {config} )" )
   ( mnemonic/def nil "md2" "( md2 Bytes )" )
   ( mnemonic/def nil "md4" "( md4 Bytes )" )
   ( mnemonic/def nil "md5" "( md5 Bytes )" )
   ( mnemonic/def nil "md6" "( md6 Bytes {config} )" )
   ( mnemonic/def nil "ripemd" "( ripemd Bytes )" )
   ( mnemonic/def nil "haval" "( haval Bytes )" )
   ( mnemonic/def nil "gost" "( gost Bytes )" )
   ( mnemonic/def nil "sha1" "( sha1 Bytes )" )
   ( mnemonic/def nil "sha2" "( sha2 Bytes )" )
   ( mnemonic/def nil "sha3" "( sha3 Bytes )" )
   ( mnemonic/def nil "tiger" "( tiger Bytes )" )
   ( mnemonic/def nil "tiger2" "( tiger2 Bytes )" )
   ( mnemonic/def nil "whirlpool" "( whirlpool Bytes )" )
   ( mnemonic/def nil "blake" "( blake Bytes )" )
   ( mnemonic/def nil "blake2" "( blake2 Bytes )" )

   ( mnemonic/def nil "size" )
   ( mnemonic/def nil "prepend" )
   ( mnemonic/def nil "append" )
   ( mnemonic/def nil "key" )
   ( mnemonic/def nil "salt" )
   ( mnemonic/def nil "rounds" )

   )

Encryption

blowfish?
camellia?
twofish?
AES?
serpent?
openpgp?

Asking input

test https://github.com/eishay/jvm-serializers?

Implementation notes

Semantics beyond definitions

When implementing a processing application that gives semantics beyond the evaluation of expressions, to benefit from all possible evaluations, the application should just replace relevant prior definitions with its own implementation while evaluating the BULK streams.

Bootstrapping a hashing vocabulary

the problem is that this vocabulary provides hashes before any way of expressing a hash is possible, so its own hash is expressed with a name inside the vocabulary
you read ( ns 0x1800 ( 0x182E #[8] {hashID}:8B ) )
how do you get to the point where you know 0x182E is hash:sha3?
- you get the list (hopefully with only one element) of vocabularies identified by a form whose sole element is a 64-bits word {hashid}
- for each of them, you check if 0x2E is a name associated with a hashing algorithm
  - if yes, you check if that hash matches the definition

Minimal BULK

( version 1 0 ) ( ns 24 ( 24:sha3 #[8] 8B ) ) ( ns 25 ( 24:sha3 #[8] 8B ) ) 25:type #[0–63] {content}
|<---- 6 ---->| |<----------- 18 ---------->| |<----------- 18 ---------->| |<---- 3 ---->|

( version 1 0 ) ( ns 24 ( 24:sha3 #[8] 8B ) ) ( ns 25 ( 24:sha3 #[8] 8B ) ) 25:type # {size} {content}
|<---- 6 ---->| |<----------- 18 ---------->| |<----------- 18 ---------->| |<- 3 ->|  2/3/5/9

When a profile is known (like a specific file extension for typed blobs):

( version 1 0 ) 24:type #…
|<---- 8 ---->| |<- 3 ->|

content	BULK overhead	with profile
63B	45B	11B
255B	47B	13B
65kB	48B	14B
4GB	50B	16B
18EB	54B	20B

The power of abstraction

Explicit relationship between similar data

When a protocol makes it possible to express several different data that are related but different in structure, most other formats can only express those relationships in human readable documentation.

Example: an event format that includes information about the person doing the action and the person logging it:

{ "eventType": "WallPainted",
  "eventDate": "20210723T235601Z",
  "wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
  "logged": { "by": "alice", "date": "20210723T090145Z" },
  "painted": { "by": "bob", "date": "20210722T193545Z" }
}

When the date for the action or logging is the same as the event, it might be omitted:

{ "eventType": "WallPainted",
  "eventDate": "20210723T235601Z",
  "wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
  "logged": { "by": "alice" },
  "painted": { "by": "bob" }
}

And the fomat might include a shortcut for those cases:

{ "eventType": "WallPainted",
  "eventDate": "20210723T235601Z",
  "wallId": "d9e62839-dafe-49e1-b4d2-ce99c035fa9f",
  "loggedBy": "alice",
  "paintedBy": "bob"
}

With almost all existing formats, there is not way to convey the relationship between loggedBy and logged.

But in BULK, loggedBy can be explicitly defined in terms of logged:

( verifiable-ns 0x1800 {id} nil "wall"
"This vocabulary can be used to represent wall painting events."

( mnemonic/def nil "event" "( event {properties} )"
( mnemonic/def nil "type" "( type Expr )"
( mnemonic/def nil "date" "( date Expr )"
( mnemonic/def nil "wallId" "( comment {content} )"
( mnemonic/def nil "logged" "( logged {properties} )"
( mnemonic/def nil "painted" "( painted {properties} )"
( mnemonic/def nil "by" "( by {person} )"
( mnemonic/def nil "wallPaintedEvent" "( wallPaintedEvent {properties} )" ( subst ( event ( type "WallPainted" ) ( rest 0 ) ) )
( mnemonic/def nil "loggedBy" "( loggedBy {person} )" ( subst ( logged ( by ( arg 0 ) ) ) )
( mnemonic/def nil "paintedBy" "( paintedBy {person} )" ( subst ( painted ( by ( arg 0 ) ) ) )

Which means that the application processing this format doesn’t even need to know about loggedBy and paintedBy because BULK evaluation will transform them away:

( wallPaintedEvent
  ( date "20210723T235601Z" )
  ( loggedBy "alice" )
  ( paintedBy "bob" ) )

will get evaluated into:

( event
  ( type "WallPainted" )
  ( date "20210723T235601Z" )
  ( logged
    ( by "alice" ) )
  ( painted
    ( by "bob" ) ) )

Forward compatibility

BULK makes it possible to create new versions of vocabularies that encompass previous versions, in a way that minimizes implementation complexity. Whenever a new version of a vocabulary is used, an application can use an alternative definition for the previous version that maps it to the new version.

Example: let’s define an extremely limited BARK with just manifests and SHA-3:

( verifiable-ns 0x1800 {id} nil "bark alpha"
"This vocabulary can be used to represent manifests."

( mnemonic/def nil "description" "( event {properties} )"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

notes.org

notes.org

BULK

Trade-off

Syntax

BULK core namespace

BULK API Model

BARK: BULK Archive

BUlk possibly-Zipped archive (.buz)

BARK utility

convert

BARK Object Model

Comparison

BULK stream with size

Lambda expressions

Useful vocabularies

RDF 1.0

Differences between complete triples (3s) and turtle-like (Tl)

XML

Document

Default package?

XPath namespace

QName

MeTOD: Media Type Optimal Description

Dates namespace

Hash

Encryption

Asking input

Implementation notes

Semantics beyond definitions

Bootstrapping a hashing vocabulary

Minimal BULK

The power of abstraction

Explicit relationship between similar data

Forward compatibility

Files

notes.org

Latest commit

History

notes.org

File metadata and controls

BULK

Trade-off

Syntax

BULK core namespace

BULK API Model

BARK: BULK Archive

BUlk possibly-Zipped archive (.buz)

BARK utility

convert

BARK Object Model

Comparison

BULK stream with size

Lambda expressions

Useful vocabularies

RDF 1.0

Differences between complete triples (3s) and turtle-like (Tl)

XML

Document

Default package?

XPath namespace

QName

MeTOD: Media Type Optimal Description

Dates namespace

Hash

Encryption

Asking input

Implementation notes

Semantics beyond definitions

Bootstrapping a hashing vocabulary

Minimal BULK

The power of abstraction

Explicit relationship between similar data

Forward compatibility