Atomic philology and parallel philology

some implications of CITE architecture

Neven Jovanović / neven.jovanovic@ffzg.hr
University of Zagreb
Heidelberg, 13 May 2017
This page: croala.ffzg.unizg.hr/atomic-parallel/
OSF repository: osf.io/dbf73/

Edition?

Atomic?

Parallel?

Actionable and interoperable?

Powers of ten!

Edition?

We do not design for books.

We design to make knowledge explicit.

We design to make explicit
knowledge about a text.

We design to make explicit
knowledge about complex cultural phenomena.

We design objects that will be manipulated.

"Manipulated" means:
reordered, relocated, reformatted, repurposed, improved, enhanced, criticized.

To manipulate anything, we should be able to reach it.

The unreachable is unusable.

For a digital object, to reach is to cite:
provide an address...
an address that is actionable.

osf.io/dbf73/

Ergo: to manipulate anything digitally, we should be able to cite it
(in an actionable way).

The uncitable is unusable.

A digital object is reachable (citable)
if it has an identifier.

osf.io/dbf73/

An identifier may be
an address, a number, a code.

Well, then...

How should an identifier for objects of an edition look like?

And what are the objects in an edition, anyway?

Catulli carmina, ed. Doering, 1820

An ordered hierarchy of content objects

Atomic?

The editor decides
(and is responsible for)
what to make citable in the text.

The editor is responsible for assigning (actionable) identifiers to the objects he or she wants to make citable.

For example...

The identifiers (URNs) can be just numbers and letters.

DOI: 10.17605/OSF.IO/DBF73
ISBN: 9780715617106
VIAF: 100218993

Or...

The URNs can have a meaningful structure, reflecting hierarchical levels of our edition.

CITE Architecture

Smith & Blackwell, 2001–

The editor distinguishes
(and makes citable and actionable)
hierarchical levels in an edition.

urn:cts:latinLit:phi0472.phi001.heidelberg-lat0.meter:div1.53.3

Parallel?

Overlapping hierarchies in a text

are a well known challenge.

Solutions:

multiple documents, milestones, joins, stand-off markup?

CITE architecture makes it easy to handle multiple documents
(concurrent versions).

The editor is responsible for
separating concerns:

he or she has to organize various categories of knowledge into separate documents.

Lectiones variae; corrections; metrical, grammatical, lexicographical commentary...

Catullus 53:

Merrill (complete); Merrill (simplified markup); Merrill, c. 53 (tokenized into words); Merrill, c. 53 scanned; Merrill, c. 53 (tokenized into sentence); Merrill, c. 53 (lemmatized); Merrill, c. 53 (analyzed morphologically); Doering, c. 53 (tokenized into words, with commentary); Marulić, c. 53 (tokenized into words, with commentary)

croala.ffzg.unizg.hr/basex/catullus/urnlist

Sources from: PerseusDL/canonical-latinLit (Github)

A new set of criteria on which to judge
a scholarly edition:

how well is its knowledge organized?

Actionable and interoperable?

A URN, published on the internet, becomes (digitally) reachable from the outside world.

Remember: in a digital world, to reach is to (be able to) manipulate.

This is what "actionable" means.

But we have to know what we'll find.

We have to understand a URN
to be able to program for it!

urn:cts:latinLit:phi0472.phi001.heidelberg-lat0.meter:div1.53.3

This is what "interoperable" means.

Powers of ten

Powers of Ten (Eames Office, 1977)

Cosmic View (Kees Boeke, 1957)

What will happen if CITE architecture is widely adopted?

What will happen if scholarly editions become actionable and interoperable from "the top" to "the atomic level"?

What will happen if scholarly editions become actionable and interoperable
in parallel?

My brain can manipulate two or five or ten ideas (in one go).

My computer can manipulate
tens or hundreds or thousands or millions of objects (if they have identifiers).

If we give our knowledge digital form, boundaries and names, computers can reach it and manipulate it.

Our knowledge becomes (digitally) actionable and interoperable.

My computer can manipulate
(combine, select, reorganize, repurpose...)
tens or hundreds or thousands or millions of (well identified) knowledge-objects.

We know how to interpret individual
(or a couple of) words, passages, texts, genres.

How will we interpret
thousands, millions of words, passages, texts, genres?

How will we check these interpretations?

Atomic philology and parallel philology

some implications of CITE architecture

Edition?

Atomic?

Parallel?

Actionable and interoperable?

Powers of ten!

Edition?

We do not design for books.

We design to make knowledge explicit.

We design to make explicit knowledge about a text.

We design to make explicit knowledge about complex cultural phenomena.

We design objects that will be manipulated.

"Manipulated" means: reordered, relocated, reformatted, repurposed, improved, enhanced, criticized.

To manipulate anything, we should be able to reach it.

The unreachable is unusable.

For a digital object, to reach is to cite: provide an address... an address that is actionable.

osf.io/dbf73/

Ergo: to manipulate anything digitally, we should be able to cite it (in an actionable way).

The uncitable is unusable.

A digital object is reachable (citable) if it has an identifier.

osf.io/dbf73/

An identifier may be an address, a number, a code.

Well, then...

How should an identifier for objects of an edition look like?

And what are the objects in an edition, anyway?

An ordered hierarchy of content objects

Atomic?

The editor decides (and is responsible for) what to make citable in the text.

The editor is responsible for assigning (actionable) identifiers to the objects he or she wants to make citable.

The identifiers (URNs) can be just numbers and letters.

DOI: 10.17605/OSF.IO/DBF73 ISBN: 9780715617106 VIAF: 100218993

Or...

The URNs can have a meaningful structure, reflecting hierarchical levels of our edition.

CITE Architecture

Smith & Blackwell, 2001–

The editor distinguishes (and makes citable and actionable) hierarchical levels in an edition.

Parallel?

Overlapping hierarchies in a text

are a well known challenge.

Solutions:

multiple documents, milestones, joins, stand-off markup?

CITE architecture makes it easy to handle multiple documents (concurrent versions).

The editor is responsible for separating concerns:

he or she has to organize various categories of knowledge into separate documents.

Catullus 53:

A new set of criteria on which to judge a scholarly edition:

how well is its knowledge organized?

Actionable and interoperable?

A URN, published on the internet, becomes (digitally) reachable from the outside world.

Remember: in a digital world, to reach is to (be able to) manipulate.

This is what "actionable" means.

But we have to know what we'll find.

We have to understand a URN to be able to program for it!

This is what "interoperable" means.

Powers of ten

What will happen if CITE architecture is widely adopted?

What will happen if scholarly editions become actionable and interoperable from "the top" to "the atomic level"?

What will happen if scholarly editions become actionable and interoperable in parallel?

My brain can manipulate two or five or ten ideas (in one go).

My computer can manipulate tens or hundreds or thousands or millions of objects (if they have identifiers).

If we give our knowledge digital form, boundaries and names, computers can reach it and manipulate it.

Our knowledge becomes (digitally) actionable and interoperable.

My computer can manipulate (combine, select, reorganize, repurpose...) tens or hundreds or thousands or millions of (well identified) knowledge-objects.

We know how to interpret individual (or a couple of) words, passages, texts, genres.

How will we interpret thousands, millions of words, passages, texts, genres?

How will we check these interpretations?

We design to make explicit
knowledge about a text.

We design to make explicit
knowledge about complex cultural phenomena.

"Manipulated" means:
reordered, relocated, reformatted, repurposed, improved, enhanced, criticized.

For a digital object, to reach is to cite:
provide an address...
an address that is actionable.

Ergo: to manipulate anything digitally, we should be able to cite it
(in an actionable way).

A digital object is reachable (citable)
if it has an identifier.

An identifier may be
an address, a number, a code.

The editor decides
(and is responsible for)
what to make citable in the text.

DOI: 10.17605/OSF.IO/DBF73
ISBN: 9780715617106
VIAF: 100218993

The editor distinguishes
(and makes citable and actionable)
hierarchical levels in an edition.

CITE architecture makes it easy to handle multiple documents
(concurrent versions).

The editor is responsible for
separating concerns:

A new set of criteria on which to judge
a scholarly edition:

We have to understand a URN
to be able to program for it!

What will happen if scholarly editions become actionable and interoperable
in parallel?

My computer can manipulate
tens or hundreds or thousands or millions of objects (if they have identifiers).

My computer can manipulate
(combine, select, reorganize, repurpose...)
tens or hundreds or thousands or millions of (well identified) knowledge-objects.

We know how to interpret individual
(or a couple of) words, passages, texts, genres.

How will we interpret
thousands, millions of words, passages, texts, genres?