Croatiae Auctores Latini (CroALa)

a Neo-Latin Corpus for Fun and Profit

Neven Jovanović (neven.jovanovic@ffzg.hr)
Florilegia: Big Textual Data Workshop, July 10-11, 2017

This page: croala.ffzg.unizg.hr/croala-florilegia/
Repository: bitbucket.org/nevenjovanovic/croala-leipzig-bigdata-2017

The plan

What?

How? (At random, precisely, topically, comparatively)

What next?

What?

Croatian writings in other languages

Languages of culture used by Croatians in the past:
Latin, Italian, German

Bibliographic research for Latin: 1263 authors, 4871 works in the period 976-1984

Croatiae auctores Latini

Currently in CroALa: 5.7 million words, 467 documents, bibliographic data.

First edition 2009
ISBN: 978-953-175-356-2

croala.ffzg.unizg.hr (PhiloLogic 3)
PhiloLogic 4
BaseX
GitHub 1, GitHub 2 (CTS)

License: CC-BY.

But what do you have inside?

A "black box"?

Our experience of the text is constrained
by our knowledge and presuppositions,
by print (book),
by software (PhiloLogic),
by a markup system (TEI XML).

A tale of two sets

Flight from literature?

croala.ffzg.unizg.hr/basex/quaelibet

Counting documents and words

Documents with more than 1000 words

Documents with fewer than 1000 words
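
The counting above can be illustrated with a minimal Python sketch. It is not the query behind the CroALa BaseX pages; it assumes that each document is a TEI XML string, that a "word" is any whitespace-separated token inside the TEI text element (the header is ignored), and that the function names count_words and split_by_length are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Standard TEI namespace, in ElementTree's {uri}tag notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def count_words(tei_xml: str) -> int:
    """Count whitespace-separated tokens inside <text>, skipping <teiHeader>.

    Assumption: text nodes are joined with a space, so a word broken up by
    inline markup would be counted as two tokens.
    """
    root = ET.fromstring(tei_xml)
    text_el = root.find(f"{TEI_NS}text")
    if text_el is None:
        return 0
    return len(" ".join(text_el.itertext()).split())


def split_by_length(
    docs: Dict[str, str], threshold: int = 1000
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Partition documents into (more than threshold, the rest) by word count.

    A document with exactly `threshold` words lands in the second group.
    """
    longer: Dict[str, int] = {}
    shorter: Dict[str, int] = {}
    for name, xml in docs.items():
        n = count_words(xml)
        (longer if n > threshold else shorter)[name] = n
    return longer, shorter


# A tiny illustrative document, not taken from CroALa.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc/></teiHeader>"
    "<text><body>"
    "<p>Arma virumque cano</p><p>Troiae qui primus ab oris</p>"
    "</body></text></TEI>"
)
```

Calling split_by_length on a dictionary of file names and TEI strings returns the two groups with their word counts, which is enough to reproduce the "more than 1000" and "fewer than 1000" views for a local copy of the corpus.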

Cur lingua rerum index sit

(Marko Marulić, Repertorium, Problemata Aristotelis)

Approaching CroALa through indices

A list of persons in Croatian Latin school drama

A category of place references in CroALa as a CITE/CTS system
(CroALa index locorum is an ongoing collaboration with Pelagios)

Approaching CroALa through comparison

Some of the (interesting) clausulae (verse endings) from the Poeti d'Italia in lingua latina that also occur in CroALa: Clausulae trium verborum
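
A comparison of three-word verse endings across two collections might be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the Clausulae trium verborum list: verses are assumed to be available as plain strings, normalization is limited to lowercasing and dropping punctuation, and the names clausulae and shared_clausulae are hypothetical.

```python
import re


def clausulae(verses, n=3):
    """Return the set of n-word verse endings, lowercased, punctuation removed."""
    endings = set()
    for verse in verses:
        # Keep runs of letters only; digits, underscores and punctuation are dropped.
        words = re.findall(r"[^\W\d_]+", verse.lower())
        if len(words) >= n:
            endings.add(" ".join(words[-n:]))
    return endings


def shared_clausulae(corpus_a, corpus_b, n=3):
    """Return the n-word endings that occur in both corpora, sorted."""
    return sorted(clausulae(corpus_a, n) & clausulae(corpus_b, n))


# Two lines from the opening of the Aeneid stand in for the first corpus.
corpus_a = [
    "Arma virumque cano, Troiae qui primus ab oris",
    "Italiam fato profugus Laviniaque venit",
]
# Invented lines for illustration only; they are not CroALa texts.
corpus_b = [
    "Venit et Illyricis qui primus ab oris",
    "Carmina nulla canam",
]

print(shared_clausulae(corpus_a, corpus_b))
```

Exact string matching of this kind misses spelling variants (for example u/v or ae/e), so a real comparison would need a shared orthographic normalization for both collections.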

Counting Latin authors in Croatia and in Tyrol:
Croatica et Tyrolensia

Looking for Lucretius in CroALa (forthcoming)

Integrating CroALa in other collections

A CTS version: in Perseus DL through Travis CI

In Corpus corporum, because of XML

In Neulateinische Wortliste as a source

A subset of CroALa: letters in EMLO (forthcoming)

What next?

In the struggles with material agency [our] plans and goals too are at stake and liable to revision. And thus the intentional character of human agency has a further aspect of temporal emergence, being reconfigured itself in the real time of practice, as well as a further aspect of intertwining with material agency, being reciprocally redefined with the contours of material agency in tuning.

Andrew Pickering, The Mangle of Practice: Time, Agency, and Science (1995)

What had started as an ad hoc collection of TEI-encoded texts grew over time into a forest; the forest needed a map, and there was a thing that could be developed into a map; we needed specific parts of the forest, and there were things that could be used to extract them; we realized that similar plants grow in other forests, and there were things that could be used to compare plants.

What had started as a collection of texts turned into something that has to be "manipulated", that is, accessed and retrieved, at the atomic and molecular level
(read: words, phrases, sentences, passages).

What is referred to at each of the levels?

(Read: meanings, grammar, things.)