Teo Samaržija

Hi there! My name is Teo Samaržija. I am a computer engineer (I have a Bachelor's degree in Computer Engineering from FERIT, University of Osijek), born in 1999. Apart from computer engineering, I am also interested in linguistics.

My software projects

I am primarily interested in front-end development, but I am also interested in low-level development and compiler development. Previously, I was interested in competitive programming; for example, I won 4th place at Infokup in 2013. However, I now think that the skills you learn at algorithmic competitions are not very useful for developing large programs, if not outright counter-productive.

The AEC programming language

At the end of high school, I came up with the idea of making a programming language called Arithmetic Expression Compiler, or AEC for short. I started by making a web-app that converts arithmetic expressions to i486-compatible assembly (something similar to the GodBolt Compiler Explorer, but with far fewer features, and running in the front-end instead of on the back-end). Then I made a rather unpolished compiler that compiles my programming language to assembly (using the Duktape framework, doing high-level things in JavaScript and low-level things in C), and I wrote a few small programs in AEC. Then I decided I wanted to be able to program in my language for the web, so I made an AEC-to-WebAssembly compiler from scratch. I wrote it in C++ (rather than in JavaScript and C, as my original AEC compiler was), and it is 5'500 lines of code. I made a few short programs in it:

  • The Analog Clock. It was challenging to make because an analog clock requires trigonometry, but WebAssembly has no instructions for calculating trigonometric functions (the way x86 has fsin and so on). I had to implement those functions myself using what we were taught in our calculus classes at the university.

    For those who are curious, here is how I implemented sin and cos (if you enable JavaScript, you will see the code syntax-highlighted):
    Function cos(Decimal32 degrees)
        Which Returns Decimal32 Is Declared; // Because "sin" and "cos" are
                                             // circularly dependent on one another.
    
    Decimal32 sineMemoisation[91];
    
    Function sin(Decimal32 degrees) Which Returns Decimal32 Does {
      If(degrees < 0) Then { Return - sin(-degrees); }
      EndIf;
      If degrees > 90 Then { Return cos(degrees - 90); }
      EndIf;
      If not(sineMemoisation[asm_f32("(f32.nearest (f32.load %degrees))")] = 0)
          Then { // I've used inline assembly here because nothing else I
                 // write will output "f32.nearest" (called "round" in most
                 // programming languages) WebAssembly directive, and it's way
                 // more convenient to insert some inline assembly than to modify
                 // and recompile the compiler.
        Return sineMemoisation[asm_f32("(f32.nearest (f32.load %degrees))")];
      }
      EndIf;
      /*
       * Sine and cosine are defined in Mathematics 2 (I guess it's called Calculus
       * 2 in the English-speaking world) using the system of equations (Cauchy
       * system):
       *
       * sin(0)=0
       * cos(0)=1
       * sin'(x)=cos(x)
       * cos'(x)=-sin(x)
       * ---------------
       *
       * Let's translate that as literally as possible to the programming
       * language.
       */
      Decimal32 radians : = degrees / oneRadianInDegrees, tmpsin : = 0,
                        tmpcos : = 1, epsilon : = radians / PRECISION, i : = 0;
      While((epsilon > 0 and i < radians) or (epsilon < 0 and i > radians)) Loop {
        tmpsin += epsilon * tmpcos;
        tmpcos -= epsilon * tmpsin;
        i += epsilon;
      }
      EndWhile;
      Return sineMemoisation[asm_f32("(f32.nearest (f32.load %degrees))")]
          : = tmpsin;
    }
    EndFunction;
    
    Function cos(Decimal32 degrees) Which Returns Decimal32 Does {
      Return sin(90 - degrees); // Again, basic common sense to somebody who
                                // understands what those terms mean.
    }
    EndFunction;
    		
  • The Dragon Curve. It was challenging because I had to interface SVG with my programming language.
  • The solution to the N Queens Puzzle in AEC. It goes without saying why that's challenging.
  • The Huffman Coding in AEC. It was challenging because dealing with structure pointers exposed a few bugs in the compiler, which I had to fix before I could continue working.
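As an aside, the integration loop in the sin function above is, in effect, the semi-implicit Euler method applied to the system sin'(x)=cos(x), cos'(x)=-sin(x): the cosine update uses the already-updated sine. Here is a minimal Python sketch of the same idea; the step count of 1000, standing in for PRECISION, is an assumption of this sketch:

```python
import math

PRECISION = 1000  # assumed number of integration steps, standing in for PRECISION


def sin_cos_euler(degrees):
    """Approximate sin and cos of an angle given in degrees by integrating
    the system sin' = cos, cos' = -sin from 0 to the angle.
    The cosine update uses the already-updated sine, which makes this the
    semi-implicit (symplectic) Euler method, so the result stays bounded."""
    radians = math.radians(degrees)
    step = radians / PRECISION
    tmp_sin, tmp_cos = 0.0, 1.0
    for _ in range(PRECISION):
        tmp_sin += step * tmp_cos
        tmp_cos -= step * tmp_sin  # note: tmp_sin has already been updated
    return tmp_sin, tmp_cos
```

The same recurrence as in the AEC code above, just without the memoisation and the degree-range reduction.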

I think that programming in a programming language you've made yourself is a good idea because it avoids the bugs that come from misunderstanding the language you are using. In my experience, many difficult-to-trace bugs in my programs were caused by my being unaware of some quirk of JavaScript or C++. I've written about that on my blog. WebAssembly is designed to be an exceptionally easy target for compilers, so it makes programming in a language you have designed yourself easy, at least for front-end development.

The AEC programming language has a syntax inspired by Ada and Microsoft SmallBasic, and its semantics is similar to C's, with two major exceptions: arrays and pointers are distinct data types, and unaligned access is allowed. Because AEC allows unaligned access, pointer arithmetic works very differently than in C. For example, you need to check manually whether a pointer that points to an address within an array is actually pointing at an element of that array, rather than at, for example, the second byte of a structure. This is how you properly check whether a pointer points to an actual element of an array of structures (an excerpt from Huffman Coding in AEC):

InstantiateStructure TreeNode treeNodes[64];
Integer16 isTreeNodeUsed[64];

Function isTreeNodeWithinBounds(
    PointerToTreeNode treeNode) Which Returns Integer32 Does {
  Return AddressOf(treeNodes[0]) <= treeNode <= AddressOf(treeNodes[64 - 1]) and
      mod(treeNode - AddressOf(treeNodes[0]), SizeOf(TreeNode)) = 0;
}
EndFunction;
	    

In C, you don't have to check whether the remainder is 0. Since AEC targets x86 and WebAssembly, rather than architectures such as ARM, I believe enabling unaligned access was a good choice: the code translates to assembly more directly that way. When you target x86 or WebAssembly using C, the compiler is essentially lying to you about how the architecture you are targeting works.
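The same bounds-plus-alignment check can be sketched in Python, modeling pointers as integer byte addresses. The structure size and base address below are made-up illustrative values (a real TreeNode's size depends on its fields):

```python
SIZEOF_TREE_NODE = 16   # assumed structure size in bytes, for illustration only
TREE_NODES_BASE = 1024  # assumed byte address of treeNodes[0], for illustration only
NUM_TREE_NODES = 64


def is_tree_node_within_bounds(pointer):
    """Mirror of the AEC check above: the pointer must lie between the
    addresses of the first and the last element, AND its offset from the
    base must be a whole number of structure sizes (the alignment check
    that C lets you skip)."""
    last_element = TREE_NODES_BASE + (NUM_TREE_NODES - 1) * SIZEOF_TREE_NODE
    return (TREE_NODES_BASE <= pointer <= last_element
            and (pointer - TREE_NODES_BASE) % SIZEOF_TREE_NODE == 0)
```

A pointer to the second byte of some structure passes the range check but fails the remainder check, which is exactly the case the mod test catches.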

As for the syntax, I've made two decisions different from any programming language I know of: the semicolon is optional only after EndIf, EndWhile and similar keywords (I've asked a StackExchange question about that decision), and you can add optional curly braces after Then, Else, Loop, and so on, and before Else, EndIf, EndWhile, and so on (to make it possible to use ClangFormat and similar tools for C-like languages with my programming language).

One anecdote that made me think about just how counter-productive the knowledge you gain at algorithmic competitions can be is the following. I wanted to build suggestions for misspelt variable names into my AEC-to-WebAssembly compiler. While preparing for algorithmic competitions, I had learned about the longest-common-subsequence (LCS) algorithm from dynamic programming, so I supposed that algorithm might be useful there. I spent hours implementing the LCS algorithm in my compiler to show suggestions for misspelt variable names, and it didn't give good results. Then I asked for help on a Discord server about programming languages, and somebody explained to me that LCS is not a good algorithm to use there, and that I should use the Levenshtein distance instead. I did, and then it gave good results. That made me think that algorithmic competitions encourage students to superficially learn as many algorithms as possible without fully understanding what those algorithms do, which is knowledge that is counter-productive in real-world programming.
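For illustration, here is a minimal Python sketch of the Levenshtein distance and of using it for variable-name suggestions. The function names and the distance threshold of 2 are choices made for this sketch, not necessarily what my compiler does:

```python
def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance: the minimum number of
    single-character insertions, deletions and substitutions needed to
    turn string a into string b. Keeps only one row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,        # deletion
                               current[j - 1] + 1,     # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(misspelt_name, known_names, max_distance=2):
    """Suggest declared variable names within a small edit distance
    of the misspelt one."""
    return sorted(name for name in known_names
                  if levenshtein_distance(misspelt_name, name) <= max_distance)
```

Unlike LCS, which rewards any shared subsequence however scattered, the Levenshtein distance penalizes every character that has to change, which is why it matches the intuitive notion of a typo much better.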

PicoBlaze Assembler and Emulator in JavaScript

It was the year 2020, the year of the plague. My Computer Architecture professor was afraid that the physical laboratory exercises in Computer Architecture would be cancelled. In our Computer Architecture classes, we use the Xilinx PicoBlaze as an example of a simple computer. My professor was afraid that, if the physical laboratory exercises were cancelled, students would run into all kinds of technical problems trying to run existing PicoBlaze assemblers and emulators on the computers they have at home. So, he tasked me with creating a PicoBlaze assembler and emulator that can run in any modern browser. And I did. You can see our work on SourceForge. I say "our" because I am no longer the only contributor to that project. My work started being used at a university in Argentina, which brought me a contributor named Agustín Izaguirre, and then a back-end developer from Turkey named Abidin Durdu joined as well. It is around 4'000 lines of code.

The assembler is made rather unconventionally: apart from running in a browser, its architecture is unconventional. The parsers of most assemblers produce a matrix (a 2-dimensional array) of many small trees, where each row of the matrix represents a line of the assembly code, the first column contains labels, the second column contains mnemonics, and so on. In that matrix, most of the trees contain a single node; only arithmetic expressions are encoded as larger trees. I think that is a fundamentally wrong way of structuring an assembler, and that it is the reason it's hard to add new features to assemblers. The parser of my PicoBlaze assembler works like a parser for a high-level language: it produces one large syntax tree. It is probably the most feature-rich assembler for PicoBlaze out there: it supports arithmetic expressions in constants (including the ternary conditional operator ?:, which assemblers in general tend not to support), and it supports if-branching, if-else branching, and while-loops in the preprocessor, all while trying to stay compatible with Xilinx's official assembler for PicoBlaze.
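To illustrate the approach, here is a toy recursive-descent parser in Python for constant expressions that produces a single tree per expression, including the ternary ?: operator. This is a sketch of the technique, not code from my assembler; to keep it short, binary operators are parsed left-to-right with no precedence, and division is integer division:

```python
import operator
import re


def tokenize(expression):
    # Numbers and single-character operators; whitespace is simply skipped.
    return re.findall(r"\d+|[-+*/?:()]", expression)


class Parser:
    def __init__(self, expression):
        self.tokens = tokenize(expression)
        self.position = 0

    def peek(self):
        return self.tokens[self.position] if self.position < len(self.tokens) else None

    def eat(self, expected=None):
        token = self.tokens[self.position]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.position += 1
        return token

    def parse_primary(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.parse_ternary()
            self.eat(")")
            return node
        return ("number", int(self.eat()))

    def parse_binary(self):
        # Left-to-right, no precedence: fine for a sketch, not for production.
        node = self.parse_primary()
        while self.peek() in ("+", "-", "*", "/"):
            node = (self.eat(), node, self.parse_primary())
        return node

    def parse_ternary(self):
        # ?: has the lowest precedence and is right-associative.
        condition = self.parse_binary()
        if self.peek() == "?":
            self.eat("?")
            if_true = self.parse_ternary()
            self.eat(":")
            if_false = self.parse_ternary()
            return ("?:", condition, if_true, if_false)
        return condition


def evaluate(node):
    """Fold a constant-expression tree down to an integer."""
    if node[0] == "number":
        return node[1]
    if node[0] == "?:":
        return evaluate(node[2]) if evaluate(node[1]) else evaluate(node[3])
    operations = {"+": operator.add, "-": operator.sub,
                  "*": operator.mul, "/": operator.floordiv}
    return operations[node[0]](evaluate(node[1]), evaluate(node[2]))
```

The point of the design is that the whole expression, ternary operator included, lives in one tree that a later pass can fold or extend, instead of being smeared across cells of a line-by-line matrix.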

Though, to be honest, I am not really sure how useful those assembler features are when writing real-world applications. In my Binary to Decimal program, I never used them. In Decimal to Binary, I only used arithmetic expressions once, to deal with ASCII:

    ;Check whether the character is a digit.
    compare s9, "0"
    ;If it is not a digit, jump to the
    ;part of the program for printing
    ;the number you have got.
    jump    c , print_the_number
    compare s9, "9" + 1
    ;Comparing with "9" + 1 instead of "9"
    ;in order to reduce the number of
    ;instructions is a suggestion by a
    ;StackExchange user called Sep Roland.
    jump    nc, print_the_number
	  

The "9" + 1 is an arithmetic expression, something that Xilinx's official assembler doesn't support. This could be done without arithmetic expressions in constants; they just make it possible to reduce the number of instructions while keeping the code legible. So, yeah, maybe my assembler is trying to do too much.

It would be interesting to do an empirical study by dividing Computer Architecture students into two groups, one group using Notepad++ (Ivana Vučić from FERIT made a language server for the PicoBlaze assembly language that facilitates editing it in Notepad++) and Xilinx's official assembler, and the other group using my PicoBlaze assembler and emulator, and seeing which of the two groups fails the laboratory exercises at a higher rate. It would also be interesting to ask those students what annoyed them the most about the tools they used. (I, for example, find it very annoying that I have to put trailing zeroes in hexadecimal constants to make my programs assemble with Xilinx's assembler. I also find it annoying that the syntax highlighting in my PicoBlaze assembler and emulator is not entirely correct.) That would give me good feedback on whether my work is actually useful when there is no pandemic. The anecdotal evidence is contradictory: Agustín Izaguirre says that my work is being used at a college in Argentina "to great effect", but the students from FERIT whom I asked about it hadn't used my work at all.

An example execution of the Binary to Decimal example program. The user entered, using the switches, the binary number 10101010, which the program converted to decimal 170 and to Gray Code 11111111 (shown using LEDs).

That project was my Bachelor's thesis. One of the challenges I faced was how to present it as my Bachelor's thesis, considering that I am not its only author. You can read more about that problem on StackExchange. I also often ask myself why universities teach Computer Architecture using obscure CPUs such as PicoBlaze, rather than x86 or ARM. If my university used x86 or ARM to teach Computer Architecture, the situation that forced me to make that PicoBlaze assembler and emulator could not have arisen. I asked that question on StackExchange as well.

SVG PacMan

Back in high school, as an attempt to learn JavaScript well, I made a PacMan game that uses SVG. It is around 1'500 lines of code.

What the PacMan game looks like before it is started.

My SVG PacMan game suffered cross-site scripting attacks quite a few times (I didn't sanitize the input properly, because I didn't know PHP well), and that made me think that, when you make a web-app, it is a good idea to make a front-end-only version of it, so that at least some of its functionality remains available in case you suffer a cross-site scripting attack. That's why I made a front-end-only version of my SVG PacMan, as well as a front-end-only version of my PicoBlaze assembler and emulator.

Etymology Game

At the end of high school, I came up with the idea of making a rather original game: a flash-card game in which the goal is to guess the etymologies of words in simulated languages. I decided to use the jQuery framework for that. With a little help from a front-end developer named Boris Muminović, I managed to do it, so you can now play the Etymology Game. It consists of three parts. In part one, you are supposed to recognize which words from the two simulated languages are cognates. In part two, you are supposed to guess to which of the two simulated languages each word belongs (I got that idea from the task Tocharian at the International Linguistics Olympiad 2003). In part three, you are supposed to guess which set of sound changes belongs to which language. It is around 2'500 lines of code.

Part one
Part two
Part three

Of course, in order to do that, I needed to make a lot of guesses about how languages really behave. So, when I was a third-year Computer Engineering student, I decided to test my algorithms against real-world data. For the real-world data, I took the names of the numbers one to ten in various languages (attested and reconstructed). Unfortunately, the results weren't particularly impressive: the algorithms did little better than chance. And that made me think how absurd it is for governments to think they can control language (make a language purer of loan-words, revitalize endangered languages...). If we cannot predict how languages will evolve much better than chance, what makes us think we can know whether language policies produce the desired results? Don't you think we are much more likely to get an illusion of control than to actually control the language?

What's amazing to me is that programmers say they want original games and programs, but it seems they don't really want that. For example, my Etymology Game is arguably the most original thing I've ever made, but it has 0 stars and 0 forks on GitHub. Programmers say they don't want yet another programming language or assembler, but it seems they actually want precisely that: the repository of mine with the most stars on GitHub is my AEC-to-WebAssembly compiler, and the one with the most actual contributors is my PicoBlaze assembler and emulator. It seems the world really does need another assembler.

My other work

As I've said in the introduction, apart from computer engineering, I am interested in linguistics. I am also interested in libertarian politics and vegetarianism.

My linguistic theories

I have always been interested in classical languages; in 2016, I won 8th place at the AZOO competition in Latin. My hobby is publishing and discussing papers about the names of places. In 2022, I published a paper called "Etimologija Karašica" in two peer-reviewed journals: Valpovački Godišnjak and Regionalne Studije (an almanac for the Croatian national minority in Hungary, based in Sopron). Now, I concede that whether my paper was really peer-reviewed is a matter of debate: the reviewers in question were dialectologists and other people from fields distantly related to what I was writing about, not people educated in the interdisciplinary field that touches both linguistics and information theory (such people are, unfortunately, rare). The paper, edited quite differently from the version published in those two journals (the editors of Valpovački Godišnjak made me delete the discussion of whether Illyrian was centum or satem, as it was, according to them, irrelevant), is available on my blog. Here is an English-language summary of that paper:

To summarize: I think I have thought of a way to measure the collision entropy of different parts of the grammar, and that it is possible to use those measurements to calculate the p-values of certain patterns in the names of places. The entropy of the syntax can obviously be measured by measuring the entropy of a spell-checker word list, such as that of Aspell, and subtracting from it the entropy of a long text in the same language (I was measuring only the consonants and ignoring the vowels, because the vowels were not important for what I was trying to calculate). I got that, for example, the entropy of the syntax of the Croatian language is log2(14)-log2(13)=0.107 bits per symbol, that the entropy of the syntax of the English language is log2(13)-log2(11)=0.241 bits per symbol, and that the entropy of the syntax of the German language is log2(15)-log2(12)=0.3219 bits per symbol. It was rather surprising to me that the entropy of the syntax of German is larger than that of English, given that German syntax seems simpler (German uses morphology more than English does, which somewhat simplifies the syntax), but you cannot argue with the hard data. It looks as though the collision entropy of the syntax and the complexity of the syntax of the same language are not strongly correlated. The entropy of the phonotactics of a language can, I guess, be measured by measuring the entropy of consonant pairs (with or without a vowel between them) in a spell-checker word-list, then measuring the entropy of single consonants in that same word-list, and then subtracting the former from twice the latter. I measured that the entropy of the phonotactics of the Croatian language is 2*log2(14)-5.992=1.623 bits per consonant pair.
That 5.992 bits per consonant pair was calculated using a somewhat mathematically dubious method involving the Shannon entropy (back then, I didn't know that there is a simple way to calculate the collision entropy, as the negative binary logarithm of the sum of the squares of the relative frequencies of the symbols, so I was estimating the collision entropy using the Monte Carlo method). Now, I have taken the entropy of the phonotactics to be the lower bound of the entropy of the phonology, which is the only entropy that matters in ancient toponyms (the entropies of the syntax and the morphology do not matter then, because the toponym was created in a foreign language). Given that the Croatian language has 26 consonants, the upper bound of the entropy of the morphology, which does not matter when dealing with ancient toponyms, can be estimated as log2(26*26)-1.623-2*0.107-5.992=1.572 bits per pair of consonants. So, to estimate the p-value of the pattern that many names of rivers in Croatia begin with the consonants 'k' and 'r' (Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica), I did two birthday calculations, first setting the simulated entropy of the phonology to 1.623 bits per consonant pair, and then setting it to 1.623+1.572=3.195 bits per consonant pair. In both of those birthday calculations, I assumed that there are 100 different river names in Croatia. The former birthday calculation gave me a probability of 1/300 that the k-r pattern occurs by chance, and the latter gave me a probability of 1/17. So the p-value of the k-r pattern is somewhere between 1/300 and 1/17. Mainstream linguistics considers that k-r pattern in Croatian river names to be a coincidence, but nobody before me (as far as I know) has even attempted to calculate how much of a coincidence it would have to be (the p-value).
So I concluded that the simplest explanation is that the river names Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica are related and all come from the Indo-European root *kjers, meaning horse (in Germanic languages) or to run (in Celtic and Italic languages). I think the Illyrian word for "flow" came from that root, and that it was *karr or *kurr, the vowel difference between 'a' and 'u' perhaps being dialectal variation (compare the attested Illyrian toponyms Mursa and Marsonia: those names almost certainly come from the same root, yet there is a vowel difference between 'a' and 'u' in them). Furthermore, based on the historical phonology of the Croatian language and what's known about the Illyrian language (for example, that there was a suffix -issia, as in Certissia, the ancient name for Đakovo, but no suffix -ussia), I reconstructed the Illyrian name for Karašica as either *Kurrurrissia (borrowed into Proto-Slavic as *Kъrъrьsьja, which would give *Karrasja after Havlik's Law, and then *Karaša after the yotation and the loss of geminates, to which the Croatian suffix -ica was added) or *Kurrirrissia (borrowed into Proto-Slavic as *Kъrьrьsьja, which would also give *Karaša by regular sound changes), and the Illyrian name for Krapina as either *Karpona (borrowed into Proto-Slavic as *Korpyna, which would give "Krapina" after the merger of *y and *i and the metathesis of the liquids) or *Kurrippuppona (borrowed into Proto-Slavic as *Kъrьpъpyna, which would also give "Krapina" by regular sound changes), with a preference for *Karpona. Do those arguments sound compelling to you?
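The closed-form way of computing the collision entropy mentioned above (the negative binary logarithm of the sum of the squares of the relative frequencies) takes only a few lines of Python. This is a minimal sketch, where the input is any sequence of symbols, for instance consonant pairs extracted from a word-list; the function names are mine, chosen for this sketch:

```python
import math
from collections import Counter


def collision_entropy_bits(symbols):
    """Collision (Renyi order-2) entropy in bits: the negative binary
    logarithm of the probability that two symbols drawn independently
    from the distribution are equal, i.e. of the sum of the squared
    relative frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    repeat_probability = sum((count / total) ** 2 for count in counts.values())
    return -math.log2(repeat_probability)


def effective_alphabet_size(symbols):
    """2**H2: the size of a uniform alphabet that would have the same
    collision probability as the observed distribution."""
    return 2 ** collision_entropy_bits(symbols)
```

The effective alphabet size is what actually enters a birthday calculation: a distribution with collision entropy H collides as often as a uniform alphabet of 2**H symbols would.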

From my experience discussing my theory on various Internet forums, I'd say there are two arguments against it that are not completely insane (most people advocating mainstream linguistics on Internet forums seem to have blind faith that information theory has nothing to say about toponyms, a view that is, in my opinion, not only not right, but not even wrong):

  1. Your experiment is flawed because it doesn't take into account the possibility that nouns in the Croatian language have a significantly lower collision entropy than the rest of the words in the Aspell word-list. River names are nouns.
    • My response: If the Croatian language had a Swahili-like grammar, that might be a valid objection. In Swahili, nouns can only start with one of 18 prefixes (called noun classes), while verbs can start with whatever the phonotactics allows. But Croatian grammar is not remotely like that. What magic would make nouns have a significantly lower collision entropy than verbs in the Croatian language? I think that objection is an ad-hoc hypothesis.
  2. Proto-Slavic phonotactics didn't allow four syllables with yers to be consecutive, like in your supposed Proto-Slavic form of the river name "Karašica", *Kъrъrьsьja.
    • My response: Do you have a source for that claim? Also, mainstream onomastics seems to be fine with proposing etymologies that involve four consecutive yers in Proto-Slavic forms. For instance, mainstream linguistics claims that the town name Cavtat comes from the Latin (in) civitate, so the Proto-Slavic form must have been *Kьvьtъtь.
      • Since the 'a' in civitate was long in Classical Latin, it would regularly have been borrowed as Proto-Slavic *a, rather than as a back yer. The Proto-Slavic form was *Kьvьtatь.
        • My response: That's not at all how Vulgar Latin phonology worked. One of the basic facts about Vulgar Latin phonology is that it didn't distinguish between short and long 'a'. See this NativLang video for more information about Vulgar Latin phonology.
          • (The conversation devolves completely...)

Overall, I am relatively confident that my theory is correct, among other things because plenty of people I know in real life who know something about information theory have read my paper and say that my arguments sound compelling to them. Admittedly, regarding my idea that Karašica was called *Kurrurrissia in antiquity, the problem is that the early historical phonology of Croatian tends not to be well-known, even among linguistically educated people, and it's exactly the early historical phonology of Croatian that's necessary to evaluate my linguistic claims. In my experience, even linguistically educated people tend not to know that, in the early Proto-Slavic spoken by Croats in the 7th century (into which the toponyms were borrowed), before the yers became schwa-like sounds, the front yer is reconstructed to have been pronounced as a short 'i' and the back yer as a short 'u'. So you can perhaps dismiss the fact that the linguistically educated people I know in real life find my arguments compelling as not meaning much. You might even say that the educated guesses about how Croatian was pronounced between the 7th and the 11th century (when it was not attested) are too speculative to be considered science. But I think we can be rather confident that basic information theory does indeed strongly suggest (it doesn't prove, since you can make ad-hoc hypotheses, such as asserting that nouns somehow magically have a significantly lower collision entropy than the other words in the Aspell word-list) that the probability of that k-r pattern occurring by chance is somewhere between 1/300 and 1/17.

Related to that, I advocate the theory that Illyrian belonged to the centum branch of the Indo-European languages. Most Croatian linguists, ever since the 1930s, have thought it belonged to the satem branch. I'll share what I posted about it on r/latin (a subreddit about the Latin language), just to give a general idea:

Was the Illyrian language a "centum" or a "satem" language? Are the Albanians native to the Balkans?

What do the people in this forum think: was the Illyrian language a "centum" or a "satem" language? All the Indo-European languages are divided into two groups: "centum" and "satem". In the "centum" languages, the Indo-European phoneme 'kj' turns into 'k'. Latin is a "centum" language, and so are Greek and English. In English, 'kj' actually turns into 'h', but, at some point before Grimm's Law, 'kj' had turned into 'k' in English, and that is why English is a "centum" language. In the "satem" languages, 'kj' turns into 's'. Examples of "satem" languages are Croatian, Albanian and Sanskrit. James Patrick Mallory wrote in the Encyclopedia of Indo-European Culture that he thinks that whether Illyrian was "centum" or "satem" cannot be known from the data we have. Most linguists in Croatia, and elsewhere in the Balkans, think that the Illyrian language was a "satem" language, and even the ancestor of Albanian. But I think the Illyrian language was a "centum" language. The day before yesterday, I published a YouTube video in Croatian about that.
https://youtu.be/4QQ2iJZnyUk
In that video, I give five arguments for the idea that the Illyrian language was a "centum" language. Those arguments are:
  1. The 'k'-'r' pattern in the names of rivers in Croatia. In many names of rivers in Croatia, the first consonant is 'k' and the second consonant is 'r': Krka, Korana, Krapina, Krbavica, Kravarščica, and two rivers named Karašica. Most linguists consider that pattern to be coincidental, but I think that information theory (the Birthday Paradox and collision entropy) teaches us that the probability of that pattern appearing coincidentally is between 1/300 and 1/17. You can find the calculation in my paper "Etimologija Karašica", which I published in the almanac Valpovački Godišnjak in 2022. I think that the name "Karašica" comes from the Illyrian name Kurr-urr-issia, and that "kurr" meant "to flow" (probably from the Indo-European *kjers, which meant "to run"), "urr" meant "water" (from the Indo-European *weh1r), and "-issia" was a suffix in the Illyrian language, which is also in the ancient name for Đakovo, "Certissia". In my view, the name "Kurrurrissia" passed from Illyrian into Proto-Slavic as *Kъrъrьsьja, which gave "Karrasj-" > "Karaš-ica" (-ica is a Croatian suffix) in today's Croatian. I also think that Krapina came from the Illyrian name Kar-p-ona, with "kar" from *kjers, "p" from *h2ep (water), and "ona" a suffix found in many Illyrian place names, among others "Salona" and "Albona". In my view, the name "Karpona" passed from Illyrian into Proto-Slavic as *Korpyna, which gave "Krapina" in today's Croatian. And so on...
  2. If the Illyrian language was a "centum" language, "Curicum", the ancient name for Krk, can be read as "caurus, the northern wind", from the Indo-European *(s)kjeh1weros (from which the Latin word "caurus" comes), and Krk is the northernmost island in our sea.
  3. If the Illyrian language was a "centum" language, "Incerum", the ancient name for Požega, can be read as "the heart of the valley", from the Indo-European roots *h1eyn (valley) and *kjer(d) (heart).
  4. If the Illyrian language was a "centum" language, "Cibelae", the ancient name for Vinkovci, can be read as "a strong house" or "a fortress", from the Indo-European roots *kjey (house) and *bel (strong).
  5. Many inscriptions in the Illyrian language begin with "klauhi zis", and that probably meant "May God hear...". "Klauhi" therefore probably comes from *kjlew (to hear), so *kj turns into *k in the Illyrian language.
Do those arguments sound compelling to you?

In case you want to watch the YouTube video I made, but your browser cannot stream MP4 videos, try downloading this MP4 file and opening it in VLC Media Player or something similar.

To make something clear, I don't think that most of the etymologies I suggested in the post above are right. I just think that, taken together (not in isolation), they make a strong case that Illyrian was a centum language.

If you would like to read a rather long paragraph about what I think mainstream linguistics gets very wrong when dealing with names of places (and, consequently, languages which are primarily attested through names of places, such as Illyrian), enable JavaScript and click here.

Previously, I advocated the theory that Proto-Indo-European and Proto-Austronesian are related (in particular, that Proto-Indo-European *s regularly corresponds to Proto-Austronesian *q, that Proto-Indo-European *r regularly corresponds to Proto-Austronesian *l, and that Proto-Indo-European *d regularly corresponds to Proto-Austronesian *d), but I now think that advocating such theories is a hopeless waste of time. My time is much better spent trying to determine some things about Illyrian than trying to determine deep-time language relationships. Attempts to determine deep-time language relationships have been made ad nauseam, while applying information theory to Croatian toponyms seems like low-hanging fruit. When you are on an Internet forum, don't use arguments that have been tried ad nauseam, because that will be read as "I want to be the person who discovered something new, but I don't want to put a lot of effort into it.".
I calculated that the probability of a pattern such as Proto-Indo-European *s corresponding to Proto-Austronesian *q occurring by chance is around 6%, but I am quite sure that calculation makes highly unrealistic linguistic assumptions (for instance, that the collision entropy of a single consonant in both proto-languages is around log2(20)), and that a better calculation would give a much higher p-value. In Proto-Indo-European, for many words (perhaps even most words beginning with *s-), we aren't sure whether they actually began with *s- or whether that *s- is a later contamination in many daughter languages (the so-called s-mobile). That uncertainty, which doesn't exist for modern or well-attested languages, would drive the p-value even higher.

One thing I think everybody who has done serious work attempting to mathematically model a language will agree on is that making unrealistic linguistic assumptions can easily skew your calculations by several orders of magnitude. You need to calibrate your computer models against real-world data. If you assume that the collision entropy of the consonant pairs in the Croatian language is around log2(20*20), the p-value of the k-r pattern in the Croatian river names appears to be around 1/10'000. But if you try to actually measure the collision entropy of different parts of the grammar of the Croatian language, you will get a p-value somewhere between 1/300 and 1/17. That's a difference of between one and a half and almost three orders of magnitude.
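To illustrate how sensitive such birthday calculations are to the assumed entropy, here is a minimal Python sketch. The number of names drawn (4) and the effective alphabet sizes are hypothetical round numbers chosen for illustration, not the actual figures from my calculations:

```python
from math import prod

def birthday_collision_prob(n, effective_size):
    # Probability that at least two of n independent draws coincide,
    # modelling the distribution as uniform over `effective_size` outcomes.
    # For a non-uniform distribution, 2**H2 (H2 = collision entropy) is the
    # effective number of equally likely outcomes for collision purposes.
    return 1.0 - prod(1.0 - i / effective_size for i in range(n))

# Hypothetical: 4 names drawn independently.
for pairs in (400, 229, 64):  # 20*20 uniform pairs, 2^7.839, 2^5.992
    p = birthday_collision_prob(4, pairs)
    print(f"{pairs:3d} effective consonant pairs -> P(collision) = {p:.4f}")
```

Under these toy assumptions, shrinking the effective alphabet from 400 pairs to 64 pairs raises the collision probability roughly sixfold, which is exactly the kind of sensitivity described above.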
Similarly, it's important to understand the basics of information theory, so that you don't use Shannon entropy where collision entropy is relevant, or vice versa. The Shannon entropy of the consonant pairs in the Aspell word-list for the Croatian language is 7.839 bits per consonant pair, but the collision entropy is 5.992 bits per consonant pair. That might seem like a relatively small difference, but in reality it's a huge one, because we are talking about a logarithmic scale: 7.839 is log2(229) and 5.992 is log2(64). Plugged into the birthday calculations, that's a difference in p-value of more than an order of magnitude.
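For the record, here is how the two entropies are computed and how they diverge on a skewed distribution. The consonant-pair counts below are made up for illustration; they are not the real numbers from the Aspell word-list:

```python
from math import log2

# Made-up counts of consonant pairs (NOT the real Croatian data).
counts = {"kr": 40, "st": 30, "pr": 20, "kl": 10, "zn": 5, "mn": 5}
total = sum(counts.values())
probs = [c / total for c in counts.values()]

# Shannon entropy: H1 = -sum(p * log2(p))
shannon = -sum(p * log2(p) for p in probs)

# Collision (Renyi-2) entropy: H2 = -log2(sum(p**2)), i.e. minus the
# log-probability that two independent draws from the distribution coincide.
collision = -log2(sum(p * p for p in probs))

print(f"Shannon entropy:   {shannon:.3f} bits")
print(f"Collision entropy: {collision:.3f} bits")
```

The collision entropy is never larger than the Shannon entropy, and the gap grows as the distribution gets more skewed; for a uniform distribution the two coincide.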

Translating poetry

I've tried translating some poetry from English to Croatian and vice versa. The ones I think I did the best job at translating are Eric Bogle's The Gift of Years (in case you cannot open it, try downloading this MP4 file and opening it in VLC Media Player or something similar) and Zdenko Runjić's Galeb i ja (MP4).

I really like anti-war poetry about World War 1 (such as Eric Bogle's The Gift of Years) because I think that everybody would now agree that World War 1 was in vain. It was supposed to be the war that ended all wars. How could politicians all over the world buy into the idea that you could end all wars by plunging the world into one bloody war? I can see how people in Germany might think that, if they won World War 1, the world would be more peaceful because the colonies would be distributed among nations more equally. But I fail to see how people all over the world could buy into the idea of a war to end wars.

If you are going to translate poetry, take this bit of advice from me: Don't translate anti-police songs. Translating anti-war songs about World War 1 is fine, but once you start translating anti-police songs, somebody will get offended and threaten you with a lawsuit.

Reviving Latin

I am a bit into the idea of attempting to revive the Latin language. I have published a few YouTube videos in Latin about modern topics, such as this video about gun control in Latin. If you cannot open it, try downloading this MP4 file and opening it in VLC or something similar. So, yeah, I am not impressed by people who use many Latin and Greek words while speaking English. I can make an entire video using only Latin and Greek words. Well, almost: I had to use the English word telescoping in my video, but I explained what it meant in Latin. (Telescoping is a phenomenon from social psychology whereby people tend to remember distant events as if they occurred recently. It is related to gun control because many social scientists think that Gary-Kleck-like studies, which try to estimate how often guns are used defensively, suffer from massive telescoping. That is, in my opinion, unlikely.)

Libertarianism

I am also interested in libertarian politics. Government-backed pseudopsychology, suggesting that my father had raped some little girl and that I was a witness, put my mother in jail and almost my father as well. I think that government power should be limited to solving simple problems, such as the problem of superbacteria caused by the egg industry (around 70% of antibiotics these days are used in the egg industry, so it's obvious that what the government should do about superbacteria is regulate the hell out of the egg industry), or the problem of ISPs incorrectly setting up their DNS servers so that they can be used to massively amplify denial-of-service attacks. I don't think it's the government's job to attempt to address complicated problems such as violent crime, global warming, or the pandemic that's currently going on. Maybe one day the social sciences will advance to such a degree that they can be trusted to tell the government what it should do. But since that day is not today, government power should be as limited as possible.

The unfortunate thing about the social sciences, which I think most people don't understand, is this: in the natural sciences, in order to discredit an experiment, you usually need to find a flaw in the experimental setup, whereas, in the social sciences, you can usually discredit an experiment by saying "Well, perhaps a better mathematical model would suggest the findings are not actually statistically significant.". To understand what I mean, consider the k-r pattern in the Croatian river names. You can point it out to the proponents of mainstream linguistics, and they will say: "You've fallen victim to the Birthday Paradox. The probability of such a pattern occurring by chance isn't low at all.". Then you can point out to them that a simple birthday calculation, which assumes that the Croatian language has 20*20=400 equally likely consonant pairs, suggests that the probability of such a pattern occurring by chance is around 1/10'000. Then the proponents of mainstream linguistics will say: "You need to take into account the fact that some consonants are more common than others.". Then you can point out to them that the collision entropy of a consonant in the Croatian Aspell word-list is around log2(14), and that a birthday calculation taking that fact into account suggests that the p-value of that k-r pattern occurring by chance is around 1/500. Then the proponents of mainstream linguistics respond by saying: "Well, some pairs of consonants are more common than others, due to phonotactic constraints. You need to take that into account.". Then you can study information theory, attempt to measure the collision entropy of different parts of the grammar of the Croatian language (including phonotactics), and show them complicated calculations suggesting that the probability of that k-r pattern occurring by chance is somewhere between 1/300 and 1/17.
Then they will complain that those calculations are probably incorrect because they are relatively complicated and have not been peer-reviewed. Then you publish those calculations in two peer-reviewed journals and ask a few experts in information theory to check your math, which is enough to make some (but not nearly all) proponents of mainstream linguistics stop commenting that your calculations are probably wrong. Then the proponents of mainstream linguistics will say things like: "What if different word classes in Croatian have significantly different collision entropies? What if nouns have a significantly lower collision entropy than the rest of the words in the Aspell word-list?". I think you get the point by now: you can usually discredit an experiment in the social sciences by making an ad-hoc hypothesis that a more appropriate mathematical model would suggest the findings are not actually statistically significant. In theory, this can go on forever; in practice, it goes on until the calculations of the p-value become so complicated that nobody can review them. I see no point in attempting an even more complicated calculation suggesting that this k-r pattern is indeed statistically significant: if what I've done by now is not enough to convince the proponents of mainstream linguistics, probably nothing would be. Proponents of mainstream linguistics complain both that my calculations supposedly don't reflect how real languages behave (and I see no reason to think that nouns actually have a significantly lower collision entropy than the rest of the words) and that my calculations are so complicated that they are hard to review. The absurdity here should be apparent: a calculation that takes into account how languages really behave (if nouns really do have a lower collision entropy than the rest of the words) would be even more difficult to review than my calculation is.
The solution proposed by the advocates of mainstream linguistics is often to use what they call traditional methods, but the problem is that those traditional methods not only don't appear to be based on mathematically sound principles, they appear to go precisely against them (they give results that information theory says are improbable, such as that the k-r pattern is a coincidence). What, then, is the point of trusting social scientists to tell us how the government should operate? If most of the experiments in the social sciences can be discredited by saying "Perhaps a better statistical model would suggest the results are not actually statistically significant.", I see no reason to trust what social scientists have to say about politics.

As a former anarchist, I've made a YouTube video about how to convincingly argue against anarchism. If you want to watch it, but your browser cannot stream MP4 videos, try downloading this MP4 file and opening it in VLC or something similar. Though, in my experience, most of the people on the Internet who use the word anarchist to describe themselves are not anarchists or even libertarians. Many people who call themselves anarchists hold anti-vaxxer beliefs. And by anti-vaxxer I don't mean being against forced vaccinations; I mean actually believing that vaccines somehow magically cause all kinds of diseases. That's blatantly incompatible with anarchism. What makes them think vaccines would be tested more in an anarchy? It seems obvious to me that, in an anarchy, we would have vaccines for a new disease sooner, but those vaccines would be less tested. So, yeah, it's possible that my video will not convince an anarchist you are debating with, precisely because they are not actually an anarchist.
Anarchist ideas seem appealing because they make world problems seem self-solving. A good example is the problem of superbacteria. Anarchists, when asked what their take on superbacteria is, respond with something like: "Well, the problem of superbacteria is largely caused by governments removing the incentive for scientists to discover new classes of antibiotics via intellectual property laws. And even so, lab-grown meat will soon make the problem far less severe, since around 80% of antibiotics these days are given to farm animals. And you can do your part now by going vegetarian.". That sounds so appealing, and people want to believe it so much, that they will not do the research (such as looking up the statistics on where exactly the antibiotics used in agriculture go) to realize it's nonsense. There are several problems with that response. First, there is no reason to think there even are useful antibiotics left for scientists to discover. The fact that we've found dozens of chemical compounds that kill prokaryotes but not eukaryotes is already amazing; we cannot expect that there are more. Second, lab-grown meat won't address the problem of superbacteria, because almost all antibiotics used in agriculture these days are being used in the egg industry, and we won't have lab-grown eggs any time soon: we struggle to produce muscle meat in laboratories, and eggs are far more complicated than muscles. But to a layman, the anarchist response will probably sound very appealing.

Vegetarianism

I am also a vegetarian and I like doing vegegelicism on Internet forums. While I don't know what the best way to do vegegelicism is, I know very well what one shouldn't do. I've written a long blog-post about vegegelicism.

One important thing which is, for a reason that escapes me, usually left unsaid when it comes to vegetarianism, veganism, and similar things, is that the vast majority of antibiotics used in agriculture go to the egg industry. That means that, if you stop eating meat, but continue buying eggs from factory farms (as I did for almost a decade), you are not really helping public health.