As I said in the introduction, apart from computer engineering,
I am interested in linguistics. I am also interested in libertarian
politics and vegetarianism.
My linguistic theories
I have always been interested in classical languages.
In 2016, I won 8th place in the AZOO competition in
Latin. My hobby is publishing and discussing papers about the names of
places. In 2022, I published a paper called "Etimologija Karašica" in two peer-reviewed journals:
Valpovački Godišnjak
and Regionalne Studije (an almanac for the Croatian national
minority in Hungary, based in Sopron).
Now, I concede that whether my paper was really
peer-reviewed is a matter of debate. The reviewers in
question were dialectologists and other people from fields
distantly related to what I was writing about. They weren't
people educated in the interdisciplinary field that touches both
linguistics and information theory (such people are,
unfortunately, rare).
The paper, in a version quite different from the one
published in those two journals (the editors of Valpovački
Godišnjak made me delete the discussion of whether
Illyrian was centum or satem because, according to them, it was
irrelevant), is available
on my blog. Here is the English-language summary of that paper:
To summarize, I think I have found a way to measure the
collision entropy of different parts of the grammar, and that it
is possible to use those measurements to calculate the p-values of certain patterns in the
names of places. The entropy of the syntax can be measured by
taking the collision entropy of a spell-checker word list, such as
that of Aspell, and subtracting from it the
entropy of a long text in the same language (I measured only
the consonants and ignored the vowels, because vowels were
not important for what I was trying to calculate). I found, for
example, that the entropy of the syntax of the Croatian language is
log2(14)-log2(13)=0.107 bits per symbol, that the entropy of the
syntax of the English language is log2(13)-log2(11)=0.241 bits per
symbol, and that the entropy of the syntax of the German language
is log2(15)-log2(12)=0.3219 bits per symbol. It was rather
surprising to me that the entropy of the syntax of the German
language is larger than the entropy of the syntax of the English
language, given that German syntax seems simpler (German relies on
morphology more than English does, which somewhat
simplifies the syntax), but you cannot argue with the hard data.
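The three syntax figures above are just differences of two base-2 logarithms. The short sketch below re-derives them from the effective numbers of equally likely consonants quoted in the text (14 and 13 for Croatian, 13 and 11 for English, 15 and 12 for German). The dictionary and the function name `syntax_entropy` are illustrative only; the sketch checks the arithmetic and does not redo the measurement.

```python
import math

# Effective numbers of equally likely consonants (2 ** collision entropy),
# as quoted in the text: (Aspell word list, long running text).
QUOTED_PERPLEXITIES = {
    "Croatian": (14, 13),
    "English": (13, 11),
    "German": (15, 12),
}


def syntax_entropy(wordlist_perplexity: float, text_perplexity: float) -> float:
    """Word-list collision entropy minus running-text collision entropy, in bits per consonant."""
    return math.log2(wordlist_perplexity) - math.log2(text_perplexity)


if __name__ == "__main__":
    for language, (wordlist, text) in QUOTED_PERPLEXITIES.items():
        print(f"{language}: {syntax_entropy(wordlist, text):.4f} bits per symbol")
```

Run as a script, it prints 0.1069, 0.2410 and 0.3219 bits per symbol, which round to the values given above.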
It looks as though the collision entropy of the syntax and the
complexity of the syntax of the same language are not strongly
correlated. The entropy of the phonotactics of a language can, I
guess, be measured by measuring the entropy of consonant pairs
(with or without a vowel between them) in a spell-checker word list,
then measuring the entropy of single consonants in that same
word list, and then subtracting the former from twice the latter.
I measured that the entropy of the phonotactics of
the Croatian language is 2*log2(14)-5.992=1.623 bits per consonant
pair. That 5.992 bits per consonant pair was calculated using
a mathematically dubious method involving the Shannon entropy
(back then, I didn't know that the collision entropy can simply be
calculated as the negative binary logarithm
of the sum of the squares of the relative frequencies of the symbols, so I
was measuring the collision entropy using the Monte Carlo method).
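The closed-form collision entropy mentioned above, the Monte Carlo estimate it replaces, and the phonotactics formula (twice the single-consonant collision entropy minus the consonant-pair collision entropy) can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the code used for the paper: the toy word list is made up, any letter other than a, e, i, o, u counts as a consonant (so Croatian digraphs such as lj, nj and dž are not treated as single consonants), and a "consonant pair" means two consecutive consonants in a word once its vowels are removed. The Monte Carlo version works because the probability that two independent random draws are equal is exactly the sum of the squared relative frequencies.

```python
import math
import random
from collections import Counter

VOWELS = set("aeiou")  # simplification: every other letter counts as a consonant


def collision_entropy(counts: Counter) -> float:
    """Collision (Renyi order-2) entropy in bits: -log2 of the sum of squared relative frequencies."""
    total = sum(counts.values())
    return -math.log2(sum((c / total) ** 2 for c in counts.values()))


def collision_entropy_monte_carlo(symbols: list, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same quantity by drawing two symbols at random and counting how often they match."""
    rng = random.Random(seed)
    hits = sum(rng.choice(symbols) == rng.choice(symbols) for _ in range(trials))
    return -math.log2(hits / trials)


def consonant_skeleton(word: str) -> str:
    """The word with its vowels (and non-letters) removed."""
    return "".join(ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS)


def phonotactic_entropy(words: list) -> float:
    """2 * H2(single consonants) - H2(consecutive consonant pairs), as described in the text."""
    singles = Counter()
    pairs = Counter()
    for word in words:
        skeleton = consonant_skeleton(word)
        singles.update(skeleton)
        pairs.update(skeleton[i:i + 2] for i in range(len(skeleton) - 1))
    return 2 * collision_entropy(singles) - collision_entropy(pairs)


if __name__ == "__main__":
    toy_words = ["krka", "korana", "krapina", "sava", "drava", "kupa", "una"]  # hypothetical sample
    letters = [ch for w in toy_words for ch in consonant_skeleton(w)]
    print("exact H2:      ", collision_entropy(Counter(letters)))
    print("Monte Carlo H2:", collision_entropy_monte_carlo(letters))
    print("phonotactics:  ", phonotactic_entropy(toy_words))
```

On a real Aspell word list the same functions would apply unchanged; the closed form is both exact and far cheaper than sampling.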
Now, I have taken the entropy of the phonotactics to be the lower
bound of the entropy of the phonology, which is the only entropy
that matters in ancient toponyms (the entropies of the syntax and
morphology do not matter there, because the toponym was created in a
foreign language). Given that the Croatian language has 26
consonants, the upper bound of the entropy of the morphology, which
does not matter when dealing with ancient toponyms, can be
estimated as log2(26*26)-1.623-2*0.107-5.992=1.572 bits per pair
of consonants. So, to estimate the p-value of the pattern that
many names of rivers in Croatia begin with the consonants 'k' and
'r' (Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica), I
did two birthday calculations, the first setting the simulated
entropy of the phonology to 1.623 bits per consonant pair, and the
second setting it to
1.623+1.572=3.195 bits per consonant pair. In both of those
birthday calculations, I assumed that there are 100 different
river names in Croatia. The first birthday calculation gave me
a probability of 1/300 for that k-r pattern occurring by chance,
and the second gave me 1/17. So the p-value of
that k-r pattern is somewhere between 1/300 and 1/17. Mainstream
linguistics considers that k-r pattern in Croatian river names to
be a coincidence, but nobody before me (as far as I know) has even
attempted to calculate how much of a coincidence it would have to
be (the p-value). So I concluded that the simplest explanation is
that the river names Karašica, Krka, Korana, Krbavica, Krapina and
Kravarščica are related and all come from the Indo-European root
*kjers, meaning "horse" (in Germanic languages) or "to run" (in Celtic
and Italic languages). I think the Illyrian word for "flow" came
from that root, and that the Illyrian word for "flow" was *karr or
*kurr, the vowel difference 'a' to 'u' perhaps being dialectal
variation (compare the attested Illyrian toponyms Mursa and
Marsonia, which almost certainly come from the
same root but show the same 'u' to 'a' vowel difference). Furthermore, based on the historical phonology of the
Croatian language and what's known about the Illyrian language
(for example, that there was a suffix -issia, as in Certissia, the
ancient name for Đakovo, but not a suffix -ussia), I
reconstructed the Illyrian name for Karašica as either
*Kurrurrissia (borrowed into Proto-Slavic as *Kъrъrьsьja, which
would give *Karrasja after Havlík's Law, and then *Karaša
after iotation and the loss of geminates, to which the
Croatian suffix -ica was added) or *Kurrirrissia (borrowed into
Proto-Slavic as *Kъrьrьsьja, which would also give *Karaša by
regular sound changes), and the Illyrian name for Krapina as
either *Karpona (borrowed into Proto-Slavic as *Korpyna, which
would give "Krapina" after the merger of *y and *i and the
metathesis of liquids) or *Kurrippuppona (borrowed into
Proto-Slavic as *Kъrьpъpyna, which would also give "Krapina" by
regular sound changes), with a preference for *Karpona. Do those
arguments sound compelling to you?
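The chain of numbers in the summary above can be re-checked mechanically. This sketch only plugs the quoted constants (26 consonants, 5.992 bits for consonant pairs, log2(14) for a single consonant, and the Croatian syntax figure log2(14)-log2(13)) into the formulas exactly as stated in the text. The variable names are hypothetical, and it does not attempt to reproduce the birthday calculations themselves, whose exact model is not spelled out here.

```python
import math

N_CONSONANTS = 26                         # consonants in Croatian, as stated in the text
H_PAIRS = 5.992                           # measured collision entropy of consonant pairs (bits)
H_SINGLE = math.log2(14)                  # collision entropy of a single consonant (bits)
H_SYNTAX = math.log2(14) - math.log2(13)  # Croatian syntax entropy (bits per consonant)

# Lower bound on the entropy of phonology: the phonotactic entropy.
h_phonotactics = 2 * H_SINGLE - H_PAIRS

# Upper bound on the entropy of morphology, per the formula in the text.
h_morphology_max = math.log2(N_CONSONANTS ** 2) - h_phonotactics - 2 * H_SYNTAX - H_PAIRS

# The two simulated phonology entropies used in the two birthday calculations.
low, high = h_phonotactics, h_phonotactics + h_morphology_max

print(f"phonotactics:     {h_phonotactics:.3f} bits per consonant pair")
print(f"morphology (max): {h_morphology_max:.3f} bits per consonant pair")
print(f"phonology range:  {low:.3f} to {high:.3f} bits per consonant pair")
```

It prints 1.623, 1.572 and a range of 1.623 to 3.195 bits per consonant pair, the same values as in the summary.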
From my experience discussing my theory on various Internet
forums, I'd say there are two arguments against it that are not
completely insane (most of the people advocating mainstream linguistics on
Internet forums seem to have blind faith that information theory
has nothing to say about toponyms, a view that is, in my
opinion, not only not right, but not even wrong):
-
Your experiment is flawed because it doesn't take into account
the possibility that nouns in the Croatian language have a
significantly lower collision entropy than the rest of the words
in the Aspell word-list. River names are nouns.
-
My response: If the Croatian language had a
Swahili-like grammar, that might be a valid objection. In
Swahili, nouns can only start with one of 18 prefixes
(called noun classes), while verbs can start with
whatever the phonotactics allows. But Croatian grammar is
not remotely like that. What magic would make nouns have a
significantly lower collision entropy than verbs in the
Croatian language? I think your objection is an ad-hoc
hypothesis.
-
Proto-Slavic phonotactics didn't allow four consecutive syllables with yers,
as in your supposed Proto-Slavic form of
the river name "Karašica", *Kъrъrьsьja.
-
My response: Do you have any source for
that claim? Also, mainstream onomastics seems to be fine
with proposing etymologies which involve four consecutive
yers in Proto-Slavic forms. For instance, mainstream
linguistics claims that the town name Cavtat comes
from Latin (in) civitate, so the Proto-Slavic form
must have been *Kьvьtъtь.
-
Since the 'a' in civitate was long in Classical
Latin, it would regularly be borrowed as Proto-Slavic
*a, rather than as a back yer. The Proto-Slavic form was
*Kьvьtatь.
-
My response: That's not at all how
Vulgar Latin phonology worked. One of the basic
facts about Vulgar Latin phonology is that Vulgar
Latin didn't distinguish between short and
long 'a'. See
this NativLang video
for more information about Vulgar Latin phonology.
- (The conversation devolves completely...)
Overall, I am relatively confident that my theory is correct,
among other things because plenty of people I know in real life
who know something about information theory have read my paper and
say that my arguments sound compelling to them. Now, admittedly,
regarding my idea that Karašica was called *Kurrurrissia in
antiquity, the problem is that the early historical phonology of
Croatian tends not to be well-known, even among
linguistically-educated people, and it's the early historical
phonology of Croatian that's necessary to evaluate my linguistic
claims. In my experience, even linguistically educated people tend
not to know that in early Proto-Slavic (which Croatians were
speaking in the 7th century and into which the toponyms were borrowed),
before the yers became schwa-like sounds, the front yer is
reconstructed to have been pronounced as a short 'i' and the back yer as
a short 'u'. So you can perhaps dismiss the fact that
linguistically-educated people I know in real life say my
arguments seem compelling to them as not meaning much. You might
even say that the educated guesses about how Croatian was
pronounced between the 7th and the 11th century (when it was not
attested) are too speculative to be considered science. But I
think that we can be rather confident that basic information
theory does indeed strongly suggest (it doesn't prove, since you
can make ad-hoc hypotheses such as asserting that nouns somehow
magically have a significantly lower collision entropy than other
words in the Aspell word-list) that the probability of that k-r
pattern occurring by chance is somewhere between 1/300 and 1/17.
And, related to that, I advocate the theory that Illyrian belonged
to the centum branch of the Indo-European languages. Most Croatian
linguists, ever since the 1930s, think it belonged to the satem
branch of Indo-European languages. Here is an English translation of
what I posted (in Latin) on r/latin (the Reddit subreddit about the Latin
language) about that, just to give a general idea:
Was Illyrian a "centum" or a "satem" language? Are the
Albanians native to the Balkans?
What do the people in this forum think: was Illyrian a "centum"
or a "satem" language? All Indo-European languages are
divided into two groups: "centum" and "satem". In "centum" languages,
the Indo-European phoneme 'kj' turns into 'k'. Latin is a
"centum" language, and so are Greek and English. In
English, 'kj' actually turns into 'h', but, at one time,
before Grimm's Law, 'kj' turned into 'k' in English, and
therefore English is a "centum" language. In "satem" languages,
'kj' turns into 's'. Examples of "satem" languages are
Croatian, Albanian and Sanskrit. James Patrick
Mallory wrote in the Encyclopedia of Indo-European Culture that he
thinks that whether Illyrian was "centum" or "satem" cannot be
known from the data we have. Most linguists in Croatia, and elsewhere
in the Balkans, think that Illyrian was a "satem" language and
also the ancestor of Albanian. But I think that
Illyrian was a "centum" language. The day before yesterday, I published
a YouTube video in Croatian about it.
https://youtu.be/4QQ2iJZnyUk
In that video, I give five arguments for the idea that
Illyrian was a "centum" language. Those arguments are:
-
The 'k'-'r' regularity in the names of rivers in Croatia. In
many names of rivers in Croatia, the first consonant is 'k'
and the second consonant is 'r': Krka, Korana, Krapina,
Krbavica, Kravarščica, and two rivers named
Karašica. Most linguists think this regularity is
coincidental, but I think that information theory
(the Birthday Paradox and Collision Entropy) teaches us
that the probability of this regularity appearing by coincidence
is between 1/300 and 1/17. You can find the calculation in my text
"Etimologija Karašica", which I published in the almanac
Valpovački Godišnjak in 2022. I think that the name
"Karašica" comes from the Illyrian name Kurr-urr-issia, and
that "kurr" meant "to flow" (probably from
Indo-European *kjers, which meant "to run"), "urr"
meant "water" (from Indo-European *weh1r), and "-issia" was
a suffix in Illyrian, which also appears in the ancient name
of Đakovo, "Certissia". In my view, the name "Kurrurrissia"
went from Illyrian into Proto-Slavic
*Kъrъrьsьja, which gave
"Karrasj-">"Karaš-ica" (-ica is a Croatian suffix)
in modern Croatian. I also think that Krapina came
from the Illyrian name Kar-p-ona, "kar" from *kjers, "p" from *h2ep
(water), and "ona" being a suffix found in many Illyrian place
names, among others, "Salona" and "Albona". In my view, the name
"Karpona" went from Illyrian into Proto-Slavic *Korpyna, which
gave "Krapina" in modern Croatian. And so on...
-
If Illyrian was a "centum" language, "Curicum", the ancient
name of Krk, can be read as "caurus, the north wind", from
Indo-European *(s)kjeh1weros (the source of the Latin word "caurus"),
and Krk is the northernmost island in our sea.
-
If Illyrian was a "centum" language, "Incerum", the ancient
name of Požega, can be read as "heart of the valley", from the
Indo-European roots *h1eyn (valley) and *kjer(d) (heart).
-
If Illyrian was a "centum" language, "Cibelae", the ancient
name of Vinkovci, can be read as "strong house" or "fort",
from the Indo-European roots *kjey (house) and *bel (strong).
-
Many inscriptions in Illyrian begin with "klauhi
zis", and that probably meant "May God hear...".
"Klauhi" therefore probably comes from *kjlew (to hear), so *kj
turns into *k in Illyrian.
Do those arguments sound compelling to you?
In case you want to watch the YouTube video I made, but your
browser cannot stream MP4 videos, try downloading
this MP4 file
and opening it in VLC Media Player or something similar.
To be clear, I don't think that most of the
etymologies I suggested in the r/latin post above are right. I just
think that together (not in isolation) they make
a strong case that Illyrian was a centum language.
Here is what I think mainstream onomastics (the part of
linguistics that deals with names) gets wrong. One of the basic
principles of mainstream onomastics is that the etymologies from
languages we know well (Croatian, Latin, Celtic...) are preferred
over the etymologies from languages we know little about
(Illyrian...). Well, I think that principle is wrong for two
reasons. The first reason is: What is the mathematical foundation
which suggests that the etymologies from languages we know well
are more probable than etymologies from languages we know little
about? It seems to me there is no mathematical basis for that
assumption. The second reason is that the principle seems
incompatible with information theory. The mainstream methodology
gave the result that this k-r pattern in the Croatian river names
is a coincidence, but basic information theory strongly suggests
that the pattern is statistically significant (that its p-value is
somewhere between 1/300 and 1/17). The right thing to do is to
throw away that methodology, search for a better one, and
revise everything that the old methodology produced using the new
methodology. (For similar reasons, I think that economic "schools of thought" that are difficult to reconcile with basic game
theory should be rejected.) Maybe my methodology isn't good;
after all, it's pretty ad hoc. But at least it's not blatantly
incompatible with information theory, like the mainstream
methodology is. There are a few other problems with mainstream
onomastics, but I think the adherence to that questionable
principle is by far the biggest one. Let me be clear that I am not
saying that the comparative method of reconstructing
proto-languages is wrong. The comparative method is based on good
principles: systematic sound changes are indeed how languages
behave, and it's indeed mathematically improbable that two
unrelated languages would show apparent regular sound
correspondences. But what linguists are doing when studying
toponyms has almost nothing to do with the comparative method. I
am not even saying that all the etymologies that
mainstream onomastics has proposed are wrong. Some Illyrian etymologies
will appear probable no matter which methodology you are using.
For instance, the etymology according to which Serapia (the ancient name for the Bednja river)
is a compound of *ser (to flow) and *h2ep (water), with -ia being a common suffix in Illyrian
(Marsonia, Pannonia, Andautonia...), so that
Serapia literally means "flowing water". Such
extremely transparent names of places are rare, though.
I think that most of the people who study names of places know
almost nothing about information theory, so they cannot
realize that their methodology flies in the face of it.
Furthermore, I think that many people who do know something about
information theory misinterpret the Birthday Paradox as
saying that the probability of patterns such as the k-r pattern in
the Croatian river names occurring by chance is high. If the
Birthday Paradox actually said such a thing, then the mainstream
methodology of onomastics would probably be justified. But the
Birthday Paradox actually says no such thing: the effect
basically disappears once the number of people who must share the same
birthday increases to three or four. However, probably the easiest
way to realize that fact about the Birthday Paradox is to do
numerical calculations, and, unfortunately, people who study names
of places tend not to know the basics of programming. And then perhaps
many people who do realize those things rationalize that a
statistical model which is calibrated against real-world
linguistic data would suggest that the k-r pattern is not actually
statistically significant (in other words, that, even though
information theory and mainstream onomastics are deeply
incompatible in theory, perhaps they are compatible in practice).
To really evaluate that claim, one probably needs to have a rather
deep understanding of the basics of information theory (to
realize what collision entropy is and why it is relevant here) and
also know at least basic programming (to statistically analyze an
Aspell word-list and a long text in the Croatian language), and
most onomasticians understand neither of those things. It's
easy to see why somebody who is indoctrinated into mainstream
onomastics would have a rather difficult time escaping it. And
when you put in a lot of effort and show them a paper that explains
clearly how they have been fooled (how basic information theory
strongly suggests the p-value of that k-r pattern is somewhere
between 1/300 and 1/17), the proponents of mainstream onomastics
don't want to admit that. They will make blatant ad-hoc hypotheses
("What if the collision entropy of the nouns in the Croatian
language is significantly lower than the collision entropy of
all of the words in the Aspell word-list?" or "Maybe if you calculate only with river names that have no
obvious Slavic roots, you will get a significantly higher
p-value."), or make wild empirical claims without evidence to
contradict my theories ("Proto-Slavic phonotactics didn't allow four consecutive
syllables with yers." or "River names and stream names that start with k-r are about as
common in Serbia as they are in Croatia, and Illyrian wasn't
spoken in Serbia. In ancient times, in Serbia, Dacian and
Thracian were spoken, and (unlike for Illyrian which is so
poorly attested that we do not know whether it was centum or
satem) we can be sure they were satem languages. Those river
names in Serbia cannot come from *kjers, so I see no reason to assume those river names in Croatia
do."), or pretend not to be able to understand my calculations
("What is a logarithm? Why am I supposed to know that?"), or use some inarticulate arguments ("Attempting to apply information theory to toponyms is not an
advancement in methodology, it is a methodological error,
because [some unintelligible word-salad]." or "Are you implying that the words 'krava' (cow) and 'karfiol'
(cauliflower) come from that supposed Illyrian word meaning 'to
flow'?"). It's very frustrating.
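The claim earlier in this section, that the Birthday Paradox basically disappears once three or four people have to share the same birthday, is easy to check numerically. The sketch below computes the exact probability that, when n items are assigned independently and uniformly to k categories, at least one category receives m or more items. It uses a truncated exponential generating function, a standard combinatorial technique that is not necessarily how the calculations in my paper were done; the function name is hypothetical, and 365 equally likely birthdays is the usual textbook simplification.

```python
import math


def prob_some_category_reaches(n_items: int, n_categories: int, threshold: int) -> float:
    """Exact probability that, when n_items are assigned independently and uniformly
    at random to n_categories, at least one category receives >= threshold items.

    Uses P(all counts < threshold) = n! / K^n * [x^n] (sum_{k < threshold} x^k / k!)^K,
    with x rescaled by 1/K so the coefficients stay within floating-point range.
    """
    # One category's factor, truncated below the threshold: sum_k (x/K)^k / k!
    factor = [(1.0 / n_categories) ** k / math.factorial(k) for k in range(threshold)]
    poly = [1.0] + [0.0] * n_items  # start from the constant polynomial 1
    for _ in range(n_categories):
        product = [0.0] * (n_items + 1)
        for i, a in enumerate(poly):
            if a == 0.0:
                continue
            for k, b in enumerate(factor):
                if i + k > n_items:
                    break  # higher degrees are never needed
                product[i + k] += a * b
        poly = product
    p_all_below = poly[n_items] * math.factorial(n_items)
    return 1.0 - p_all_below


if __name__ == "__main__":
    # 23 people, 365 equally likely birthdays.
    for m in (2, 3, 4):
        p = prob_some_category_reaches(23, 365, m)
        print(f"at least {m} of 23 people share a birthday: {p:.6f}")
```

For 23 people the first line reproduces the familiar value of about 0.507, while the probabilities for three and for four people sharing a birthday drop to roughly 1% and to well under 0.1%. Because `math.factorial(n_items)` must fit in a float, the sketch is only meant for inputs up to about 170 items.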
It makes me wonder: why should people who advocate mainstream
onomastics be treated any better than the typical Moon Landing
conspiracy theorists? When I was a Moon Landing conspiracy
theorist in 2016, somebody said about me: "Had he actually done the math and found that, for example, a
rocket launched from the Earth for some reason cannot reach the
Second Cosmic Speed, that would be different. But what he is
doing now is incredibly insulting.". Yeah, if people advocating mainstream onomastics actually
made a statistical model of a language which suggests that the k-r
pattern is not actually statistically significant (for example, by
actually compiling a long list of nouns in the Croatian language
and measuring the collision entropy of the nouns, showing that
they really have lower collision entropy than the rest of the
words in the Aspell word-list), or perhaps if they presented a
mathematical explanation of why etymologies from languages we know
well are more probable than etymologies from languages we know
little about (so that we can perhaps determine the real p-value,
the p-value adjusted for how much we know about some ancient
languages, using Bayes' theorem), that would be different.
But what they are doing now can rightfully be considered
insulting.
Another reason, which I think demonstrates how absurd mainstream
onomastics is, is the fact that the proponents of mainstream
onomastics aren't nearly as skeptical towards the patterns in the
toponyms which support their narrative. They, for example, will be
happy to point you to the fact that d-n in river names repeats in
places where Scythian was spoken in antiquity (Danube, Don,
Dniester, Dnieper...), as evidence that *danu was the Scythian
word for "to flow". Does that supposed pattern
hold up to scrutiny? If I had to guess, I would say no, that the
d-n pattern is probably not statistically significant. At best,
you can put it in the same category as the k-r pattern in the
Croatian river names. So why aren't the proponents of mainstream
linguistics just as skeptical of that d-n pattern as they are of
the k-r pattern?
Previously, I was advocating the theory that Proto-Indo-European
and Proto-Austronesian are related (in particular, that
Proto-Indo-European *s regularly corresponds to Proto-Austronesian
*q, that Proto-Indo-European *r regularly corresponds to
Proto-Austronesian *l, and that Proto-Indo-European *d regularly
corresponds to Proto-Austronesian *d), but now I think that
advocating such theories is a hopeless waste of time. I think my
time is much better spent attempting to determine some things
about Illyrian than attempting to determine deep-time language
relationships. Attempts to determine deep-time language
relationships have been made ad nauseam, but attempting to
apply information theory to the Croatian toponyms seems like
low-hanging fruit. When you are on an Internet forum, don't use
arguments that have been tried ad nauseam, because that
will be read as "I want to be the person who discovered something new, but I
don't want to put a lot of effort into that.".
I calculated that the probability of such a pattern as
Proto-Indo-European *s corresponding to Proto-Austronesian *q
occurring by chance is around 6%, but I am quite sure that
calculation makes highly unrealistic linguistic assumptions
(for instance, that the collision entropy of a single consonant in
both proto-languages is around log2(20)), and that a better calculation would give a much higher
p-value. In Proto-Indo-European, for many words (perhaps even most
words beginning with s-), we aren't sure whether they actually
began with an s- or whether that s- is actually a later
contamination in many daughter languages (that's called s-mobile),
which would drive the p-value even higher, more so than for modern
or well-attested languages.
One thing I think everybody who has done serious work
attempting to mathematically model a language will agree on is
that making unrealistic linguistic assumptions can easily skew
your calculations by several orders of magnitude. You need to
calibrate your computer models against real-world data. If you
assume that the collision entropy of the consonant pairs in the
Croatian language is around log2(20*20), the p-value of the k-r pattern in the Croatian river
names appears to be around 1/10'000. But if you try to actually
measure the collision entropy of different parts of the grammar of
the Croatian language, you will find that the p-value is somewhere
between 1/300 and 1/17. That's a difference of roughly one and a half to almost
three orders of magnitude.
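To make the "log2(20*20)" figure concrete, here is a rough approximation of that naive calculation. As a hypothetical reading of the k-r pattern, it asks how likely it is that at least 6 of 100 river names share the same initial consonant pair when there are 400 equally likely pairs. It approximates each pair's count with a Poisson distribution and then applies a union bound over all pairs, which is only trustworthy when the result is small. The function names are illustrative, and this is not the model used in my paper.

```python
import math


def poisson_tail(lam: float, m: int, terms: int = 60) -> float:
    """P(X >= m) for X ~ Poisson(lam), summed term by term to avoid cancellation."""
    term = math.exp(-lam) * lam ** m / math.factorial(m)
    total = 0.0
    for k in range(m, m + terms):
        total += term
        term *= lam / (k + 1)  # turns P(X = k) into P(X = k + 1)
    return total


def approx_p_value(n_names: int, n_pairs: int, threshold: int) -> float:
    """Union-bound / Poisson approximation: n_pairs * P(a given pair occurs >= threshold times)."""
    lam = n_names / n_pairs
    return min(1.0, n_pairs * poisson_tail(lam, threshold))


if __name__ == "__main__":
    p = approx_p_value(n_names=100, n_pairs=20 * 20, threshold=6)
    print(f"p ≈ {p:.2e} (about 1 in {1 / p:,.0f})")
```

Under these assumptions it gives about 1.1e-4, on the order of the 1/10'000 quoted above. Lowering `n_pairs`, which is what a lower measured collision entropy amounts to, raises the approximate p-value very quickly.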
Similarly, it's important to understand the basics of information
theory, so that you don't use Shannon entropy where the collision
entropy is relevant, or vice versa. The Shannon entropy of the
consonant pairs in the Aspell word-list for the Croatian language
is 7.839 bits per consonant pair, but the collision entropy is
5.992 bits per consonant pair. That might seem like a relatively
small difference, but in reality, it's a huge one, because
entropy is on a logarithmic scale: 7.839 bits is about log2(229) and 5.992 bits is about log2(64), that is, roughly 229 versus 64 equally likely consonant pairs. Plugged into the birthday calculations, that's a
difference in p-value of more than an order of magnitude.
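The gap between the two entropies shows up on any skewed distribution. The sketch below computes both quantities for a made-up Zipf-like distribution over 400 symbols (not the Croatian data), and converts the two figures quoted above into effective numbers of equally likely consonant pairs. Collision entropy never exceeds Shannon entropy, and the two coincide only for a uniform distribution, which is why using the wrong one in a birthday calculation systematically understates how often collisions happen.

```python
import math
from collections import Counter


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy in bits: -sum p * log2(p)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def collision_entropy(counts: Counter) -> float:
    """Collision entropy in bits: -log2(sum p^2)."""
    total = sum(counts.values())
    return -math.log2(sum((c / total) ** 2 for c in counts.values()))


if __name__ == "__main__":
    # Hypothetical Zipf-like frequencies over 400 symbols; NOT measured Croatian data.
    zipf = Counter({rank: 1000 // rank for rank in range(1, 401)})
    print(f"Shannon:   {shannon_entropy(zipf):.3f} bits")
    print(f"collision: {collision_entropy(zipf):.3f} bits")

    # Effective numbers of equally likely consonant pairs for the figures quoted in the text.
    for bits in (7.839, 5.992):
        print(f"{bits} bits ≈ {2 ** bits:.1f} equally likely pairs")
```

The last two lines print about 229.0 and 63.6 equally likely pairs.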
Libertarianism
I am also interested in libertarian politics. Government-backed
pseudopsychology suggesting that my father raped some little girl
and that I was a witness put my mother in jail and almost put my father
there too. I think that government power should be limited to
solving simple problems such as the problem of superbacteria
caused by the egg industry (around 70% of antibiotics these days
are used in the egg industry, so it's obvious that what the
government should do about superbacteria is to regulate the hell
out of the egg industry) or the problem of ISPs incorrectly
setting up their DNS servers so that they can be used to massively
amplify denial-of-service attacks. I think it's not the
government's job to attempt to address complicated problems such
as violent crime, global warming, or a pandemic that's currently
going on. Maybe one day the social sciences will advance to such a
degree that they can be trusted to tell the government what it
should do. But since that day is not today, the government power
should be as limited as possible.
The unfortunate thing about social sciences that I think most
people don't understand is this: In natural sciences, in order to
discredit an experiment, you usually need to find a flaw in the
experimental setup, whereas, in social sciences, you can usually
discredit an experiment by saying "Well, perhaps a better mathematical model would suggest the
findings are not actually statistically significant.". To understand what I mean, consider the k-r pattern in
the Croatian river names. You can point it out to the proponents
of mainstream linguistics, and they will say: "You've fallen victim to the Birthday Paradox. The probability
of such a pattern occurring by chance isn't low at all.". Then you can point out to them that a simple
birthday calculation, which assumes that the Croatian language has
20*20=400 equally likely consonant pairs, suggests that the
probability of such a pattern occurring by chance is around
1/10'000. Then the proponents of mainstream linguistics will say:
"You need to take into account the fact that some consonants are
more common than others.". Then you can point out to them that the collision entropy of
a consonant in the Croatian Aspell word-list is around log2(14), and that a birthday calculation that takes that fact into
account suggests that the p-value of that k-r pattern occurring by
chance is around 1/500. Then the proponents of mainstream
linguistics respond by saying: "Well, some pairs of consonants are more common than others, due
to the phonotactic constraints. You need to take that into
account.". Then you can study information theory and attempt to
measure the collision entropy of different parts of the grammar of
the Croatian language (including phonotactics), and show them
complicated calculations that suggest the probability of that k-r
pattern occurring by chance is somewhere between 1/300 and 1/17.
Then they will complain that those calculations are probably
incorrect because they are relatively complicated and have not
been peer-reviewed. Then you publish those calculations in two
peer-reviewed journals and ask a few experts in information
theory to check your math, which is enough to make some (but not
nearly all) proponents of mainstream linguistics shut up with the
comments about how your calculations are probably wrong. Then the
proponents of mainstream linguistics will say stuff like: "What if different word classes in Croatian have a significantly
different collision entropy? What if nouns have a significantly
lower collision entropy than the rest of the words in the Aspell
word-list?". I think you get the point by now: you can usually
discredit an experiment in social sciences by making an ad-hoc
hypothesis that a more appropriate mathematical model would
suggest the findings are not actually statistically significant.
And, in theory, we can probably keep going like that forever, or,
in practice, we can go on like that until the calculations of the
p-value become so complicated that nobody can review them. I see
no point in attempting to make a more complicated calculation
suggesting that this k-r pattern is indeed statistically
significant: if what I've done by now is not enough to convince
the proponents of mainstream linguistics, probably nothing would
be. Proponents of mainstream linguistics complain both that my
calculations supposedly don't reflect how real languages behave
(and I see no reason to think that nouns actually have a
significantly lower collision entropy than the rest of the words)
and that my calculations are so complicated that they are hard to
review. The absurdity here should be apparent: a calculation that
takes into account how languages really behave (if nouns really
have a lower collision entropy than the rest of the words) would
be even more difficult to review than my calculation is. The
solution proposed by the advocates of mainstream linguistics is
often to use what they call traditional methods, but the problem
with that is that those traditional methods not only don't appear
to be based on mathematically sound principles, they also appear
to go precisely against them (they give results that information
theory says are improbable, such as that the k-r pattern is a
coincidence). What is then the point of trusting social scientists
to tell us how the government should operate? If most of the
experiments in social sciences can be discredited by saying
"Perhaps a better statistical model would suggest the results
are not actually statistically significant.", I see no reason to trust what social scientists have to
say about politics.
As a former anarchist, I've made
a YouTube video about how to convincingly argue against
anarchism. If you want to watch it, but your browser cannot stream MP4
videos, try downloading
this MP4 file
and opening it in VLC or something similar. Though, in my
experience, most of the people on the Internet who use the word
anarchist to describe themselves are not anarchists or even
libertarians. Many people who call themselves anarchists hold
anti-vaxxer beliefs. And by anti-vaxxer I don't mean being
against forced vaccinations, I mean actually believing that
vaccines somehow magically cause all kinds of diseases. That's
blatantly incompatible with anarchism. What makes them think
vaccines would be tested more in an anarchy? It seems obvious to
me that, in an anarchy, we would have vaccines for a new disease
sooner, but those vaccines would be less tested. So, yeah, it's
possible that my video will not convince an anarchist you
are debating with precisely because they are not actually
anarchists.
Anarchist ideas seem appealing because they make world problems
seem as though they would solve themselves. A good example of that is the
problem of superbacteria. Anarchists, when asked what their take
on superbacteria is, respond with something like: "Well, the problem of superbacteria is largely caused by the
governments removing the incentive for scientists to discover
new classes of antibiotics via intellectual property laws. And
even so, the lab-grown meat will soon make the problem far less
severe, since around 80% of antibiotics these days are given to
farm animals. And you can do your part now by going
vegetarian.". That sounds so appealing, and people want to believe it so
much that they will not do research (such as looking up the
statistics on where exactly antibiotics used in agriculture go...)
to realize that's nonsense. There are a number of problems with
that response. Number one, there is no reason to think there even
are useful antibiotics left for scientists to discover. The fact
that we've found dozens of chemical compounds that kill
prokaryotes but not eukaryotes is already amazing; we cannot
expect that there are more. And lab-grown meat won't address the
problem of superbacteria because almost all antibiotics used in
agriculture these days are being used in the egg industry, and we
won't have lab-grown eggs any time soon. We struggle to produce
muscle meat in laboratories, and eggs are far more complicated
than muscles. But to a layman, the anarchist response will
probably sound very appealing.