reply to post by hadriana
Out of curiosity, which of the roots are you able to identify most conclusively?
I have a transcription of the Voynich Manuscript on my computer and have run a few scans on it for fun.
It has been claimed that the manuscript was written in two different "languages" or "encodings", so it's often split into two parts, A and B.
I've extracted some stats for VM-A and VM-B and compared them with the same stats computed for portions of other texts:
- Pliny's Natural History (Latin)
- Bible (Latin)
- Bible (Hebrew)
What stands out in the VM text is the excessive repetition of certain word fragments. For example, here are the five most common three-character fragments in each text. The first row lists the fragments, the second row their number of occurrences, and the third row the corresponding percentage of all three-character fragments in the text. The percentages are rounded to whole numbers, so very small shares show up as 0.0%.
VM-A
cho, che, iin, aii, heo
539, 360, 302, 294, 250
4.0%, 3.0%, 2.0%, 2.0%, 2.0%
VM-B
che, she, aii, edy, iin
898, 443, 410, 401, 389
4.0%, 2.0%, 2.0%, 2.0%, 2.0%
BBL LAT
que, ent, ere, tur, eri
314, 301, 215, 189, 184
1.0%, 1.0%, 1.0%, 1.0%, 1.0%
BBL HEB
zjf, njy, fqj, nej, mla
62, 47, 41, 36, 34
0.0%, 0.0%, 0.0%, 0.0%, 0.0%
PLINY
ent, que, tur, ant, bus
505, 454, 454, 338, 312
1.0%, 1.0%, 1.0%, 1.0%, 1.0%
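As you can see, the Voynich Manuscript contains a lot more repetition than the other texts: its most common fragment makes up about 4% of all three-letter fragments, compared with about 1% in the Latin texts and well under 1% in the Hebrew. This seems to rule out Hebrew, though Latin is certainly still a candidate, even if it is more moderate in this respect.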
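For anyone who wants to try this on their own copy, here is a minimal Python sketch of how three-letter fragment counts like these could be produced. It is not necessarily how the figures above were computed: it assumes words are separated by whitespace, that transcription markup has already been stripped, and that fragments are taken with a sliding window inside each word, never across spaces. The names `trigram_stats` and `print_rows` are made up for illustration.

```python
from collections import Counter


def trigram_stats(text, top=5):
    """Return the `top` most common three-character fragments in `text`.

    Assumptions (not from the original post): words are separated by
    whitespace, markup is already removed, and fragments never cross
    word boundaries. No case folding is done.
    """
    counts = Counter()
    for word in text.split():
        # Every window of three consecutive characters inside the word;
        # words shorter than three characters contribute nothing.
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    # (fragment, count, percentage of all three-character fragments)
    return [(frag, n, 100.0 * n / total) for frag, n in counts.most_common(top)]


def print_rows(stats):
    """Print three rows: fragments, counts, percentages (one decimal)."""
    print(", ".join(frag for frag, _, _ in stats))
    print(", ".join(str(n) for _, n, _ in stats))
    print(", ".join(f"{pct:.1f}%" for _, _, pct in stats))
```

For example, `print_rows(trigram_stats("que quem atque", top=2))` prints `que, uem`, then `3, 1`, then `50.0%, 16.7%`. Lowercase the text first if your transcription mixes cases and you want them merged.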
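Here's another analysis: for each character in the document's alphabet, how often it occurs as the first letter of a word. The rows are laid out as before (characters, counts, percentages); since each word has exactly one first letter, the percentages are shares of all words.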
VM-A
c, o, s, q, d, y, k, t, p, a, l, e, f, r, m, i, g, z, n
702, 688, 396, 357, 328, 301, 168, 153, 92, 67, 47, 39, 38, 37, 3, 2, 2, 1, 1
20.0%, 20.0%, 11.0%, 10.0%, 10.0%, 9.0%, 5.0%, 4.0%, 3.0%, 2.0%, 1.0%, 1.0%, 1.0%, 1.0%, 0.0%, 0.0%, 0.0%, 0.0%, 0.0%
VM-B
o, c, q, s, l, y, d, t, p, a, k, r, f, e, i, x, g, h, m
1018, 723, 602, 527, 342, 334, 311, 233, 220, 207, 197, 110, 51, 38, 10, 6, 4, 1, 1
21.0%, 15.0%, 12.0%, 11.0%, 7.0%, 7.0%, 6.0%, 5.0%, 4.0%, 4.0%, 4.0%, 2.0%, 1.0%, 1.0%, 0.0%, 0.0%, 0.0%, 0.0%, 0.0%
BBL LAT
p, a, c, s, i, d, v, m, f, e, t, r, o, n, h, l, b, g, u, q, z
592, 584, 534, 519, 425, 356, 340, 339, 266, 261, 244, 230, 187, 166, 155, 154, 100, 92, 82, 78, 9
10.0%, 10.0%, 9.0%, 9.0%, 7.0%, 6.0%, 6.0%, 6.0%, 5.0%, 5.0%, 4.0%, 4.0%, 3.0%, 3.0%, 3.0%, 3.0%, 2.0%, 2.0%, 1.0%, 1.0%, 0.0%
BBL HEB
n, e, f, j, $, y, k, p, m, d, b, a, z, s, h, t, x, v, i, g, r, c
719, 707, 692, 488, 431, 299, 274, 272, 219, 140, 127, 115, 97, 93, 90, 61, 57, 43, 34, 19, 17, 11
14.0%, 14.0%, 14.0%, 10.0%, 9.0%, 6.0%, 5.0%, 5.0%, 4.0%, 3.0%, 3.0%, 2.0%, 2.0%, 2.0%, 2.0%, 1.0%, 1.0%, 1.0%, 1.0%, 0.0%, 0.0%, 0.0%
PLINY
c, a, p, s, i, m, d, e, t, l, v, f, r, n, h, o, g, u, b, q, x, z, w
1158, 975, 960, 880, 633, 577, 532, 462, 450, 391, 389, 386, 362, 276, 252, 245, 212, 161, 146, 122, 17, 9, 2
12.0%, 10.0%, 10.0%, 9.0%, 7.0%, 6.0%, 6.0%, 5.0%, 5.0%, 4.0%, 4.0%, 4.0%, 4.0%, 3.0%, 3.0%, 3.0%, 2.0%, 2.0%, 2.0%, 1.0%, 0.0%, 0.0%, 0.0%
Note that for Hebrew, I've replaced the Hebrew characters with English letters, since we're only looking at statistics.
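A first-letter count like the one in these tables can be sketched in a few lines of Python. This is an illustrative version, not the code behind the figures above: it assumes words are whitespace-separated tokens in an already cleaned transcription, and `first_letter_stats` is a hypothetical name.

```python
from collections import Counter


def first_letter_stats(text):
    """Count how often each character starts a word.

    Assumption (not from the original post): words are whitespace-separated
    tokens. Returns (char, count, percentage of all words), most common first.
    """
    # str.split() never yields empty tokens, so word[0] is always valid.
    initials = Counter(word[0] for word in text.split())
    total = sum(initials.values())
    if total == 0:
        return []
    # Each word contributes exactly one initial, so `total` is the word count.
    return [(ch, n, 100.0 * n / total) for ch, n in initials.most_common()]
```

To get whole-number percentages like the ones in the tables, apply `round(pct)` to the third field of each tuple.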
The possible use of anagrams could account for certain patterns, such as the same letter appearing three or more times in a row, which happens fairly often. However, if this is the case, there must have been some deterministic process by which each resulting word was generated; otherwise even the author would have had a hard time decoding it.
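To check how often such letter runs occur, here is a small Python sketch that counts runs of three or more identical characters within words. It also includes `sorted_anagram`, a purely hypothetical example of a deterministic anagram rule (sorting a word's letters alphabetically). Nothing here suggests this is the rule actually used; it only shows how a fixed rearrangement can create runs that the original words lacked.

```python
import re
from collections import Counter


def count_runs(text, min_len=3):
    """Count runs of `min_len` (>= 2) or more identical characters within words.

    Assumption: words are whitespace-separated tokens.
    """
    # (.) captures one character; \1{k,} requires k or more further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    runs = Counter()
    for word in text.split():
        for m in pattern.finditer(word):
            runs[m.group(0)] += 1
    return runs


def sorted_anagram(word):
    """Hypothetical deterministic anagram rule: sort the word's letters."""
    return "".join(sorted(word))
```

For instance, `sorted_anagram("minimi")` gives `"iiimmn"`, which contains a run of three `i`s. Sorting is deterministic but not reversible on its own: `"roma"` and `"amor"` both map to `"amor"`, so a real scheme would presumably need more than sorting to be decodable.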