Originally posted by Razimus
People make fun of the fact that I wasted my time, yet at the same time ask me to waste more of my time.
You shouldn't be taking this personally. It's feedback.
You seem very interested in getting a large audience but not in giving people what they want to warrant having that audience. (Or acknowledging that
you're not the only one with an IQ)
ATS has a large number of varied persons who are resources for someone like you when you aren't dismissing them like school children. You asked a
dozen people about the word 'infrastructure' ... what about the thousands of people on ATS? You want an expert, why didn't you ask ATS before
starting??? We're awesome!
Originally posted by Razimus
I guess I have somewhat of a calculator in my head, that tells me it's mathematically very improbable for these 2 sets of text to be owned by anyone
other than the same individual. And I guess the short attention span-prone don't have this 'calculator' in their head.
I have severe ADHD, I haven't lodged a casio calculator in my skull, but my gangnam style is pretty great.
The following is an example report only. Report may have sharp edges. Report is not suitable to eat. Please do not lick the report.
Why You Are Wrong: A report by Pinke
I am using three documents:
My datasets:
doc1.txt John Titor's text dump (Note: may contain Google Drive tags, but only a small amount)
doc2.txt Haber's text dump (Note: may contain Google Drive tags, but only a small amount)
doc2b.txt Haber's text dump with characters removed to reduce size (Should have deleted more)
example.txt KrebsOnSecurity blog, random selection of 113821 characters. (Should have got more)
My datasets were badly chosen and all the wrong sizes, and I lazily attempted to make them a similar size but couldn't be bothered; even then it
shouldn't affect my findings.
My method:
I ran queries looking for n-grams of various sizes (2 to 6 word phrases) and compared the results, including the uses of various terms within all
docs. I also used the word list provided by Hoaxhunter.
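For anyone who wants to try this kind of query themselves, here is a minimal sketch in Python. It is not the tool used for this report; the function names and the two sample sentences are made up for illustration, and the tokenizer is a deliberately crude assumption (lowercase, letters and apostrophes only).

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    # Count every run of n consecutive words.
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def shared_ngrams(text_a, text_b, n):
    # Phrases of length n that occur at least once in both texts,
    # mapped to (count in text_a, count in text_b).
    a = ngrams(tokenize(text_a), n)
    b = ngrams(tokenize(text_b), n)
    return {phrase: (a[phrase], b[phrase]) for phrase in a.keys() & b.keys()}

# Hypothetical sample sentences, not taken from any of the datasets.
sample_a = "There is a great deal of work to do based on the same plan."
sample_b = "We saw a great deal of change, but the outcome was the same."

for n in range(2, 7):
    print(n, sorted(shared_ngrams(sample_a, sample_b, n)))
```

To compare real files, read each one into a string and pass the strings in place of the samples. Keeping the per-document counts alongside each shared phrase makes it easy to see when a "match" is one stray occurrence on either side.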
Things I did:
This is a concordance plot of the n-gram 'great deal':
It's a favorite term of John Titor's. It is completely absent from even the larger datasets. It is one of the few n-grams that can be used in enough
contexts that I would expect to see it in more than one of the datasets if Titor was the author.
This is a concordance plot of the n-gram 'snapshot':
Despite the example dataset being smaller, it has an occurrence of this term. This is one of the alleged 'uncommon' words.
Concordance plot of the n-gram 'infrastructure':
The term doesn't appear to be used any more or less in comparison to the other datasets, except the Morey Haber writings seem to be very much based
around this topic in some respects.
Concordance plot of the n-gram 'inherent':
While this is a key part of the example text, it is not a key part of either of the other texts, but it does occur. It is not uncommon.
In fact, the vast majority of alleged 'uncommon' terms only show up in one instance between documents. The alleged six word terms do not even show up
in the smaller datasets.
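A concordance plot like the ones above just marks where in a document each hit for a term falls. As a rough text-only sketch of the idea in Python (again, not the tool used for this report; term_positions, plot_line and the demo string are hypothetical names and data):

```python
import re

def term_positions(text, term):
    # Relative position of each case-insensitive, whole-word hit:
    # 0.0 is the start of the document, values near 1.0 are near the end.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]

def plot_line(text, term, width=60):
    # Render the hits as '|' marks on a fixed-width bar of '-' characters,
    # roughly like one row of a concordance plot.
    bar = ["-"] * width
    for pos in term_positions(text, term):
        bar[min(int(pos * width), width - 1)] = "|"
    return "".join(bar)

# Hypothetical demo text: one hit at the start, one at the end.
demo = "great deal " + "filler " * 40 + "great deal"
print(plot_line(demo, "great deal"))
```

Running plot_line over each dataset for the same term gives rows that can be compared side by side. A row with no marks at all is the 'great deal' situation described above.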
Overall Concordances
Sharing the number of exact phrases as seen in this comparative study is astounding
-- Hoaxhunter
The matches between the words and phrases of John Titor and Morey Haber is undeniable
-- Hoaxhunter
2 six word phrases out of over 23k unique possibilities (23966 approx)
5 five word phrases out of over 24k unique possibilities (24523 approx)
61 four word phrases out of over 24k unique possibilities (24752 approx)
Over 400 three word phrases out of over 23k unique possibilities (23479 approx)
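To put those counts in proportion, a small sketch in Python that turns each line into a percentage of the unique possibilities. The numbers are the ones listed above; "over 400" is entered as 400, so that row is a lower bound, and shared_share is just an illustrative name.

```python
# (shared phrases, unique possibilities) per phrase length, as listed above.
# "Over 400" is entered as 400, so the 3-word row is a lower bound.
counts = {
    6: (2, 23966),
    5: (5, 24523),
    4: (61, 24752),
    3: (400, 23479),
}

def shared_share(shared, unique):
    # Shared phrases as a percentage of the unique possibilities.
    return 100.0 * shared / unique

for n, (shared, unique) in counts.items():
    print(f"{n} word phrases: {shared}/{unique} = {shared_share(shared, unique):.4f}%")
```

With 400 entered for the three word row, every row comes out below 2%.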
Some of the examples given by Hoaxhunter also overlap.
I can't agree with the findings at this stage.
Conclusion:
Originally posted by sheepslayer247
Have you been able to find some specialists in the field to do a peer review on the material? That would go a long way in proving/disproving your
findings.
There really is nothing to peer review.
The majority of terms of deep interest only appear a tiny number of times, or are deeply contextual. Haber uses 'for more information' 29 times to
sell product, Titor uses it twice, but it is stated as 'relevant'. Unique terms which should show up, such as 'a great deal', are not addressed
by Hoaxhunter, unless I missed it?
The terms that do show up regularly, such as 'based on' and 'the same', could easily be contextual. For example, if I ask you about the future, you may
find yourself saying "well, this bit is 'the same'" a lot.
This doesn't prove that the writer isn't Morey Haber, but the content provided by Hoaxhunter certainly doesn't prove that it is.
Probably best to ask an expert how to do this before starting next time (plagiarism software isn't the way to go). Then maybe an expert comes instead
of a Pinke.
edit on 27-4-2013 by Pinke because: typos and such ; bit at end