Hero FBI plows over 650,000 emails in eight days, Investigation Complete


posted on Nov, 7 2016 @ 05:29 PM
a reply to: mclarenmp4

Sounds like you have no idea what you are talking about. This is not IT forensics, you are not searching for specific data in the swarm of information in the same sense you search for a specific file in your OS. Here you are looking, or at least should be looking, at all the data, while staying mindful of the substitute keywords people use for secret communication that actually violates the law but stays hidden in plain sight from keyword alert queries.

In other words, while you may or may not find the data you seek, you just missed 99% of all the other data. Kapish? Done your way, success depends entirely on knowing what to look for. So if you don`t know what you are looking for, it`s a dead end on your part.

EDIT to add: No wonder they haven`t found anything; their approach was doomed from the very beginning if they used the same concept you proposed.
edit on 7-11-2016 by Op3nM1nd3d because: (no reason given)



posted on Nov, 7 2016 @ 05:52 PM
a reply to: Op3nM1nd3d



So if you don`t know what you are looking for, it`s a dead end on your part.


They knew what they were looking for. Classified information which had been sent to or from the Clinton server.

That's the case under investigation.



posted on Nov, 7 2016 @ 05:59 PM
a reply to: Op3nM1nd3d

You're missing the point; maybe I didn't explain it clearly.
The information they were looking for was only in relation to mishandling of classified documents and whether there was intent to break the law.
They aren't looking for spooky coded information linked to child trafficking rings, they are looking for specific information relating to the misuse of classified documents.
Once they retrieve ALL the data they require, they then hand those emails over to the investigators who then see if there was any intent to mishandle classified information. Out of 650,000 emails how many were from HC and how many of those may have been classified?

The methods I detailed would allow the FBI to get all the information, deleted or not, within a few hours. You then have all the emails that are related to classified information, which would be handed over to investigators who spend the next week going through what is probably a small number of emails, and hey presto, we are where we are.
Kapish?



posted on Nov, 7 2016 @ 07:43 PM

originally posted by: Phage
a reply to: Op3nM1nd3d



So if you don`t know what you are looking for, it`s a dead end on your part.


They knew what they were looking for. Classified information which had been sent to or from the Clinton server.

That's the case under investigation.

You are right at the core of the question now. If documents can be scanned within one second to determine if there is any classified information on the document, then FOIA requests should all be completed at a rate of about one second per document requested. No?

It would seem to me that the speed of determining what information is classified for the purpose of an FOIA request, a process that typically takes months or years, should be comparable to that of searching an email database to determine whether there is any information in the documents pertinent to national security.

When I submit my FOIA requests I will go ahead and mention the 1-second-per-document standard set by Hillary Clinton's allies in office, and tell them to use the same software that scanned those emails for classified information at that seemingly impossible speed of review. I imagine that I will be met with a response that my expectations are laughable.



posted on Nov, 7 2016 @ 09:29 PM
if they are satisfied that there is no more classified info in any of the e-mails they should release all the e-mails for public consumption, but they won`t do that because they aren`t 100% sure that they didn`t miss any e-mails that contain classified info.
the only way they could be sure is if they had humans read every e-mail and they won`t do that either.
edit on 7-11-2016 by Tardacus because: (no reason given)



posted on Nov, 8 2016 @ 05:49 AM
a reply to: Phage



They knew what they were looking for. Classified information which had been sent to or from the Clinton server.

That's the case under investigation.


But how do they know what to look for if people spoke in codes to deliver the same message that would be regarded as classified information? You simply cannot do that with search filters.


a reply to: Tardacus


if they are satisfied that there is no more classified info in any of the e-mails they should release all the e-mails for public consumption, but they won`t do that because they aren`t 100% sure that they didn`t miss any e-mails that contain classified info. the only way they could be sure is if they had humans read every e-mail and they won`t do that either.

My thoughts exactly



a reply to: mclarenmp4

Let me rephrase my statement.

'So if you don`t know what you are looking for, apart from the obvious keywords (which indicate misuse or violation or anything related to classified information) that everyone knows about and anyone with an IQ above 80 knows how to avoid, it`s a dead end on your part.'

If you are still not satisfied, read everything above, again...kapish?



edit on 8-11-2016 by Op3nM1nd3d because: (no reason given)



posted on Nov, 8 2016 @ 10:54 AM

originally posted by: Op3nM1nd3d
a reply to: mclarenmp4

Sounds like you have no idea what you are talking about. This is not IT forensics, you are not searching for specific data in the swarm of information in the same sense you search for a specific file in your OS.

E-Discovery (what you're talking about) and digital forensics mostly follow similar steps. Digital forensics generally requires a higher level of expertise, though E-Discovery has its own tricks and traps. Both big fields though. Both in IT. Lot of digital forensics shops use the same staff to do both.

As for the number of emails involved, the maths being thrown around leaves out a pile of information.

E-Discovery work flow is roughly:
1. Collect data
2. Ingest data into platform
3. Process and index data so it can be searched, whilst removing out of scope / irrelevant content
4. Load remaining data into review platform and distribute to review users

Processing considerations:
1. Message threading: an email conversation involving nine emails becomes one email for review. Not nine. Attachments included in email four or five will become 'child items' after data processing so it's still one email and two attachments for review.
2. Deduplication and near deduplication: often emails containing the same or incredibly similar body text will be received. This will include news items, system FYIs, office chatter, spam, and all the rest. Giant swathes of data can be removed in this way. This will also pick up message threads that have failed.
3. Batch discoveries: often batches of content with similar traits will be discovered, and most E-Discovery systems index the entire email for search purposes. This means once a pattern is discovered rules can be very quickly written to tag or remove those emails from the system. Can include IP addresses, domains, key words, dates, times, results of searches, or whatever else you can think up ...
4. Hash / common data removal: Kind of an extension of 3, but common databases of known files and content are maintained. You run this across your dataset and it will recognize content that has already been reviewed in other investigations. This can include newsletters on seminars, Bono's ONE Campaign, known malicious email, and other 'safe' content to remove.
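
To make the processing steps above a bit more concrete, here is a minimal sketch in Python of threading plus exact deduplication. It is not how any particular E-Discovery platform works. The `Email` record, its `thread_id` field, and the choice to hash only a whitespace-normalized, lowercased body are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    # Hypothetical record; real platforms carry far more metadata.
    msg_id: str
    thread_id: str
    body: str


def body_hash(email: Email) -> str:
    # Hash only the normalized body, since headers differ between copies.
    normalized = " ".join(email.body.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def dedupe(emails):
    # Keep the first email seen for each distinct body hash.
    seen = set()
    unique = []
    for email in emails:
        digest = body_hash(email)
        if digest not in seen:
            seen.add(digest)
            unique.append(email)
    return unique


def group_threads(emails):
    # Collapse emails into one review item per conversation.
    threads = defaultdict(list)
    for email in emails:
        threads[email.thread_id].append(email)
    return dict(threads)


sample = [
    Email("1", "t1", "Lunch at noon?"),
    Email("2", "t1", "Sure, see you then."),
    Email("3", "t2", "Weekly newsletter: seminar on Friday"),
    Email("4", "t3", "weekly  newsletter: seminar on friday"),
]
unique = dedupe(sample)
review_items = group_threads(unique)
print(len(sample), "emails ->", len(unique), "unique ->", len(review_items), "review items")
```

The point of the sketch is only the shape of the reduction: copies of the same newsletter collapse to one item, and a conversation is reviewed as one unit rather than message by message.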

There are other parts to the above, but it gives an idea. The human element...

In general terms, the review platform has a pane in the middle with the content to be reviewed. To the right is a multiple choice form to flag the content as relevant or not, and a comment box. If there are multiple criteria there will be more check boxes etc ... Each reviewer checks out a 'batch' of emails to review.

The interface has hotkeys. You hit left or right on your keyboard, alt-1 to alt-5 for check boxes, and press space when you want a new one. Short emails will take a second or sometimes less. Generally this work is given to interns, graduates, and whoever else has free time. Access is remote so several hundred staff members can take part at once.
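
For a sense of scale, here is the back-of-envelope arithmetic as a short Python sketch. The 650,000 emails and eight days come from the thread title. The reviewer count of 200 is a hypothetical stand-in for "several hundred", and the calculation assumes round-the-clock work with no reduction from threading or deduplication, so treat it as a rough bound and not a claim about what the FBI actually did.

```python
TOTAL_EMAILS = 650_000          # from the thread title
DAYS = 8                        # from the thread title
SECONDS_PER_DAY = 24 * 60 * 60

# Wall-clock time available per email if every email were looked at in sequence.
wall_clock_seconds = DAYS * SECONDS_PER_DAY
seconds_per_email = wall_clock_seconds / TOTAL_EMAILS

# Hypothetical: number of reviewers working at the same time.
REVIEWERS = 200
reviewer_seconds_per_email = seconds_per_email * REVIEWERS

print(f"{seconds_per_email:.2f} s of wall-clock time per email")
print(f"{reviewer_seconds_per_email:.0f} reviewer-seconds per email with {REVIEWERS} reviewers")
```

So "one second per email" is a wall-clock average across the whole pile, not the time any single reviewer spends on a single message.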

Most emails are removed at stage one review. They will only be seen again during random audits of review batches. A manager will review batches from each user to ensure they're following the rules. Stage two includes anything tagged as relevant, classified, or whatever other factors are included. Typically stage two includes legal reviews of relevant data and reviews of data flagged as 'unsure'.

I have never worked on a case in my life that has required over one second per email to be reviewed, other than that one time where we discovered a mailbox with only three emails in it and they were all bad.

Hope that helps.
edit on 8-11-2016 by Pinke because: (no reason given)



posted on Nov, 8 2016 @ 05:33 PM
a reply to: fractal5

I'm replying just to bump this up... it was posted before the power-trip counterthread.



posted on Nov, 9 2016 @ 08:12 AM
a reply to: Pinke

It does. Now if you could explain the weaknesses here for other members, that would be great.

1. The algorithm that only selects 'relevant data' to be reviewed. (For example in deduplication, where sometimes spam is not spam but is omitted anyway, or a batch discovery that recognises a pattern but then dismisses it as irrelevant content even if it`s not and is spoken outside bot capturing sensors.)

2. The human element...read below



Generally this work is given to interns, graduates, and whoever else has free time. Access is remote so several hundred staff members can take part at once.


That`s just a scratch of the surface. You also have to account for a human mistake that happens to everyone, then diligence of certain people and even people who are influenced to look away...I could go on but you get the picture.

Also one other thing.



1. Message threading: an email conversation involving nine emails becomes one email for review. Not nine. Attachments included in email four or five will become 'child items' after data processing so it's still one email and two attachments for review.

I have never worked on a case in my life that has required over one second per email to be reviewed


So you are saying that you are able to read and review an email conversation that consists of multiple emails in one second? Are you a robot? Or you simply trust the robot to do it right for you? Cause in 1 sec, there can be no human element involved. I`m just curious how you define a review?

Thanks



posted on Nov, 9 2016 @ 08:15 AM
a reply to: fractal5

It's almost as if a cynical person could assume they didn't really want or try to investigate the crimes...now the election is over, perhaps they'll redouble their efforts under a new director.



posted on Nov, 10 2016 @ 09:42 AM

originally posted by: Op3nM1nd3d
a reply to: Pinke

It does. Now if you could explain the weaknesses here for other members, that would be great.

1. The algorithm that only selects 'relevant data' to be reviewed.

There is no single algorithm.



(For example in deduplication, where sometimes spam is not spam but is omitted anyway, or a batch discovery that recognises a pattern but then dismisses it as irrelevant content even if it`s not and is spoken outside bot capturing sensors.)

This is not what deduplication does.

There is an edge case where two different files could end up with the same MD5 hash, but the chance of that happening by accident is so astronomically low that it's not worth discussing.


Deduplication takes content which is mathematically the same and removes the excess. Near deduplication takes content that is duplicate within certain criteria and removes it, much like fuzzy hashing. This is often based on the body content of an email, because the headers would alter the MD5 hash. Unless your fraudsters were communicating via a unique language of Best Buy adverts, it's not relevant.

Even near deduplication that involves fairly high levels of error would only be a severe risk when people talk to each other like used car sales people and try to sell stocks and gold reserves to each other on the regular.
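
As a rough illustration of the near deduplication idea, here is a small Python sketch using the standard library's `difflib.SequenceMatcher`. This is a swapped-in, simple similarity measure and not the fuzzy hashing a commercial platform would use. The 0.9 threshold and the sample messages are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(text.lower().split())


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    # threshold is a hypothetical tuning knob, not a standard value.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def near_dedupe(bodies, threshold: float = 0.9):
    # Keep a body only if it is not close to something already kept.
    kept = []
    for body in bodies:
        if not any(is_near_duplicate(body, k, threshold) for k in kept):
            kept.append(body)
    return kept


bodies = [
    "Huge savings this weekend only! Click here to claim your 20% discount.",
    "Huge savings this weekend only! Click here to claim your 25% discount.",
    "Can you send me the draft before the 3pm call?",
]
kept = near_dedupe(bodies)
print(len(bodies), "->", len(kept))
```

The two adverts differ by a single character and collapse into one item, while the ordinary message is untouched. A lower threshold removes more, which is the "fairly high levels of error" trade-off described above.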




That`s just a scratch of the surface. You also have to account for a human mistake that happens to everyone, then diligence of certain people and even people who are influenced to look away...I could go on but you get the picture.

You can't be unhappy when a robot reviews data then be unhappy when a human does it too.

Robot + human = efficiency. End of story.

This is why you review people's review batches. A long term forensic investigator or agent won't get anywhere if they constantly show up bad under review.


So you are saying that you are able to read and review an email conversation that consists of multiple emails in one second?

No, but you can give me 300 emails involving 20 threaded conversations and I can mathematically decrease the work load to read 123 documents instead of 300 then I can remove the spam and other stuff I've seen before so I only need to read 97 unique documents.


Are you a robot?

Yes, but this is unrelated to our discussion, so don't be racist.


Or you simply trust the robot to do it right for you? Cause in 1 sec, there can be no human element involved. I`m just curious how you define a review?

If I gave you an Excel spreadsheet with 20,000 rows and wrote the word banana in each one and asked you to spell check, would you:

1. Write a program or Excel formula to check my spelling whilst evaluating what the code does?
2. Run spell checker and hope for the best
3. Read every single row one by one
4. Do nothing because you don't trust humans and spell checker is a dastardly robot that can't be trusted

Based on your logic we could not choose 1 or 2, so we're going to have to read every row.
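
Option 1 from the list above might look something like this in Python. It is only a sketch: the `rows` list is a hypothetical stand-in for the spreadsheet column, and two typos are planted on purpose so the check has something to find.

```python
EXPECTED = "banana"

# Hypothetical stand-in for the 20,000-row spreadsheet column.
rows = [EXPECTED] * 20_000
rows[1234] = "bananna"   # planted typo
rows[19_999] = "banan"   # planted typo


def find_misspellings(values, expected):
    # Return (row number, value) for every row that differs from the expected word.
    # Row numbers are 1-based, like a spreadsheet.
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value.strip().lower() != expected
    ]


bad_rows = find_misspellings(rows, EXPECTED)
print(f"{len(rows) - len(bad_rows)} rows OK, {len(bad_rows)} need a human look: {bad_rows}")
```

The machine clears the rows that are obviously fine and a human only looks at the handful that are flagged, which is the robot + human split described earlier in the post.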



