posted on Aug, 31 2007 @ 10:59 PM
I figured this would make a good first post here. I have a way with computers and tend to look around when I can. One way to do this over the net
is to look for a file called robots.txt. Many sites have one, many don't. The original purpose of robots.txt was to list directories that web
spiders and search engines are asked not to index. So I look for robots.txt when I'm feeling up to it, find some disallowed directories, type them
into the URL and see if they can be accessed. I eventually found my way to the Food and Drug Administration, FDA.GOV. I found they do have a robots.txt
file and I was bothered by one particular entry. Below is the full contents of this file:
#robots.txt file for
www.fda.gov...
#Added for Bristol-Myers on Sept 2005
User-agent: vspider
Disallow: /
#For all other crawlers
User-agent: *
Disallow: /.
Disallow: /data/
Disallow: /binn/
Disallow: /cder/test/
Disallow: /opacom/area51/
Disallow: /oashi/aids/listserv/
Disallow: /cdrh/ftparea/cdrh/MDR/coll/mdr/mdrcoll/
Disallow: /foi/warning_letters/d1371b.pdf
Disallow: /foi/warning_letters/archive/
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 23:00EDT-05:00EDT #index this site between 11PM - 5AM EDT
Concurrent-hits: 2 # limit concurrent active URLS to 2 for each index server
Ok, see the fifth disallow under "For all other crawlers"? /opacom/area51/? Any ideas on this? What is opacom? Why the FDA? If you try to access said directory via
www.fda.gov/opacom/area51/ you get a Forbidden directory warning. This is the first place I have brought this information to.
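If you'd rather not pick through a robots.txt by eye, the Disallow entries can be pulled out with a few lines of Python. This is only a rough sketch: parse_disallows is a name made up for illustration, SAMPLE is a shortened copy of the file above, and the non-standard Hit-rate / Visiting-hours / Concurrent-hits lines are simply ignored.

```python
def parse_disallows(robots_text):
    """Return {user_agent: [disallowed paths]} from robots.txt text.

    Hypothetical helper for illustration; it only understands
    User-agent and Disallow lines and skips everything else.
    """
    rules = {}
    agents = []        # user-agents the current group applies to
    in_rules = False   # True once the current group has a Disallow line
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value)
            rules.setdefault(value, [])
        elif field == "disallow":
            in_rules = True
            if value:                         # empty Disallow blocks nothing
                for agent in agents:
                    rules[agent].append(value)
    return rules


# Shortened copy of the file quoted above (assumption: same layout).
SAMPLE = """\
#Added for Bristol-Myers on Sept 2005
User-agent: vspider
Disallow: /
#For all other crawlers
User-agent: *
Disallow: /.
Disallow: /data/
Disallow: /binn/
Disallow: /cder/test/
Disallow: /opacom/area51/
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
"""

rules = parse_disallows(SAMPLE)
# Number the entries that apply to all other crawlers.
for number, path in enumerate(rules["*"], 1):
    print(number, path)
```

Numbered this way, /opacom/area51/ comes out as entry 5 in the "all other crawlers" group.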
To see the file yourself:
www.fda.gov/robots.txt
And the forbidden warning:
www.fda.gov/opacom/area51/
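To check what a directory answers with, without using a browser, something like the following might do. It is a sketch under assumptions: build_url and status_of are made-up names, the http:// scheme is my guess since the links above have none, and network failures (urllib.error.URLError) are not handled.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def build_url(base, path):
    """Join a site root and a path taken from a Disallow line."""
    return urljoin(base, path)


def status_of(url, timeout=10):
    """Return the HTTP status code the server sends for url.

    urlopen raises HTTPError for 4xx/5xx responses, so a
    "Forbidden" page shows up here as the integer 403.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Example call (needs network access, so it is left commented out):
# print(status_of(build_url("http://www.fda.gov/", "/opacom/area51/")))
```

A 403 means the server refused to serve the directory; a 404 would mean nothing was found at that path.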
Any thoughts?
~odievk
[edit on 31-8-2007 by odievk]