Deep Web 101

I have recently taken an interest in revisiting and exploring the “Deep Web” (a.k.a. DarkNet / UnderNet / Hidden Net). These are all names for the invisible chunk of the web, the place where no general-purpose search engine will willingly take you. But what if you want to be taken there? Most people do once they consider that it makes up an estimated 95% of the World Wide Web (just imagine how much unique information is stored there!!). During my exploration, it really struck me as a curious mixture of “the place to be” and “the place to stay away from”, all wrapped in one. Obviously, if you are keen to explore it, neither attitude will do on its own, unless you are a fearless hacker ;). Throughout this post, I’ll assume you are not necessarily one of those mystical creatures.

For starters, let’s look at what the Deep Web really is. According to Wikipedia¹, it is “content that is not part of the surface web”, where “surface web” is the chunk that is indexed by standard search engines. Considering that only about 0.03% of pages get indexed (that is roughly 1 in 3,300), it becomes evident that for comprehensive information retrieval, a surface search alone may not suffice.

Why are certain pages not indexed? Very good question! It turns out there are many possible reasons. Usually the cause is a technical barrier, which may or may not have been placed there deliberately by the site’s owner. Those barriers prevent the “web crawlers” (a.k.a. “web spiders” / “robots”, etc.) from accessing the content.

One such barrier is dynamic content: what you find on sites that do not have static pages but generate the content based on a query (scripted pages), or on sites that serve content based on the user’s identity (password-protected pages). Since search engines can’t enter keywords or passwords, or solve CAPTCHAs, they end up ignoring such content altogether. Another barrier is a site’s ‘robots.txt’ file, which gives the site’s admin the power to prevent all or certain “robots” from accessing the content by specifying the permissions.
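To make the ‘robots.txt’ barrier a bit more concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the crawler names (`GoodBot`, `BadBot`) and the `example.com` URLs are made up for illustration; a real site's file will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and a crawler calling itself "BadBot" is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser, agent, url):
    """Return True if the rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
for agent, url in [
    ("GoodBot", "https://example.com/index.html"),
    ("GoodBot", "https://example.com/private/data.html"),
    ("BadBot", "https://example.com/index.html"),
]:
    print(agent, url, may_crawl(parser, agent, url))
```

Keep in mind that ‘robots.txt’ is only a request: a well-behaved crawler honours it, but nothing in the file technically blocks access.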

Food for Thought:

Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
The deep Web contains 7,500 terabytes of information compared to 19 terabytes of information in the surface Web.
The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web.
More than 200,000 deep Web sites presently exist.
Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.
The deep Web is the largest growing category of new information on the Internet.
Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
Deep Web content is highly relevant to every information need, market, and domain.
More than half of the deep Web content resides in topic-specific databases.
A full ninety-five percent of the deep Web is publicly accessible information — not subject to fees or subscriptions.
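
As a quick sanity check, the multipliers in the list can be recomputed from the raw figures it quotes. This is only arithmetic on the numbers above, not an independent measurement; the variable names are mine.

```python
# Figures quoted in the list above.
DEEP_TB = 7_500                   # deep Web size, terabytes
SURFACE_TB = 19                   # surface Web size, terabytes
TOP_60_SITES_TB = 750             # sixty largest deep-Web sites, terabytes
DEEP_DOCS = 550_000_000_000       # deep Web documents
SURFACE_DOCS = 1_000_000_000      # surface Web documents


def size_ratios():
    """Return how many times larger each deep-Web figure is than the surface Web."""
    return {
        "by_volume": DEEP_TB / SURFACE_TB,
        "by_documents": DEEP_DOCS / SURFACE_DOCS,
        "top_60_sites": TOP_60_SITES_TB / SURFACE_TB,
    }


for name, value in size_ratios().items():
    print(f"{name}: about {value:.0f}x the surface Web")
```

The volume and document ratios come out at roughly 395 and 550, which is where the “400 to 550 times larger” range comes from, and the sixty largest sites alone land close to the quoted “forty times”.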

The Deep Web is a place where bitcoins are used as currency, and where any kind of information can be found, be it revolution plans, crazy scientific experiments or underground fighting tournaments. You can come across the military, the police, kidnappers, scientists, terrorists and much, much more. In the words of a fellow DW explorer, “the party there goes across the entire moral spectrum”.

Basically, we are talking about information hidden in plain sight: to find it you generally need to know where and how to look. Sometimes you can use custom “Deep Search Engines” to find hidden content, like the one you will find at ahmia.fi, but those index only a small share of the pages. At the time of writing, ahmia was indexing exactly 1003 hidden web sites, which, you will agree, is not much at all.

If you have navigated to that address already (it does open in regular browsers), you might have noticed that it is indeed a search engine - with the title “Tor Hidden Service (‘Onion’) Search”. Let’s look at what those terms mean.

Tor is not, as is sometimes claimed, one of those recursive acronyms so popular in the tech community; the name comes from “The Onion Router”, the project's original name.

Onion routing is, according to Wikipedia², a “technique for anonymous communication over a computer network”. Data is repeatedly encrypted and sent through several network nodes called onion routers. Like someone peeling an onion, each onion router removes a layer of encryption to uncover routing instructions, and sends the message to the next node, where the process is repeated.
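
The peeling process can be sketched in a few lines of Python. This is a toy illustration of the layering idea only, not Tor's actual protocol: the cipher is a deliberately simple XOR keystream that offers no real security, and the relay names, keys and destination are hypothetical.

```python
import hashlib
import json


def toy_cipher(key, data):
    """XOR data with a SHA-256 counter keystream. Toy only, NOT secure.

    XOR is symmetric, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message, route, keys, destination):
    """Build the onion from the inside out: the last relay's layer is innermost."""
    payload = message.encode()
    next_hop = destination
    for relay in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = toy_cipher(keys[relay], layer)
        next_hop = relay
    return payload


def peel(relay, keys, onion):
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(toy_cipher(keys[relay], onion))
    return layer["next"], bytes.fromhex(layer["data"])


def send(message, route, keys, destination):
    """Pass the onion along the route, peeling one layer per relay."""
    onion = wrap(message, route, keys, destination)
    hop, visited = route[0], []
    while hop in keys:
        visited.append(hop)
        hop, onion = peel(hop, keys, onion)
    return hop, onion.decode(), visited


# Hypothetical relays and keys, for illustration only.
keys = {"relay_a": b"key-a", "relay_b": b"key-b", "relay_c": b"key-c"}
route = ["relay_a", "relay_b", "relay_c"]
print(send("hello", route, keys, "destination.example"))
```

Each relay can read only the layer encrypted with its own key, so it sees where to forward the data but not what the inner layers contain.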

Those were the most frequently recurring terms in my exploration of the DW, because they are at the very heart of the system. To access a site which uses the Tor network for anonymity, you first need to have all the necessary software on your machine (usually, a [Tor browser bundle]³). You can also use a proxy service like Tor2Web, which can be accessed from regular (not Tor-aware) browsers and search engines, though that basically gives up all anonymity. And remember, much of the traffic is anonymized for a reason.

A useful list with warnings and precautions can be found [here]³.

Key Safety Tips

Avoid Using Plugins: They can be manipulated to identify you.
Use HTTPS: Where possible, for additional encryption.
Avoid Personal Accounts: Don’t access personal emails, social media, etc.
Be Careful with Downloads: Open downloaded files only while offline and in a secure environment.

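In an onion routing network, privacy generally improves as the number of participating nodes grows, since any one user's traffic has more traffic and more possible paths to blend in with.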

Beginning Your Exploration

With Tor correctly set up, you can start exploring the deep web’s hidden resources. Remember that anonymity can sometimes lead to (total?) anarchy; exercise discretion when exploring.

An informative video on the Deep Web can be found [here]⁵. It’s a great idea to watch it before diving deeper.

Useful Links (For Tor Browser Only)

Ensure you’ve read all warnings before using these links:

The Hidden Wiki: http://kpvz7ki2v5agwt35.onion/wiki/index.php/Main_Page - Find a wide range of resources.
Ahmia: http://ahmia.fi - A search engine for Tor hidden services (this one also opens in regular browsers).

Final Thoughts

Personally, I explore tech forums and libraries on the deep web for invaluable information. Always prioritize safety in your explorations.

If you have any questions or suggestions, feel free to get in touch!