Internal Implementation Disclosure is the process whereby your web application leaks information about the software being run, the server technology and operating system to a malicious hacker.
We will consider several concepts under this topic that help attackers build up useful profiles of your website.
- Server response header disclosure
- Locating vulnerabilities based on response headers
- Disclosure via robots.txt
- The risk in HTML source
- Locating at-risk websites
How an attacker builds a website risk profile
Imagine a bank robber: he tries to understand every point of risk he can find inside and outside the bank. The same goes for an attacker targeting a website. The attacker wants to understand the libraries and frameworks you use, and he wants to see the HTML source and the structure of your files (inline SQL statements, hidden fields and comments in your HTML).
He tries to locate every internal error message. An attacker will try to trigger exceptions, and unhandled exceptions, logs, web server logs and the like can all give away useful information. He also checks whether there is any useful data in query strings.
We will also consider the National Vulnerability Database (NVD), maintained by the National Institute of Standards and Technology (NIST). It helps attackers narrow down their risk profile, because it lists known vulnerabilities for specific software versions.
The test website for this article is an insecure website deliberately built by Troy Hunt for testing purposes, and it can be found at hackyourselffirst.troyhunt.com.
Locating vulnerabilities based on response headers
When a user makes a request, the server sends response headers back along with the data, either to instruct the browser to do something or for informational purposes. Most actionable response headers are generated by the web server itself. These include instructions for the client to cache the content (or not), the content language and the HTTP status code, among others.
We can inspect the technology stack through the response headers: right-click the test website specified above and choose Inspect.
Go to the Network tab, refresh the page and click on the first request.
The headers show that the website's underlying technology is ASP.NET MVC, and the X-AspNet-Version header even reveals the exact ASP.NET version. This is the kind of information an attacker starts to build upon.
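The same check can be scripted. The sketch below uses only Python's standard library and assumes a hand-picked list of headers (`Server`, `X-Powered-By`, `X-AspNet-Version`, `X-AspNetMvc-Version`) that commonly give away the stack; the list is not exhaustive, and `find_disclosures` and `fetch_headers` are hypothetical helper names, not part of any tool mentioned in this article.

```python
from urllib.request import urlopen

# Headers that commonly reveal the server stack. This list is an assumption
# chosen for illustration, not an exhaustive catalogue.
DISCLOSURE_HEADERS = ("Server", "X-Powered-By", "X-AspNet-Version", "X-AspNetMvc-Version")


def find_disclosures(headers):
    """Return the disclosure headers present in a mapping of response headers.

    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in DISCLOSURE_HEADERS
        if name.lower() in lowered
    }


def fetch_headers(url):
    """Fetch a URL and return its response headers as a plain dict."""
    with urlopen(url) as response:
        return dict(response.headers.items())
```

For example, `find_disclosures(fetch_headers("https://hackyourselffirst.troyhunt.com/"))` would list whichever of those headers the test site returns (the `https` scheme is an assumption). Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses; in that case the headers are still available on the exception's `headers` attribute.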
www.shodanhq.com is a search engine for devices on the internet. While Google and other search engines index only the web, Shodan indexes pretty much everything else: webcams, water treatment facilities, yachts, medical devices, traffic lights, wind turbines, license plate readers, smart TVs, refrigerators, anything and everything you could possibly imagine that’s plugged into the internet (and often shouldn’t be).
Web server versions (e.g. ASP.NET 4.0.30319, Apache 1.3.23) can be searched on this website. We can narrow the search down with filters such as country and find devices running those web servers on the internet. Many security-conscious services block Shodan and similar providers from crawling them.
HTTP fingerprinting of servers
Finding a vulnerable website may take more than just inspecting response headers. We can use a technique called HTTP fingerprinting: just as a real-world fingerprint identifies a person, implicit details in how a website behaves can identify the technology behind it and help us build an attacker profile. Here we’ll be using Fiddler.
What is Fiddler?
Fiddler is a web debugging proxy for any browser, application or process. It logs and inspects all HTTP(S) traffic between your computer and the internet, mocks requests and helps diagnose network issues. Fiddler is available for macOS, Windows and Linux, and you can download it from the Telerik website.
Image source: https://www.telerik.com/fiddler
So, in a bid to build our attacker profile, let us try HTTP fingerprinting on this website. We can request the website through Fiddler and then inspect the response headers in the Raw view.
We are mainly interested in the order in which the Server header (nginx) and the Date header are returned. Header order matters because different server implementations tend to emit headers in their own characteristic order, which helps us narrow down the technology behind the website. When we check another website, we see that the Date header is returned last, while the Server header appears well before it.
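Fiddler shows headers in the exact order the server sent them. To compare that order between two sites without a proxy, a minimal sketch could send a raw `HEAD` request over a plain socket and list the header names in arrival order. It assumes plain HTTP on port 80 (no TLS), and `fetch_raw_head` and `header_order` are hypothetical helper names.

```python
import socket


def fetch_raw_head(host, port=80, path="/"):
    """Send a plain HTTP/1.1 HEAD request and return the raw response text."""
    request = (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


def header_order(raw_response):
    """Return header names in the order the server sent them."""
    head = raw_response.split("\r\n\r\n", 1)[0]  # drop any body
    lines = head.split("\r\n")[1:]  # skip the status line
    return [line.split(":", 1)[0].strip() for line in lines if ":" in line]
```

Comparing `header_order(fetch_raw_head(host_a))` with `header_order(fetch_raw_head(host_b))` shows, for instance, whether `Date` comes before or after `Server`. A raw socket is used so that the exact bytes sent and received are visible, much as they are in Fiddler's Raw view.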
Fiddler response for: https://overthewire.org/wargames/
This still has not told us anything spectacular, so let us narrow down the information by sending a malformed request. We change the HTTP version in the request line from HTTP/1.1 to HTTP/x.1. Both websites return a 400 (Bad Request) error, but with different error messages. Similarly, the way these websites respond to HTTP verbs such as DELETE, TRACE and SEARCH can help us build an attacker profile for them. This whole technique is called HTTP fingerprinting. There are other giveaways as well, such as hidden fields and view state.
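The malformed-version and unusual-verb probes described above can also be scripted. This is a minimal illustration, assuming plain HTTP on port 80 and that the first 4096 bytes of the reply contain the status line; `build_request`, `status_line`, `probe`, `fingerprint` and the `PROBES` list are hypothetical names chosen for this example.

```python
import socket


def build_request(method, host, path="/", version="HTTP/1.1"):
    """Build raw request bytes; `version` may deliberately be malformed."""
    return (
        f"{method} {path} {version}\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def status_line(raw_response):
    """Return the first line of a raw HTTP response."""
    return raw_response.split("\r\n", 1)[0]


def probe(host, method="GET", version="HTTP/1.1", port=80):
    """Send one request and return the server's status line."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(method, host, version=version))
        data = sock.recv(4096)
    return status_line(data.decode("iso-8859-1"))


# The probes described in the text: a malformed version and unusual verbs.
PROBES = [
    ("GET", "HTTP/x.1"),
    ("DELETE", "HTTP/1.1"),
    ("TRACE", "HTTP/1.1"),
    ("SEARCH", "HTTP/1.1"),
]


def fingerprint(host):
    """Map each probe to the status line the server returned."""
    return {f"{method} {version}": probe(host, method, version) for method, version in PROBES}
```

Running `fingerprint(host)` against two servers and comparing the results (and, if needed, the full error bodies) surfaces the behavioral differences HTTP fingerprinting relies on.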
This whole article is focused on how an attacker can begin to build a profile and identify weak points on a website. Another useful resource for attackers is the robots.txt file.
What is a robots.txt file?
Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as “follow” or “nofollow”).
In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.
The basic format of a robots.txt file is:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
How does the robots.txt file help attackers? It provides an attacker with useful information about the parts of a website the owner doesn’t want you to see, such as admin-only and private areas. In other words, a robots.txt file points an attacker straight at the areas you are trying to hide.
We can check the disallowed routes on a website by appending /robots.txt to its address. Take www.nairaland.com as an example: www.nairaland.com/robots.txt returns a list of disallowed routes on the website.
The Disallow directives tell web crawlers not to crawl these URL paths. The catch is that they also tell attackers that these paths exist and may contain useful information. These subtle details help an attacker build his profile little by little.
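Reading a robots.txt file is easy to automate. The sketch below is a simplified parser, not a full implementation of the robots exclusion protocol: it collects the `Disallow` paths for every user agent, ignoring comments and empty `Disallow:` lines (which allow everything). `disallowed_paths` and `fetch_robots` are hypothetical helper names.

```python
from urllib.request import urlopen


def disallowed_paths(robots_txt):
    """Collect unique Disallow paths from robots.txt text, for any user agent."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        path = value.strip()
        # An empty Disallow value means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and path and path not in paths:
            paths.append(path)
    return paths


def fetch_robots(base_url):
    """Download /robots.txt from a site, e.g. "https://www.nairaland.com"."""
    with urlopen(base_url.rstrip("/") + "/robots.txt") as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `disallowed_paths(fetch_robots("https://www.nairaland.com"))` would list that site's disallowed routes. Python's built-in `urllib.robotparser` answers "may this agent fetch this URL?" rather than listing the rules, which is why a few lines of manual parsing are used here.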
You can find useful information on how to hide the robots.txt file on your website and prevent robots.txt attacks here
The risks in HTML source
Right-clicking and viewing the page source is one of the first things any developer learns to do. But did you know that the page source can reveal information that makes you vulnerable to attackers?
Navigate to hackyourselffirst.troyhunt.com/robots.txt which shows you the list of disallowed paths.
Let us go to hackyourselffirst.troyhunt.com/secret/admin, one of the URLs the robots.txt file exposes to us.
Even though we are denied access, we can still view the page source. Looking through the HTML code, we see that comments have been left in it which tell us how to download the database file.
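Leftover comments like these are easy to find with a script as well as by eye. This minimal sketch uses the standard library's `html.parser.HTMLParser`, whose `handle_comment` callback receives the text of each `<!-- ... -->` comment. `CommentCollector` and `html_comments` are hypothetical names, and the page source is assumed to be already available as a string.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is everything between "<!--" and "-->".
        self.comments.append(data.strip())


def html_comments(html):
    """Return all comment texts found in an HTML document."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments
```

If you fetch a denied page with `urllib`, the HTML body is still available by calling `read()` on the raised `HTTPError`, so the same function works on error pages.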
This may seem like a contrived example, but it does happen. Leaving details such as the libraries used in your project (here, jQuery and Bootstrap) or, worse, sensitive information in hidden fields makes your website vulnerable. Such values can be modified by an attacker, a technique called parameter tampering.
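Hidden fields and library references can also be extracted automatically. This is a sketch under the assumption that the page source is already in hand: it records `<input type="hidden">` name/value pairs (prime candidates for parameter tampering) and the `src` of each external `<script>`, whose file names often include a library name and version. `ExposureScanner` and `scan` are hypothetical names built on the standard library's `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser


class ExposureScanner(HTMLParser):
    """Collect hidden form fields and external script URLs from a page."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            self.hidden_fields[attributes.get("name")] = attributes.get("value")
        elif tag == "script" and attributes.get("src"):
            # File names such as jquery-1.10.2.js often reveal library versions.
            self.scripts.append(attributes["src"])


def scan(html):
    """Return (hidden field name -> value, list of external script URLs)."""
    scanner = ExposureScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.hidden_fields, scanner.scripts
```

Self-closing tags such as `<input ... />` are handled too, because `HTMLParser`'s default `handle_startendtag` calls `handle_starttag`.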
The goal of this article is not to hack or bring down any website. It is to show you the different ways attackers build profiles and identify vulnerable points in web applications.
We covered several useful concepts, such as locating at-risk websites, HTTP fingerprinting and disclosure via robots.txt.
This is just a tiny part of what attackers do in order to hack you. Several other concepts to consider include:
- SQL Injection
- Cross Site Attacks
- Parameter Tampering
- Cross Site Scripting
If you think some of these are no longer relevant because modern frameworks and tools protect against them, I’d advise you to look up the yearly attack statistics for each vulnerability.
- Hack Yourself First: How to go on the Cyber-Offense. Troy Hunt (Pluralsight Course)