Affinis - Subdomain Discovery Through RNN (Recurrent Neural Network)

Credits: James Barnett

Project Repository: JetP1ane/Affinis (github.com)

It is said all too often, but reconnaissance is a critical part of a successful penetration test, especially when it comes to subdomain discovery. We never want to miss vulnerable assets that the client failed to disclose or, frankly, doesn’t know about, such as stale assets. There are hundreds of tools available to assist with the recon process, but I wanted to contribute something that would reduce the chances of missing something along the way. So I started looking into neural networks, hoping the technology could help solve this problem, or at least minimize it.

The way I wanted to leverage neural networks was to feed one a list of subdomains discovered through traditional methods and have it generate new subdomains based on that input. Instead of continuing to brute-force subdomains with pre-made wordlists, I wanted the neural network to make more educated guesses based on already known samples. My research eventually led me to RNNs (Recurrent Neural Networks) in conjunction with NLP (Natural Language Processing).
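To make the "feed it known subdomains" idea concrete, here is a minimal sketch of how a list of discovered labels could be turned into character-level training examples. This is an illustration, not Affinis's actual preprocessing: the function name `build_training_pairs`, the window size, and the `^` / `$` padding and end markers are all assumptions made for the example.

```python
def build_training_pairs(subdomains, window=4, pad="^", end="$"):
    """Turn known subdomain labels into (context, next_char) pairs.

    Each label is left-padded so the very first character also has a
    full-width context, and terminated with an end marker so a model
    trained on these pairs could learn where names stop.
    (pad/end markers are hypothetical choices for this sketch.)
    """
    pairs = []
    for name in subdomains:
        text = pad * window + name.lower() + end
        # Slide a fixed-size window over the text; the character right
        # after the window is the prediction target.
        for i in range(len(text) - window):
            pairs.append((text[i:i + window], text[i + window]))
    return pairs


# Hypothetical labels standing in for real recon output.
known = ["dev-api", "stage-api", "vpn01"]
pairs = build_training_pairs(known, window=4)

for context, target in pairs[:4]:
    print(repr(context), "->", repr(target))
```

Every label contributes one pair per character plus one for the end marker, so even a short list of known subdomains yields a usable number of examples for a sequence model.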

RNNs & NLP

RNNs (Recurrent Neural Networks) excel at sequence-based problems and use a ‘memory’ (a hidden state carried from one step to the next) to learn from earlier elements of a sequence. NLP (Natural Language Processing) is the computer science sub-field concerned with machine text classification and text generation, like the auto-completed words in your Google search or Siri on your iPhone. RNNs help solve NLP-related problems because of their ability to store and recall information from a data stream. The RNN loosely emulates a human brain, learning speech patterns from the textual data it is fed. As it interprets this data, it learns the structure and style of the language and can then generate predictive text based on that input. I wanted this to function as a subdomain name generator, so this seemed like the perfect solution.
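The ‘memory’ idea can be shown with a toy example. The sketch below is a single-unit, plain (Elman-style) recurrent update in pure Python, not an LSTM and not code from Affinis; the weights `w_x`, `w_h`, and `b` are arbitrary values picked for illustration rather than learned ones.

```python
import math


def rnn_step(x, h, w_x, w_h, b):
    """One recurrent update: the new state mixes the input with the old state."""
    return math.tanh(w_x * x + w_h * h + b)


def run_sequence(xs, w_x=0.5, w_h=0.9, b=0.0):
    """Feed a sequence through the unit, returning the state after each step."""
    h = 0.0  # the 'memory' starts empty
    states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Two sequences that differ only in their first element.
with_signal = run_sequence([1.0, 0.0, 0.0])
without_signal = run_sequence([0.0, 0.0, 0.0])

print(with_signal)
print(without_signal)
```

Although the last two inputs are identical in both runs, the final states differ because the first input is still echoed through the hidden state. That carried-over state is what lets a recurrent model condition its next-character guess on what it has already seen.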

I highly recommend reading more about the relationship between RNNs and NLP, but I want to get back to my main point: I wanted to implement a programmatic solution using RNNs, more specifically the LSTM (Long Short-Term Memory) architecture, to solve my subdomain name generator problem.

Affinis - POC

Python and the Keras API became the tools of choice based on my reading from this documentation. Affinis takes a list of subdomains gathered by passive and active tools and formulates its own list of potential subdomains the target may be using, based on the ones it already knows about. It makes an educated prediction about the style and structure the target uses to name its subdomains. It is by no means perfect and may not be of any use in certain situations, but I have been fortunate to have success with it on real engagements. It has found obscure subdomains for me that were never found with traditional passive and active subdomain discovery tooling.
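Whatever model produces the guesses, its raw output usually needs tidying before it is handed to a resolver. The following is a rough sketch of that post-processing step, assuming the generator emits bare labels; `filter_candidates`, `LABEL_RE`, and the sample data are hypothetical and are not taken from the Affinis code base.

```python
import re

# A DNS label is 1-63 characters of letters, digits, and hyphens,
# and may not start or end with a hyphen.
LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def filter_candidates(generated, known, domain):
    """Keep syntactically valid, not-yet-known labels and build FQDNs."""
    known_set = {k.lower() for k in known}
    seen = set()
    out = []
    for raw in generated:
        label = raw.strip().lower()
        if not LABEL_RE.match(label):
            continue  # drop output that cannot be a hostname label
        if label in known_set or label in seen:
            continue  # drop rediscoveries and duplicates
        seen.add(label)
        out.append(f"{label}.{domain}")
    return out


# Hypothetical generator output mixed with junk and known names.
candidates = filter_candidates(
    ["dev-api2", "Dev-API2", "-bad", "vpn01", "stage api", "vpn02"],
    known=["vpn01", "dev-api"],
    domain="example.com",
)
print(candidates)
```

The surviving names are only guesses at this point; they still need to be resolved or probed with the usual tooling before being treated as real assets.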

I believe neural networks can provide tremendous value to the infosec industry in solving a plethora of problems. I will continue to evolve this solution, and it will eventually be integrated into a larger automated reconnaissance framework, but for now it exists as a standalone Python app. I hope it provides value to some, and I’m always open to feedback and suggestions for improvement.
