💀 Scuwl 💀
Simple custom wordlist generator
Scuwl (pronounced "skull") is a Python CLI program that quickly generates a wordlist from a webpage. It was inspired by the program Cewl. Scuwl defaults to a crawling depth of zero, at which most webpages yield a wordlist in less than a second. A crawling depth of one generally takes a few minutes.
Scuwl is fast because it scrapes websites recursively and asynchronously. It keeps its memory footprint small by processing HTML as each page arrives and accumulating the wordlist in memory as a set. By default, Scuwl keeps unique words of three or more characters and removes all punctuation.
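The default filtering described above can be illustrated with a minimal sketch. This is not Scuwl's actual code: the function name `extract_words`, the whitespace tokenization, and the use of `string.punctuation` (ASCII punctuation only) are all assumptions made for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def extract_words(text, min_length=3, keep_punctuation=False):
    """Return the unique whitespace-separated words in text that pass the filters."""
    words = set()  # a set keeps each word only once
    for token in text.split():
        if not keep_punctuation:
            token = token.translate(_STRIP_PUNCTUATION)
        if len(token) >= min_length:
            words.add(token)
    return words


if __name__ == "__main__":
    sample = "Wait? Scuwl is a simple, fast wordlist generator. Scuwl is fast!"
    print(sorted(extract_words(sample)))
    print(sorted(extract_words(sample, keep_punctuation=True)))
```

Here `min_length` and `keep_punctuation` play the roles of the `-m` and `-p` options. Because the result is a set, repeated words cost no extra memory.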
Note: Crawling depths greater than one remain untested.
Features
- Fast recursive asynchronous web requests using aiohttp
- CLI options give you control over the generated wordlist
- Simple Python codebase (< 150 lines)
- Low memory usage (~80MB)
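To show what depth-limited, recursive, asynchronous crawling can look like, here is a rough sketch built on `asyncio` alone. It is not Scuwl's implementation: the names `fetch`, `crawl`, `build_wordlist`, and `FAKE_SITE` are hypothetical, and an in-memory dictionary stands in for the real HTTP requests (which Scuwl makes with aiohttp) so the example runs without network access.

```python
import asyncio

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
FAKE_SITE = {
    "/": ("home page words", ["/a", "/b"]),
    "/a": ("alpha page", ["/c"]),
    "/b": ("beta page", []),
    "/c": ("gamma page", []),
}


async def fetch(url):
    """Stand-in for an HTTP GET; yields control like a real network call would."""
    await asyncio.sleep(0)
    return FAKE_SITE.get(url, ("", []))


async def crawl(url, depth, words, seen):
    """Add the words on url to words, then follow its links while depth remains."""
    if url in seen:
        return
    seen.add(url)  # mark before awaiting so no page is fetched twice
    text, links = await fetch(url)
    words.update(text.split())  # process each page as soon as it arrives
    if depth > 0:
        # Fetch all linked pages concurrently rather than one after another.
        await asyncio.gather(*(crawl(link, depth - 1, words, seen) for link in links))


def build_wordlist(start_url, depth=0):
    """Run the crawl to completion and return the collected set of words."""
    words, seen = set(), set()
    asyncio.run(crawl(start_url, depth, words, seen))
    return words


if __name__ == "__main__":
    print(sorted(build_wordlist("/", depth=0)))
    print(sorted(build_wordlist("/", depth=1)))
```

With a depth of zero only the starting page is read. With a depth of one, the pages it links to are fetched concurrently, which is why deeper crawls touch far more pages.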
Installation
python -m pip install scuwl
Usage
$ scuwl -h
usage: scuwl.py [-h] [-d DEPTH] [-H HEADERS] [-m MIN_LENGTH] [-o OUTFILE]
[-P PROXY] [-p] [-u USER_AGENT] [-v]
url
💀SCuWl💀, Simple custom wordlist generator.
positional arguments:
url url to scrape
options:
-h, --help show this help message and exit
-d DEPTH, --depth DEPTH
depth of search
-H HEADERS, --headers HEADERS
json headers for client
-m MIN_LENGTH, --min-length MIN_LENGTH
minimum length of words to keep
-o OUTFILE, --outfile OUTFILE
outfile for wordlist
-P PROXY, --proxy PROXY
proxy address for client
-p, --punctuation keep punctutation
-u USER_AGENT, --user-agent USER_AGENT
user-agent string for client
-v, --version show program's version number and exit
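For readers who want to see how such an interface might be declared, the following is a rough `argparse` sketch that mirrors the options listed above. It is a reconstruction from the help text, not Scuwl's source: the defaults for `--depth` (0) and `--min-length` (3) come from the description earlier in this README, while parsing `--headers` with `json.loads` and defaulting the remaining options to `None` are assumptions. The `-v/--version` option is left out because the version string is not given here.

```python
import argparse
import json


def build_parser():
    """Build a parser that mirrors the documented options (except -v/--version)."""
    parser = argparse.ArgumentParser(
        prog="scuwl", description="Simple custom wordlist generator."
    )
    parser.add_argument("url", help="url to scrape")
    parser.add_argument("-d", "--depth", type=int, default=0, help="depth of search")
    # Assumption: the JSON string is decoded into a dict at parse time.
    parser.add_argument("-H", "--headers", type=json.loads, default=None,
                        help="json headers for client")
    parser.add_argument("-m", "--min-length", type=int, default=3,
                        help="minimum length of words to keep")
    parser.add_argument("-o", "--outfile", default=None, help="outfile for wordlist")
    parser.add_argument("-P", "--proxy", default=None, help="proxy address for client")
    parser.add_argument("-p", "--punctuation", action="store_true",
                        help="keep punctuation")
    parser.add_argument("-u", "--user-agent", default=None,
                        help="user-agent string for client")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-d", "1", "-H", '{"Accept-Language": "en"}', "https://example.com"]
    )
    print(args.depth, args.headers, args.min_length)
```

Note that `argparse` exposes `--min-length` and `--user-agent` as `args.min_length` and `args.user_agent`.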
Examples
Generate wordlist and send to stdout
$ scuwl https://github.com/petebuffon/scuwl
topics
out
scuwl
2022
track
...
Generate wordlist and save as wordlist.txt
$ scuwl -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
122 wordlist.txt
Keep punctuation
$ scuwl -p -o wordlist.txt https://github.com/petebuffon/scuwl
$ head wordlist.txt
customer
wait?
write
devops
user
Use a crawl depth of one (scrapes the input webpage and every page it links to)
$ scuwl -d 1 -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
6326 wordlist.txt