# Hashcat v7: Python Bridge in Practice

As part of penetration testing attack paths, we often encounter password hashes that we attempt to crack to expand our access. hashcat is our first choice of toolkit to crack these hashes. Usually we encounter well-known hash algorithms, but occasionally we find novel custom algorithms.
The hashcat project released v7 on August 1, 2025. There were many changes, as detailed in the extensive and informative release notes, but the one I’ll focus on in this blog post is the Python bridge. I’ll cover the current version at the time of writing, v7.1.2.
A quick summary: The Python bridge lets users take a Python implementation of a custom hash and use it from hashcat, taking advantage of hashcat’s candidate generation, parallelization, potfile support, and more. The big caveat is that the Python hash calculations run on the CPU, and there is no support for GPUs or other devices like FPGAs.
My thoughts after reading about this were “this sounds great! But how easy is the integration? And what is the performance like?” So I implemented some things, wrote some benchmarks, and published this post to share the results.
## Writing a Python Plugin
For the rest of this post I will use the Bitwarden server hash algorithm, described in a prior post series. Knowledge of the algorithm isn’t important for this post, but having a familiarity with the hash structure, especially the salts, will help a lot. In this section I will describe how to implement that algorithm using only Python. In the following section I will go into some of the Python bridge implementation details.
The best starting point is the Python plugin development guide, which gives theoretical and practical information about the Python bridge.
The big implementation detail that surprised me is that hash digest line formats have strict requirements that prevent you from using digests you find in the wild. For example, a common hash line format found in Linux shadow files is md5crypt (e.g. `$1$f8Dj$CylA5fgvpRAWfprOHyvUr1`), but this digest would require preprocessing first. Due to implementation details, hash lines have to be in the format `digest*salt`.
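To illustrate, a preprocessing step for that md5crypt line might look like the following sketch (the function name and the choice to pack the md5crypt salt into the salt slot are mine, not something hashcat provides):

```python
# Sketch: convert an md5crypt shadow entry ($1$<salt>$<digest>)
# into the digest*salt layout that the generic Python mode expects.
def md5crypt_to_bridge_format(line: str) -> str:
    # "$1$f8Dj$CylA5fgvpRAWfprOHyvUr1" -> ["", "1", "f8Dj", "CylA5fgvpRAWfprOHyvUr1"]
    _, tag, salt, digest = line.split("$")
    if tag != "1":
        raise ValueError("not an md5crypt line")
    return f"{digest}*{salt}"

print(md5crypt_to_bridge_format("$1$f8Dj$CylA5fgvpRAWfprOHyvUr1"))
# CylA5fgvpRAWfprOHyvUr1*f8Dj
```

Your Python script's `calc_hash` then has to interpret whatever you packed into the salt side.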
With that out of the way, we can implement our hash. First, set up your local source code repository and make sure you can build hashcat, following BUILD.md. Once that is set up, copy `Python/generic_hash_mp.py` to a new file1 (if you are on Windows/macOS, replace `mp` with `sp` and 73000 with 72000 for the rest of this post, and expect slower results: explanation). To be clear, all pure Python plugins will use hash mode 73000/72000, regardless of the Python script.
We need to make three changes to the new script:

- Add imports and helper functions.
- Replace `ST_HASH`.
- Implement `calc_hash(...)`.
First, we copy the helper functions from the beginning of https://research.ivision.com/bitwarden-server-hashcat-plugin-2-hashcat.html (unchanged, other than making sure `hashlib` is imported alongside `base64`):

```python
import base64
import hashlib

def inner_hash(rounds1, email, password):
    masterkey = hashlib.pbkdf2_hmac("sha256", password, email, rounds1)
    masterpasswordhash = hashlib.pbkdf2_hmac("sha256", masterkey, password, 1)
    return masterpasswordhash

def outer_hash(password, inner_rounds, outer_rounds, email, salt):
    subkey = base64.b64encode(inner_hash(inner_rounds, email, password))
    keyHash = hashlib.pbkdf2_hmac("sha256", subkey, salt, outer_rounds)
    return keyHash
```
Second, we copy and rearrange the self-test (ST) hash from the same post:

```python
# Original hash line format from the prior posts:
#
# $bitwarden-server$1*10000*10000*bm9yZXBseUBoYXNoY2F0Lm5ldA==*0ANf7tTj2zwkl3/hf5Zohg==*5ynxBxAsJJ95vgEL/uDVcXseLbyIsHRGFdTXCVvbRlo=
#        ^          ^   ^     ^              ^                           ^                          ^
#       type        |   |     |   email/inner salt (base64)     outer salt (base64)          digest (base64)
#                   |   |     outer round count
#                   |   inner round count
#                   version
#
# Rearranged into the digest*salt layout that mode 73000 requires:
ST_HASH = "5ynxBxAsJJ95vgEL/uDVcXseLbyIsHRGFdTXCVvbRlo=*10000$10000$bm9yZXBseUBoYXNoY2F0Lm5ldA==$0ANf7tTj2zwkl3/hf5Zohg=="
#                        ^                                ^     ^               ^                          ^
#                 digest (base64)                         |     |   email/inner salt (base64)     outer salt (base64)
#                                                         |     outer round count
#                                                         inner round count
```
Third and finally, we implement `calc_hash`:

```python
def calc_hash(password: bytes, salt: dict) -> str:
    inner_rounds, outer_rounds, email, outer_salt = hcshared.get_salt_buf(salt).split(b"$")
    out_hash = outer_hash(password, int(inner_rounds), int(outer_rounds), base64.b64decode(email), base64.b64decode(outer_salt))
    return base64.b64encode(out_hash).decode()
```
The rest of the code from `Python/generic_hash_mp.py` can stay the same. The full file can be found here.
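As a quick standalone sanity check of the line layout (outside hashcat entirely; the variable names here are mine), we can split the self-test line the same way the mode and our `calc_hash` do:

```python
import base64

ST_HASH = "5ynxBxAsJJ95vgEL/uDVcXseLbyIsHRGFdTXCVvbRlo=*10000$10000$bm9yZXBseUBoYXNoY2F0Lm5ldA==$0ANf7tTj2zwkl3/hf5Zohg=="

# mode 73000 splits on "*": digest on the left, our packed salt on the right
digest_b64, salt_buf = ST_HASH.split("*")
# our calc_hash then splits the salt buffer on "$" into four fields
inner_rounds, outer_rounds, email_b64, outer_salt_b64 = salt_buf.split("$")

print(int(inner_rounds), int(outer_rounds))    # 10000 10000
print(base64.b64decode(email_b64).decode())    # noreply@hashcat.net
print(len(base64.b64decode(digest_b64)))       # 32 (raw SHA-256 output)
```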
We can test it out with `-b`, which will first run a self-test to make sure it calculates `ST_HASH` correctly and then benchmark a bunch of hashes:

```
aralls@serpentine hashcat $ ./hashcat -m 73000 --bridge-parameter1 Python/73000-bws.py -b
hashcat (v7.1.2-173-g81ba0c1a3) starting in benchmark mode

<snip>

Assimilation Bridge
===================

* Unit #01 -> #01: Python Interpreter (3.13.7 (main, Aug 15 2025, 12:34:02) [GCC 15.2.1 20250813])

<snip>

---------------------------------------------------------------------------------------
* Hash-Mode 73000 (Generic Hash [Bridged: Python Interpreter with GIL]) [Iterations: 1]
---------------------------------------------------------------------------------------

Speed.#*.........: 2710 H/s (0.07ms) @ Accel:128 Loops:1 Thr:32 Vec:1
```
And that’s it! In under 15 minutes we’ve replicated dozens of hours of my work implementing the hash in OpenCL. But that wasn’t wasted time, because this Python version should be much slower than the accelerated hash, right? …right?
## Benchmarks
Before getting into the numbers, a big disclaimer: benchmarking is tricky because it depends heavily on both the hardware and the software stack. Even the single-digit percent differences between Python and OpenCL implementations might be orders of magnitude different on your systems.
For my benchmarks, I compared an identical algorithm implemented in Python (mode 73000, shown previously in this post) and OpenCL (mode 23410). When possible, I tried using different OpenCL backends on the same hardware as well.
| host | device | backend | hash mode | result (H/s) | notes |
|---|---|---|---|---|---|
| A | NVIDIA GeForce RTX 4090 | CUDA | 23410 | 443000 | 3 |
| A | NVIDIA GeForce RTX 4090 | OpenCL (CUDA) | 23410 | 442800 | |
| C (container) | AMD Radeon RX 6800 | HIP | 23410 | 147800 | |
| C (container) | AMD Radeon RX 6800 | OpenCL (AMD) | 23410 | 147600 | |
| C (container) | AMD Radeon RX 6800 | OpenCL (mesa rusticl) | 23410 | 146700 | |
| C (alpine host) | AMD Radeon RX 6800 | OpenCL (mesa rusticl) | 23410 | 146300 | |
| A | NVIDIA GeForce GTX 1080 Ti | CUDA | 23410 | 91325 | |
| A | NVIDIA GeForce GTX 1080 Ti | OpenCL (CUDA) | 23410 | 83126 | |
| B | AMD Radeon graphics (iGPU of 6850U) | HIP | 23410 | 21117 | |
| B | AMD Radeon graphics (iGPU of 6850U) | OpenCL (AMD) | 23410 | 20326 | |
| D | Intel(R) Core(TM) i7-11800H | OpenCL (Intel) | 23410 | 12934 | |
| C (container) | AMD Ryzen 9 9950X | OpenCL (PoCL) | 23410 | 10042 | |
| C (container) | AMD Ryzen 9 9950X | Python | 73000 | 8646 | 1, 2 |
| C (alpine host) | AMD Ryzen 9 9950X | Python | 73000 | 6551 | 2 |
| C (container) | AMD Ryzen 9 9950X | OpenCL (Intel) | 23410 | 6234 | 1 |
| C (container) | AMD Radeon graphics (iGPU of 9950X) | HIP | 23410 | 5048 | |
| C (container) | AMD Radeon graphics (iGPU of 9950X) | OpenCL (AMD) | 23410 | 5043 | |
| C (container) | AMD Radeon graphics (iGPU of 9950X) | OpenCL (mesa rusticl) | 23410 | 4964 | |
| C (alpine host) | AMD Radeon graphics (iGPU of 9950X) | OpenCL (mesa rusticl) | 23410 | 4963 | |
| D | Intel(R) Core(TM) i7-11800H | Python | 73000 | 2730 | |
| B | AMD Ryzen 7 PRO 6850U | Python | 73000 | 2573 | |
| A | AMD Ryzen 5 1600 | OpenCL (Intel) | 23410 | 1551 | 3 |
| A | AMD Ryzen 5 1600 | Python | 73000 | 941 | |
Note 1: Uh oh! The Python implementation runs almost 40% faster than the OpenCL implementation on the exact same hardware and software! This appears to be the fault of Intel’s OpenCL implementation really sucking on this CPU. PoCL is significantly faster than either.
Note 2: The same code on the same hardware runs around 30% faster in a container (LXC) than on the host! I believe that this has something to do with the only available OpenCL backend on the host being for the GPU. “But wait”, you might say, “isn’t this using the Python bridge, not OpenCL?” Well it turns out that the Python bridge does use OpenCL, as explained in the following section.
Note 3: If you compare these numbers to my older benchmarking numbers, you will notice that they are very different. Previously I ran an attack mode 0 dictionary+rule attack and reported the running rate, as opposed to the benchmark (-b) mode used for this table.
I would have really liked to have tested more OpenCL backends against each other, but unfortunately I encountered many cases where they would break in a variety of ways. On multiple systems I tried to test, I couldn’t even get a single CPU OpenCL backend running.
With the oddities out of the way, here are the takeaways from benchmarking:
- Dedicated GPUs are much, much faster for hash cracking than consumer CPUs, even top-of-the-line CPU models.
- Integrated GPUs are a mixed bag.
- OpenCL implementations ran from 16% to 370% faster than Python implementations on the same CPU and software (the Intel OpenCL runtime on the 9950X being the outlier; see note 1).
So is it worth implementing OpenCL modules now that we have Python modules? If you have access to any dedicated GPU hardware, even ones that are 10 years old, then yes, you should expect a minimum of an order of magnitude speed increase. With modern hardware, you should see an increase of more than 500 times. If, for whatever reason, you only have access to CPUs, then the answer is that it depends on the CPU and how much you value implementation time vs cracking time, but probably not.
## Deeper Dive Bonus Section
This section goes into hashcat internals and shows how to expend significantly more effort for basically no gain. If a Python script plugin doesn’t fulfill all your requirements or you’re curious about implementation details, read on. Otherwise, skip ahead to the conclusion.
### Python Bridge Details
So far we’ve used the Python bridge without really knowing what’s going on under the hood. In this bonus section, I’ll tell you how to implement a custom module that uses the Python bridge (and why you’d want to do that), but before we can do that, we have to understand a little bit of hashcat’s architecture. Big caveats: I’m going to stay surface level, only cover the simplest cases, and ignore critical components like salt handling. I’m sure a hashcat core developer would say “well that’s not really how it works” about every other sentence.
In hashcat v6 (pre-bridges), the cracking flow looked like this:

1. hashcat initializes, selects the appropriate hash mode plugin2
2. the plugin’s C module parses the hash line into digest, salts, etc.
3. hashcat sends parsed data to the target device (GPU, CPU, etc.)
4. the plugin’s kernels execute. The simplest case involves:
   1. `init` loads data into appropriate data structures
   2. `loop` executes the hash
   3. `comp` compares the computed digest to the input digest
5. if a digest matches, hashcat pulls the password candidate and computed digest off the target device
6. the plugin’s C module reformats the digest into the printable hash line
7. hashcat prints the result, writes it to a potfile, etc.
Hashcat v7 introduced bridges, which make the connections between the different components customizable. Hash mode 73000, the default implementation of the Python bridge, now looks like:

1. hashcat initializes the plugin for hash mode 73000. The Python bridge starts a Python process.
2. the plugin’s C module parses the hash line into digest, salts, etc.
3. hashcat sends parsed data to the device (GPU, CPU, etc.)
4. the plugin’s kernels execute:
   1. `init` loads data into appropriate data structures
   2. the Python bridge copies the data back to the host and into the Python process
   3. Python executes the specified script to compute the hash
   4. the Python bridge copies the computed digest from the Python process to the device
   5. `comp` compares the computed digest to the input digest
5. if a digest matches, hashcat pulls the password candidate and computed digest off the device
6. the plugin’s C module reformats the digest into the printable hash line
7. hashcat prints the result, writes it to a potfile, etc.
A fair amount of this flow was a surprise to me when I figured it out:
- It still runs OpenCL kernels as part of the pipeline.
- Parsing still happens in a C module.
- Data is copied back and forth between the OpenCL device and the Python process on the host multiple times for every password candidate.
Taking a step back, we have implementations of a fully custom plugin in previous posts and a pure Python implementation previously in this post. There is a hybrid option as well: a custom plugin that uses the Python bridge. You don’t have to use hash mode 73000 to use the Python bridge.
Looking at the numbered flow list for hash mode 73000, items 2, 4.1, 4.5, and 6 make up the plugin, while item 4.3 is the Python code. In the next section we’ll reimplement the plugin so that it understands the same hash lines as the fully custom plugin but computes them in Python.
Why would you want to implement a custom plugin that uses the Python bridge? Most of the time, you wouldn’t, but there are some good reasons to:
- To be able to parse arbitrary hash lines instead of the simple format required by hash mode 73000.
- As an intermediate step in writing a full plugin, so that the parsing code can be tested separately from the OpenCL computation code. This would have saved me significant headache.
- To use the Python bridge in other ways. For example, a plugin can split its computation between OpenCL for the first `loop` kernel and a Python script for `loop2`. This is also a stepping stone to using a non-Python bridge like hash mode 70100 does.
- For an educational exercise to understand the architecture better, which was my true reason.
### Writing a Custom Module for the Python Bridge
We want to copy the standard plugin that uses the Python bridge, hash mode 73000, and modify it. In this case we can take the preexisting Bitwarden-server-specific code from plugin 23410 and the Python script and merge it into our new plugin, 23411 (the number doesn’t matter, so I just incremented it by one). Sounds simple enough, right?
Except there is one architectural change that we need to make to the 23410 plugin. The digest size for our hash is 32 bytes, but the Python bridge hard-codes a 16-byte digest buffer. The standard Python bridge plugin, 73000, handles this by computing an MD4 digest of the raw digest and passing that around instead. The risk of a false positive is the risk of an MD4 collision, which is to say a 1/2^128 chance. Wrapping the digest with MD4 necessitates modifications to both the C module and the OpenCL kernels.
And because the salt handling is completely different, we’ll need to rewrite the Python calculation functions from before.
So even though this is “just” combining a few pieces of code that are already written, it turned out to require a fair amount of effort.
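The MD4 wrapping idea can be sketched in a few lines of Python. Note that whether `hashlib` exposes MD4 depends on your OpenSSL build, so this sketch falls back to MD5 purely as an illustrative stand-in; hashcat itself uses MD4:

```python
import hashlib

def shrink_digest(raw_digest: bytes) -> bytes:
    # Fit an arbitrary-size digest into the bridge's 16-byte buffer
    # by hashing it down, as mode 73000 does with MD4.
    try:
        h = hashlib.new("md4")
    except ValueError:
        h = hashlib.new("md5")  # stand-in when OpenSSL hides legacy MD4
    h.update(raw_digest)
    return h.digest()

# a 32-byte Bitwarden-server digest becomes a 16-byte comparison value
assert len(shrink_digest(bytes(32))) == 16
```

Both the C module (for the parsed target digest) and the kernels (for the computed digest) have to apply the same wrapping so the comparison stays apples-to-apples.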
The code can be found on GitHub. I find looking at a 3-way diff for each file between 73000, 23410, and 23411 to be the most helpful. GitHub doesn’t offer a way to do that on the web as far as I know, but you can download the code and use, for example, `vim -d src/modules/module_73000.c src/modules/module_23411.c src/modules/module_23410.c`, `vim -d OpenCL/m73000-pure.cl OpenCL/m23411-pure.cl OpenCL/m23410-pure.cl`, and `vim -d Python/73000-bws.py Python/23411.py`.
Since most of the code is reused, I won’t go through all of it, except I will cover the data structures, because they can be tricky.
#### Data Structures
There are two main data structures that (slow) hashcat plugins use: a tmp struct and a salt struct.
The tmp struct is generally used for local storage on a computation device (GPU) of IO data and as a temporary buffer for computation intermediaries. The Python bridge hardcodes a particular struct, so there’s no need to make any decisions about it:

```c
typedef struct
{
  // input

  u32 pw_buf[64];
  u32 pw_len;

  // output

  u32 out_buf[32][64];
  u32 out_len[32];
  u32 out_cnt;

} generic_io_tmp_t;
```
The `init` kernel loads the password candidates from the global `pws` structure into the `pw_buf` and `pw_len` fields of the tmp struct so that the Python bridge can pass them along to the Python script. The reverse happens in the `comp` kernel, which evaluates the computed digests in the `out_buf` and `out_len` fields that the Python bridge filled in with the results of the Python script.
The salt struct, however, requires customization. The default Python plugin, 73000, uses the default salt structure, salt_t. We would like to store more than a single salt, so we define a custom esalt structure. We could theoretically stuff everything into the salt_t structure, but that would be messy.
```c
typedef struct bitwarden_server_double_salt
{
  u32 inner_salt_buf[64];
  int inner_salt_len;

  u32 outer_salt_buf[64];
  int outer_salt_len;

  u32 hash_buf[11];

} bitwarden_server_double_salt_t;
```
The main purpose of the salt struct is to pass the salts around. However, if you compare this esalt structure to the struct in 23410, you’ll notice an extra field, `hash_buf[11]`. This is because of the architectural change I mentioned: the Python bridge hard-codes a small digest buffer size. So we pass the MD4 digest of the target digest in the normal `digest_buf`, and we also need to store the full digest in order to print it once it is cracked. It’s 11 u32s because we keep the digest base64-encoded: 32 raw bytes encode to 43 base64 characters, which rounds up to the next multiple of 4 (sizeof(u32)), giving 44 bytes, or 11 u32s.
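A quick check of that arithmetic in Python (using padded base64, which is what `b64encode` produces):

```python
import base64

encoded = base64.b64encode(bytes(32))  # any 32-byte digest
print(len(encoded))       # 44: 43 significant chars plus one '=' of padding
print(len(encoded) // 4)  # 11 u32s
```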
In the Python script, we have to modify the `extract_esalts` function to match the salt struct:

```python
# Python/generic_hash_mp.py
for hash_buf, hash_len, salt_buf, salt_len in struct.iter_unpack("1024s I 1024s I", esalts_buf):
```

```python
# Python/23411.py
for inner_salt_buf, inner_salt_len, outer_salt_buf, outer_salt_len, hash_buf in struct.iter_unpack("256s I 256s I 44s", esalts_buf):
```
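To double-check that the new format string matches the C struct layout, we can pack a fake esalt and round-trip it (the field contents here are made up):

```python
import struct

ESALT_FMT = "256s I 256s I 44s"  # mirrors bitwarden_server_double_salt_t

# 256 + 4 + 256 + 4 + 44 bytes, with no alignment padding needed
assert struct.calcsize(ESALT_FMT) == 564

# fixed-size "Ns" fields are zero-padded automatically by struct.pack
esalts_buf = struct.pack(
    ESALT_FMT,
    b"inner-salt", len(b"inner-salt"),
    b"outer-salt", len(b"outer-salt"),
    b"A" * 44,  # stands in for the base64-encoded full digest
)

for inner_buf, inner_len, outer_buf, outer_len, hash_buf in struct.iter_unpack(ESALT_FMT, esalts_buf):
    print(inner_buf[:inner_len])  # b'inner-salt'
    print(outer_buf[:outer_len])  # b'outer-salt'
```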
## Conclusion
Writing a pure-Python hashcat plugin is pretty easy once you get used to a few quirks. Compared to a standalone Python script, the performance is much better than I expected, and being able to use standard hashcat password candidate generation is very nice.
Writing a custom hashcat plugin that uses the Python bridge is not as easy. It probably only makes sense to do as part of a larger plugin development project, but if you’re already planning that, using the Python bridge during development might be a huge aid. It is not worth writing a custom parsing plugin just so that your Python script can handle custom hash lines; just write another script to convert them.
Overall, bridges are an extremely cool addition to the already amazing hashcat ecosystem. In the future I plan on doing more with bridges, especially trying to go beyond the Python bridge.
All of the code from this post can be found on GitHub.
1. You don’t have to make a new file. You can just edit `Python/generic_hash_mp.py`, and the hashcat dev guide even soft-recommends doing that, but I wish they hadn’t. It’s way easier in the long run to just put each implementation in a new file. ↩
2. Before bridges, “hash mode” and “plugin” were mostly interchangeable. “Hash mode” was used more when talking about the algorithm and “plugin” was used more when talking about the implementation. With bridges, it’s more confusing. The phrase “Python plugin” should be considered an entirely different thing than other uses of “plugin”. “Python plugin” really refers to the Python script that implements the hash, instead of the traditional sense of “plugin” that refers to the C module and OpenCL kernels. All Python plugins use the same hash modes (72000/73000) and the same plugin (in the traditional sense) backend. The plugin that we develop in this section is a plugin that uses Python and the Python bridge, but it isn’t a “Python plugin” in that sense. ↩