Project Overview
Comparative analysis of four existing pre-computed table matrices for reversing cryptographic functions in real-world password-cracking applications
The algorithm was implemented in CUDA-C on an NVIDIA Kepler-based GPU architecture (Tesla).
Involved effective load-balancing measures, multithreading techniques and algorithmic enhancements.
Hashing
A hash function maps a string of any length (the message) to an output of fixed length (the digest).
MD5 Algorithm
MD5 takes as input a message of arbitrary length and produces as output a 128-bit fingerprint or message digest of the input.
MD5 involves the following steps:
1. Padding
2. Append length
3. Initialize MD5 buffer
4. Process the message in 16-word blocks
5. Output
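Steps 1 and 2 (padding and appending the length) can be sketched in C as follows. This is an illustrative sketch, not the project's code, and the names `md5_padded_len` and `md5_pad` are hypothetical. MD5 appends a single 0x80 byte, then zero bytes until the length is 56 mod 64, then the original length in bits as a 64-bit little-endian integer, so the result is a whole number of 512-bit (64-byte) blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Padded length in bytes: message, one 0x80 byte, zero bytes up to
   56 mod 64, then the 8-byte length field. Always a multiple of 64. */
size_t md5_padded_len(size_t len) {
    size_t with_marker = len + 1;                  /* room for 0x80 */
    size_t rem = with_marker % 64;
    size_t zeros = (rem <= 56) ? 56 - rem : 64 + 56 - rem;
    return with_marker + zeros + 8;
}

/* MD5 steps 1-2: pad the message and append its original length in bits
   as a 64-bit little-endian integer. Returns a heap buffer the caller
   must free, or NULL if allocation fails. */
uint8_t *md5_pad(const uint8_t *msg, size_t len, size_t *out_len) {
    assert(out_len != NULL);
    size_t total = md5_padded_len(len);
    uint8_t *buf = calloc(total, 1);               /* zero bytes = padding */
    if (buf == NULL) return NULL;
    if (len > 0) memcpy(buf, msg, len);
    buf[len] = 0x80;                               /* a single 1 bit */
    uint64_t bits = (uint64_t)len * 8;             /* length mod 2^64 */
    for (int i = 0; i < 8; i++)
        buf[total - 8 + i] = (uint8_t)(bits >> (8 * i));
    *out_len = total;
    return buf;
}
```

For example, the 3-byte message "abc" pads to a single 64-byte block whose byte 56 holds 24, the message length in bits; a 56-byte message already needs two blocks.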
A time-memory trade-off (TMTO) strikes a compromise between the time complexity of an exhaustive key search and the memory complexity of a dictionary attack. Successful cryptanalytic attacks such as Hellman tables use a TMTO-based approach.
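Hellman's original analysis makes the compromise concrete. As an illustration (this is Hellman's classic parameter choice, not necessarily the one used in this project), take r tables of m chains of length t over a space of N points, with r = t and m t² = N. Memory M counts stored (SP, EP) pairs and online time T counts evaluations of f, ignoring losses from chain merges:

```latex
M = m\,r = m\,t, \qquad T = t\,r = t^{2}, \qquad
T\,M^{2} = t^{2}\,(m\,t)^{2} = \left(m\,t^{2}\right)^{2} = N^{2}
```

Under this curve, halving the memory quadruples the online time.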
CUDA
CUDA is an extension to C. A kernel executes in parallel across a set of threads. The programmer or compiler organizes these threads into thread blocks and grids of thread blocks. The GPU instantiates a kernel program on a grid of parallel thread blocks. Each thread has a thread ID, and each thread block has a block ID within its grid.
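The indexing scheme above can be emulated on the CPU to show how each thread finds its own work item. This is a plain-C sketch assuming a 1-D grid; the helpers `blocks_for`, `global_index` and `emulate_launch` are hypothetical. Inside a real kernel the same index comes from the built-in variables as `blockIdx.x * blockDim.x + threadIdx.x`.

```c
#include <assert.h>
#include <stddef.h>

/* Blocks needed so that blocks * threads_per_block >= n: the ceiling
   division commonly used to size a 1-D CUDA grid. */
size_t blocks_for(size_t n, size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Global thread ID, as a kernel computes it from
   blockIdx.x * blockDim.x + threadIdx.x. */
size_t global_index(size_t block_id, size_t block_dim, size_t thread_id) {
    return block_id * block_dim + thread_id;
}

/* CPU emulation of a 1-D launch: each (block, thread) pair runs the
   kernel body once, and threads whose index falls past n do nothing. */
void emulate_launch(int *out, size_t n, size_t threads_per_block) {
    size_t grid = blocks_for(n, threads_per_block);
    for (size_t b = 0; b < grid; b++) {
        for (size_t t = 0; t < threads_per_block; t++) {
            size_t i = global_index(b, threads_per_block, t);
            if (i < n) out[i] += 1;   /* bounds check, as in a real kernel */
        }
    }
}
```

Rounding the grid size up means the last block may contain idle threads, which is why the kernel body checks `i < n`.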
Hellman Tables
Implementation in CUDA
In our cryptanalysis experiment using Hellman tables, we have m = 500,000,000 and t = 10,000 to cover a total input space of 5 billion passwords. Each f in the encryption chain mentioned earlier was a combination of MD5 hashing, XOR encryption and character selection. Chain generation was then parallelized across tables, and each chain was made an independently computed unit. The computation used an optimal number of threads, blocks and grids, together with appropriate indexing techniques. Sample function call:
precomputeOnDevice<<<gridDim,threadsPerBlock>>>(startin
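A minimal sketch of chain generation, assuming a 7-character lowercase password space (26^7 points) and replacing MD5 with a cheap 64-bit mixing function to keep the example short. `step_f`, `chain_endpoint`, `build_table`, the per-table XOR key and the choice of start points are hypothetical illustrations, not the project's `precomputeOnDevice` kernel. Because `chain_endpoint` depends only on its own start point, one GPU thread could compute one chain, which is the independence the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PW_LEN 7
#define CHARSET 26ULL                    /* assumption: lowercase a-z only */

/* Size of the assumed password space, 26^7. */
static uint64_t space_size(void) {
    uint64_t n = 1;
    for (int i = 0; i < PW_LEN; i++) n *= CHARSET;
    return n;
}

/* Stand-in for MD5: the splitmix64 finalizer. NOT cryptographic; it only
   keeps the sketch short and deterministic. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* One step f: "hash", XOR with a per-table key, then map the result back
   into the password space (standing in for character selection). */
uint64_t step_f(uint64_t pw_index, uint64_t table_key) {
    return (mix64(pw_index) ^ table_key) % space_size();
}

/* Decode a password index into 7 characters, one base-26 digit each. */
void index_to_password(uint64_t idx, char out[PW_LEN + 1]) {
    for (int i = 0; i < PW_LEN; i++) {
        out[i] = (char)('a' + idx % CHARSET);
        idx /= CHARSET;
    }
    out[PW_LEN] = '\0';
}

/* Walk one chain: apply f t times from start point SP, return endpoint EP.
   Each chain depends only on its own SP, so one thread can own one chain. */
uint64_t chain_endpoint(uint64_t sp, uint64_t t, uint64_t table_key) {
    uint64_t x = sp;
    for (uint64_t j = 0; j < t; j++) x = step_f(x, table_key);
    return x;
}

/* Build one table of m chains; here the start points are simply 0..m-1. */
void build_table(uint64_t *sp, uint64_t *ep, size_t m, uint64_t t,
                 uint64_t table_key) {
    assert(sp != NULL && ep != NULL);
    for (size_t i = 0; i < m; i++) {     /* the loop a kernel would parallelize */
        sp[i] = (uint64_t)i;
        ep[i] = chain_endpoint(sp[i], t, table_key);
    }
}
```

Only the (SP, EP) pairs are kept; the intermediate chain values are recomputed on demand during the attack.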
Hellman Attack
Step 3: Compare f(f(X)) with the EPs.
[Figure: endpoints EP0, EP2 and EP4]
Start from SP4 and compute the chain forward until the next value is X.
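The lookup procedure can be sketched as follows, assuming a single table, a linear scan over the endpoints for brevity, and a hypothetical `step_f` standing in for MD5 + XOR + character selection. Starting from the target X, the sketch computes X, f(X), f(f(X)), ... and compares each with the EPs; on a match at step j it regenerates that chain from its SP for t - 1 - j steps, so the next value should be X. If it is not, the match was a false positive and the search continues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical step function on a space of n points; stands in for
   MD5 + XOR + character selection. Any deterministic f works here. */
uint64_t step_f(uint64_t x, uint64_t n) {
    x += 0x9e3779b97f4a7c15ULL;
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
    return x % n;
}

/* Apply f k times. */
uint64_t iterate(uint64_t x, uint64_t k, uint64_t n) {
    while (k--) x = step_f(x, n);
    return x;
}

/* Online Hellman search in one table of m chains of length t: find z with
   f(z) == target. Returns 1 and sets *preimage on success, else 0.
   *false_alarms counts EP matches that did not lead back to the target. */
int hellman_lookup(const uint64_t *sp, const uint64_t *ep, size_t m,
                   uint64_t t, uint64_t n, uint64_t target,
                   uint64_t *preimage, unsigned *false_alarms) {
    assert(t > 0 && n > 0);
    uint64_t y = target;                       /* y = f^j(X) */
    for (uint64_t j = 0; j < t; j++) {
        for (size_t i = 0; i < m; i++) {
            if (ep[i] != y) continue;
            /* EP match: regenerate chain i from its SP; after t-1-j steps
               the next value should be X. */
            uint64_t z = iterate(sp[i], t - 1 - j, n);
            if (step_f(z, n) == target) {
                *preimage = z;
                return 1;
            }
            (*false_alarms)++;                 /* false positive */
        }
        y = step_f(y, n);                      /* compare f(y) next round */
    }
    return 0;
}
```

Every false alarm costs up to t extra evaluations of f, which is why the number of false positives is worth recording alongside lookup time.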
Implementation
The implementation was highly parallelized to make full use of the 3,072 CUDA cores on each GTX 690 card. The main bottleneck was the physical bandwidth limit of the 1 gigabyte (GB) data-transfer link between the GPU and the system's main memory.
Pre-computed values are loaded into host (CPU) memory and then transferred segment by segment to the GPU.
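Segment-by-segment loading can be sketched in plain C as below. This is an illustration under stated assumptions: `memcpy` into a fixed-size buffer stands in for a host-to-device `cudaMemcpy`, the inner loop stands in for the kernel launched on each segment, and `segment_count`, `segment_size` and `process_in_segments` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of segments when at most cap entries fit in the GPU buffer. */
size_t segment_count(size_t total, size_t cap) {
    assert(cap > 0);
    return (total + cap - 1) / cap;
}

/* Entries in segment s; only the last segment may be partial. */
size_t segment_size(size_t total, size_t cap, size_t s) {
    size_t start = s * cap;
    return (total - start < cap) ? total - start : cap;
}

/* Stream host-resident precomputed values through one fixed-size device
   buffer. memcpy stands in for a host-to-device cudaMemcpy, and the inner
   loop stands in for the kernel run on each segment (here it only sums
   the values so the result can be checked). */
uint64_t process_in_segments(const uint64_t *host, size_t total,
                             uint64_t *device_buf, size_t cap) {
    uint64_t acc = 0;
    size_t segments = segment_count(total, cap);
    for (size_t s = 0; s < segments; s++) {
        size_t len = segment_size(total, cap, s);
        memcpy(device_buf, host + s * cap, len * sizeof *host);
        for (size_t i = 0; i < len; i++) acc += device_buf[i];
    }
    return acc;
}
```

With a buffer of 4 entries, 10 host values travel in three segments of 4, 4 and 2 entries.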
Endpoint values in Hellman tables are sorted in ascending order across all the tables to facilitate the use of binary search.
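Sorting the endpoints once turns each EP comparison in the attack into an O(log m) search. A minimal sketch using the C standard library's `qsort` and `bsearch`; the `chain_t` record, including the hypothetical `table` field that lets endpoints from all tables share one sorted array, is an illustration rather than the project's data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One stored chain: its endpoint, start point and (hypothetical) table id,
   so endpoints from all tables can live in one sorted array. */
typedef struct {
    uint64_t ep;
    uint64_t sp;
    uint32_t table;
} chain_t;

/* Compare by endpoint, ascending; avoids overflow of a - b. */
static int cmp_ep(const void *a, const void *b) {
    uint64_t x = ((const chain_t *)a)->ep;
    uint64_t y = ((const chain_t *)b)->ep;
    return (x > y) - (x < y);
}

void sort_by_endpoint(chain_t *chains, size_t m) {
    qsort(chains, m, sizeof *chains, cmp_ep);
}

/* Binary search for a chain ending at value; NULL if none. With duplicate
   endpoints bsearch may return any of them, so a full attack would also
   check the neighboring entries. */
const chain_t *find_endpoint(const chain_t *chains, size_t m, uint64_t value) {
    assert(chains != NULL || m == 0);
    chain_t key = { value, 0, 0 };
    return bsearch(&key, chains, m, sizeof *chains, cmp_ep);
}
```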
[Figure: Collision, Chain Merger and False Positive, illustrated with chains ending at EP0, EP2 and EP4]
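Chain merges can be detected offline: since f is deterministic, two chains that collide at the same position coincide from then on and end at the same EP, so in an endpoint-sorted table they show up as adjacent duplicates. A collision at different positions does not, because the two chains stay offset. A sketch with hypothetical helper names that counts merged chains and keeps one chain per endpoint:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count chains whose endpoint repeats the previous one in an EP-sorted
   table. f is deterministic, so such chains have merged: they collided at
   the same position and coincide up to the end. */
size_t count_merged_chains(const uint64_t *sorted_ep, size_t m) {
    size_t merged = 0;
    for (size_t i = 1; i < m; i++) {
        assert(sorted_ep[i - 1] <= sorted_ep[i]);   /* input must be sorted */
        if (sorted_ep[i] == sorted_ep[i - 1]) merged++;
    }
    return merged;
}

/* Keep one chain per endpoint, compacting ep[] and sp[] in place.
   Returns the new number of chains. */
size_t dedupe_endpoints(uint64_t *sorted_ep, uint64_t *sp, size_t m) {
    if (m == 0) return 0;
    size_t w = 1;                                   /* next write slot */
    for (size_t i = 1; i < m; i++) {
        if (sorted_ep[i] != sorted_ep[w - 1]) {
            sorted_ep[w] = sorted_ep[i];
            sp[w] = sp[i];
            w++;
        }
    }
    return w;
}
```

Dropping merged chains loses no coverage beyond the shared tail, and it removes one source of repeated EP matches during the attack.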
Experiment
A list of 10,000 passwords was selected and shortened to 7 characters. These passwords had been recovered from the leaked list of unsalted SHA-1 hashes from the 2012 LinkedIn breach. The results of the program were recorded as the time needed to process a password hash, the average number of collisions/false positives, the average time taken to find a password, and so on.
Results: Accuracy
Possible Improvements
Reducing collisions: convenient data structures, such as a red-black tree or a dictionary supporting fast indexing operations, can store the chains computed so far within a table, so that when a collision is detected, redundant computational effort can be spared.
Distinguished checkpoints: speed up chain regeneration and also reduce the time taken to resolve false positives.
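The first improvement can be sketched with a hash set playing the role of the dictionary (a red-black tree would serve the same purpose with O(log n) lookups). This is a hypothetical illustration: the open-addressing set, `generate_chain` and the stand-in `step_f` are assumptions, not the project's code. Every point a chain visits is recorded; when a new chain reaches a point that is already in the set, it stops, since from there on it would follow the same path as an earlier chain (or loop back on itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal open-addressing hash set of chain values (linear probing),
   standing in for the dictionary. cap must be a power of two and larger
   than the number of values ever inserted. */
typedef struct { uint64_t *keys; bool *used; size_t cap; } set_t;

static size_t slot(uint64_t k, size_t cap) {
    return (size_t)((k * 0x9e3779b97f4a7c15ULL) >> 32) & (cap - 1);
}

bool set_init(set_t *s, size_t cap) {
    assert(cap > 0 && (cap & (cap - 1)) == 0);
    s->keys = calloc(cap, sizeof *s->keys);
    s->used = calloc(cap, sizeof *s->used);
    s->cap = cap;
    return s->keys != NULL && s->used != NULL;
}

void set_free(set_t *s) { free(s->keys); free(s->used); }

/* Insert k; returns true if k was already present (a collision). */
bool set_insert(set_t *s, uint64_t k) {
    size_t i = slot(k, s->cap);
    while (s->used[i]) {
        if (s->keys[i] == k) return true;
        i = (i + 1) & (s->cap - 1);
    }
    s->used[i] = true;
    s->keys[i] = k;
    return false;
}

/* Hypothetical step function standing in for MD5 + XOR + character selection. */
static uint64_t step_f(uint64_t x, uint64_t n) {
    x += 0x9e3779b97f4a7c15ULL;
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
    return x % n;
}

/* Generate one chain of length t from sp, recording every point in seen.
   Stops early at the first already-known point. Returns true if the chain
   was completed; *ep receives the last point reached, *steps the f calls. */
bool generate_chain(set_t *seen, uint64_t sp, uint64_t t, uint64_t n,
                    uint64_t *ep, uint64_t *steps) {
    uint64_t x = sp;
    *steps = 0;
    if (set_insert(seen, x)) { *ep = x; return false; }
    for (uint64_t j = 0; j < t; j++) {
        x = step_f(x, n);
        (*steps)++;
        if (set_insert(seen, x)) { *ep = x; return false; }
    }
    *ep = x;
    return true;
}
```

The trade-off is memory: the set holds every visited point, so in practice it would be kept per table or per batch of chains.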
THANK YOU
Q&A