Inner Workings of CryptaML

Here is how CryptaML works under the hood.

Prompt Engineering: Shot Prompting

While training this model, shot prompting was used to get the best possible responses from GPT-4. A prompt "dictionary" held a multitude of 0-shot (no example code), 1-shot (one example), 2-shot (two examples), and 3-shot (three examples) prompts. Because the 3-shot prompts produced the most valid responses from GPT-4, they were used for the rest of the research.
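As a rough illustration of what such a prompt dictionary could look like, the sketch below builds 0-shot to 3-shot prompts from a small pool of worked examples. The example snippets, the instruction text, and the names `EXAMPLES`, `INSTRUCTION`, `build_prompt`, and `PROMPTS` are all assumptions made for illustration; they are not taken from CryptaML's actual prompts.

```python
# Hypothetical prompt "dictionary": every name and example below is illustrative,
# not taken from the CryptaML source.
EXAMPLES = [
    ("hashlib.md5(password.encode())", "CWE-327: broken or risky cryptographic algorithm"),
    ('API_KEY = "abc123"', "CWE-798: hard-coded credentials"),
    ("os.system('ping ' + host)", "CWE-78: OS command injection"),
]

INSTRUCTION = "Identify the CWE that best matches the vulnerability in the code."


def build_prompt(code: str, shots: int) -> str:
    """Return a prompt with `shots` worked examples followed by the target code."""
    if not 0 <= shots <= len(EXAMPLES):
        raise ValueError(f"shots must be between 0 and {len(EXAMPLES)}")
    parts = [INSTRUCTION]
    for snippet, answer in EXAMPLES[:shots]:
        parts.append(f"Code:\n{snippet}\nAnswer: {answer}")
    # The target code ends with an open "Answer:" for the model to complete.
    parts.append(f"Code:\n{code}\nAnswer:")
    return "\n\n".join(parts)


# One prompt per shot count, mirroring the 0-shot to 3-shot entries described above.
PROMPTS = {k: build_prompt("eval(user_input)", k) for k in range(4)}
```

Keeping the examples in one ordered pool means each higher shot count is a strict superset of the previous one, which makes it easier to compare response validity across shot counts.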

It's all in the CWEs

The Common Weakness Enumeration (CWE) is a community-developed catalog of common software and hardware weakness types.

What is a CWE?
How they helped CryptaML

The CWEs compiled in the list shown below were used as a checkpoint in CryptaML's algorithm to make sure all known scenarios and vulnerabilities were accounted for.

Shell Script Backing

This six-step script helped CryptaML achieve accurate and quick results.

Step 1 is a syntax check: it evaluates the file passed to CryptaML for syntax errors, to avoid false-positive vulnerabilities in later steps.
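A minimal sketch of such a syntax gate is shown here, assuming the input file is Python source. The text does not say which languages CryptaML accepts or how its script performs this check, so the function name `check_syntax` and the use of Python's `ast` module are illustrative choices only.

```python
import ast


def check_syntax(source: str, filename: str = "<input>") -> list[str]:
    """Return a list of syntax error messages; an empty list means the code parses."""
    try:
        # ast.parse only parses the source; it never executes it.
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return [f"{filename}:{err.lineno}: {err.msg}"]
    return []
```

A caller would stop the pipeline when the returned list is non-empty, so that later steps only ever see code that parses.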

Step 2 runs the input code against a large, comprehensive list of CWEs to match it to known vulnerabilities found in the scenarios those CWEs describe.
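One simple way to picture this matching step is a table of patterns keyed by CWE identifier, checked line by line. The sketch below assumes Python input and uses three real CWE IDs, but the regular expressions, the table `CWE_PATTERNS`, and the function `scan` are simplified assumptions for illustration; CryptaML's actual list and matching logic are not described in this text.

```python
import re

# Hypothetical checklist: the CWE IDs are real, but the regular expressions are
# simplified illustrations, not CryptaML's actual detection rules.
CWE_PATTERNS = {
    # OS command injection
    "CWE-78": re.compile(r"\bos\.system\s*\("),
    # Use of a broken or risky cryptographic algorithm
    "CWE-327": re.compile(r"\bhashlib\.(md5|sha1)\s*\("),
    # Use of hard-coded credentials
    "CWE-798": re.compile(r"(?i)\b(password|api_key)\s*=\s*[\"'][^\"']+[\"']"),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, cwe_id) for every line that matches a checklist pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for cwe_id, pattern in CWE_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, cwe_id))
    return findings
```

Line-based regular expressions are easy to read but miss anything that spans lines or depends on data flow, which is one reason a second verification pass is useful.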

Step 3 generates a YAML file that represents the vulnerabilities found in Step 2, mirroring the format of Semgrep's YAML files.
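To make the file format concrete, the sketch below renders one finding as a Semgrep-style rule. The top-level `rules` list and the `id`, `pattern`, `message`, `languages`, `severity`, and `metadata` keys follow Semgrep's rule schema; the function name `semgrep_rule_yaml`, the fixed `python` language, and the `WARNING` severity are assumptions chosen for illustration, not CryptaML's actual output.

```python
import json


def semgrep_rule_yaml(rule_id: str, cwe: str, pattern: str, message: str) -> str:
    """Render one finding as a Semgrep-style rule file (illustrative values only)."""
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    q = json.dumps
    return "\n".join([
        "rules:",
        f"  - id: {rule_id}",
        f"    pattern: {q(pattern)}",
        f"    message: {q(message)}",
        "    languages: [python]",   # assumed target language
        "    severity: WARNING",     # assumed severity level
        "    metadata:",
        f"      cwe: {q(cwe)}",
        "",
    ])
```

For example, `semgrep_rule_yaml("weak-hash-md5", "CWE-327", "hashlib.md5(...)", "MD5 is a weak hash")` produces a rule whose pattern uses Semgrep's `...` ellipsis to match any arguments.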

Step 4 runs the output through Semgrep to double-check its validity.
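The verification pass might look roughly like the sketch below, which assumes the Semgrep CLI is installed and on the PATH. The `--config` and `--json` options and the `results` / `check_id` fields of the JSON report are part of Semgrep; the function names `semgrep_command` and `confirmed_rule_ids`, and the decision to compare by rule ID, are assumptions for illustration.

```python
import json
import shutil
import subprocess


def semgrep_command(rule_file: str, target: str) -> list[str]:
    """Build the Semgrep invocation: one rule file, one target, JSON output."""
    return ["semgrep", "--config", rule_file, "--json", target]


def confirmed_rule_ids(rule_file: str, target: str) -> set[str]:
    """Return the rule IDs that Semgrep reports as matching the target."""
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    proc = subprocess.run(
        semgrep_command(rule_file, target),
        capture_output=True, text=True, check=False,
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError as err:
        raise RuntimeError(f"unexpected Semgrep output: {proc.stderr.strip()}") from err
    return {result["check_id"] for result in report.get("results", [])}
```

Semgrep may prefix `check_id` with the rule file's path, so a comparison against the IDs generated earlier would likely need a suffix match rather than strict equality.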

Step 5 generates a new version of the input code from Step 1, fully debugged and corrected for the user.

Step 6 displays a completion message.

Semgrep vs. CryptaML