Homework 13 (Project 4)

Paul Krzyzanowski

November 8, 2020

Deadline: Sunday, November 22 11:59pm EST

Recitation notes.
Recitation video
Sample output

Introduction

Proof-of-work was created as a mechanism to demonstrate that the creator of the proof-of-work had to put in a substantial amount of computing effort while making it efficient for a recipient to validate this.

In Bitcoin, proof of work serves two functions:

  1. It makes creating a block with a valid block hash so difficult that it is extremely unlikely that two bitcoin nodes would propose a block at the same time.

  2. It takes so much computation to compute the proof of work to create a valid hash for one block that the amount of computing power an attacker would need to modify a sequence of hash pointers for many blocks to change an old transaction is not feasible.

This assignment contains two parts:

  1. In part 1, you will implement a program that computes a proof of work for a file. You will specify a difficulty level on the command line and the program prints a set of headers as output.

  2. In part 2, you will implement a program that validates the headers produced in part 1. If the proof of work is incorrect or the hash is incorrect then the validation will fail.

Environment

This is an individual project. All work should be your own except for the referenced code for the hash function.

You may use Go, Python, Java, C, or C++ in your implementation.

You should be able to implement this on any platform but you are responsible to make sure that the program runs on Rutgers iLab systems with no extra software.

Background

Hashcash was an idea to reduce the amount of email spam. It would do this by requiring the sender to solve a difficult puzzle before sending a message.

The sender would then provide proof that this puzzle was solved and messages would be rejected if such a proof was missing or in valid. The proof was provided as a header, called a stamp_, in the mail message.

Someone using hashcash might spend several seconds creating the stamp. That’s not disruptive for someone who is sending a few messages but essentially impossible for a spammer who needs to generate stamps for millions of messages. Because the puzzle uses the content of the mail message and the recipient’s address, the spammer would need to spend many years of computing time to send a million messages.

The solution to the puzzle needs to be verified efficiently by receivers, so we need a puzzle that’s difficult to solve but easy to verify.

The idea of hashcash was adopted by Bitcoin and many other cryptocurrencies to enable participating systems provide evidence of Proof of Work when adding a new block to the bitcoin blockchain.

The puzzle

Hashcash had to create a puzzle that was a function of the message, was sufficiently difficult to compute, but was efficient to verify. This aligns closely with properties of one-way functions. Cryptographic hashes are one-way functions that provide fixed-length output.

For example, here’s a SHA–256 hash of the text ‘The grass is green’:

f3ccca8f3852f5e2932d75db5675d59de30d9fa10530dd9855bd4a6cd0661d8e

It takes only a few milliseconds to compute this. The inverse function, finding the text that would produce this has, requires a brute-force search. You would need to try hashing a huge amount of different messages to find one the produces the same hash.

For example, Google attacked the SHA–1 hash but only by trying over 9×10^18 hashes. The attack required 12,000,000 GPU years of computing power. A stronger hash function, such as SHA–2, would take far longer. Clearly, reversing a hash is far too difficult a problem.

An easier puzzle

We can make the puzzle easier. Suppose we each for some text W that we concatenate with the message M. When W||M is hashed, the resulting hash value will have a certain property.

For example, this was our SHA–256 hash of ‘The grass is green’: f3ccca8f3852f5e2932d75db5675d59de30d9fa10530dd9855bd4a6cd0661d8e

If we look at the first bytes in binary, we see the bits are 1111 0011 1100 … . Suppose we want to modify the message in a way that its SHA–256 hash will be a value whose first 6 bits are 0.

There is no way to predict how to create a message that will create such a hash. The only thing we can do is try different combinations, prefixing the message with different values of W.

This particular challenge turns out not to relatively easy and we can solve it in a few milliseconds. For example, if we prefix the letter ‘f’ to the text, we get this hash with the hex value that starts with 01891:

0189108649ff4cd02c8af4e099c8d719ec54eff327df14a89305932926f4bd93

The binary value starts with 7 zeros:

0000 0001 1000 1001 ...

Adaptive complexity

We can change the difficulty of the puzzle by altering the number of leading 0 bits that we need to find. This difficulty will be an average. Sometimes we might get lucky and find a prefix that produces the desired result quickly. Other times it will take longer.

Here’s a table of some of the prefixes I came up with for the text ‘The grass is green’.

Difficulty, d Prefix, W Iterations Time (s)
9 JQ 1,891 0.002491
17 d$3 20,271 0.02586
23 et*2 1,108,192 1.4
27 3O941 28,415,235 36.59
28 VaKH9 248,316,223 323.5
30 )FT5D 351,377,855 453.1
31 8(i6N2 4,490,406,584 5063.6
32 tJ2IRB 22,016,518,319 12,270

The difficulty is the number of leading zero bits. In this example, I found a string prefix that results in a hash with 23 leading 0 bits by testing a little over a million variations of prefixes in 1.4 seconds. However, finding a prefix that results in a hash value with 27 leading 0 bits took testing 28 million prefixes and took over 36 seconds.

The puzzle gets difficult quickly. Finding text that would produce a hash with 30 zero bits required testing over 350 million different prefixes and took 453 seconds – a bit over 7 minutes. To get 31 zeros, I had to test over 4 billion prefixes and that took 1.4 hours on my old iMac. Producing a hash with 32 leading zeros required testing over 22 billion prefixes and over 3 hours of time.

Size-invariant difficulty

The time to compute a hash is longer for large messages; weened to iterate over more bytes. To make the difficulty not be a function of the message length, we can modify the puzzle to one where we add prefixes to a hash of the message and then hash the resulting message to see what we get.

The average difficulty of the problem now is simply the value d, which defines the number of leading 0 bits in the hash result. The actual time it takes to find the right prefix W will depend on: - Luck: we might get lucky and test some strings early on that will provide the hash result we want. We may get lucky occasionally but not most of the time. The law of averages will play out. - The speed of your computer and the quality of your code. We can speed up hashes greatly by using GPUs or custom processors as people do with bitcoin mining, but we won’t do that in this assignment. - Mostly, the difficulty depends on the value of d, the number of leading zero bits we want in the resulting hash.

Proof of Work

The prefix that was found — the string W — that, when concatenated with the message, produces the desired hash output, is called the Proof of Work.

IT is a short string that we can provide to prove that we did the work to find this value that produces the desired hash.

The proof of work is very efficient to test. For instance, it took my program over four billion hashes to find a prefix for the text The grass is green that will result in a hash where the first 31 bits are all 0.

This prefix is the proof of work. It’s the string 8(i6N2. You can verify the proof of work with one hash and look at the resulting value. It will take you only a millisecond.

Your assignment: part 1

Your assignment is to write a program called pow-create that creates a proof of work string for a given file:

pow-create nbits file

The work is the search for a string that, when prefixed to a hash of the given file (file), will result in a SHA–256 hash that contains a certain number (nbits) of leading 0s.

For example, suppose we have the following text in a file called walrus.txt:

The time has come, the Walrus said,
To talk of many things:
Of shoes — and ships — and sealing-wax —
Of cabbages — and kings —
And why the sea is boiling hot —
And whether pigs have wings.

We can use the openssl command on a macOS or Linux system to find its SHA256 hash:

$ openssl sha256 < walrus.txt
66efa274991ef4ab1ed1b89c06c2c8270bb73ffdc28a9002a334ec3023039945

The hash starts with 66ef. The hex digit 6 is 0101 in binary, so we have one leading zero bit in this message.

To create a proof of work string with a difficulty of 20 (at least 20 leading zero bits), we run the command:

$ pow-create 20 walrus.txt
File: walrus.txt
Initial-hash: 66efa274991ef4ab1ed1b8...28a9002a334ec3023039945
Proof-of-work: hl04
Hash: 000002b2311ce58427ab7c1bfd0cb1...3d948c1c603a524dc11fb28
Leading-bits: 22
Iterations: 1496419
Compute-time: 1.75376

Your pow-create command will:

  1. Create a SHA–256 hash of the data.
  2. Convert it to a printable string (matching the string produced by the openssl command).
  3. Pick a string that’s a potential proof of work value.
  4. Create a hash of the string in (3) concatenated with the string representation of the hash in (2).
  5. If the hash doesn’t start with at least nbits zero bits (the number supplied by the first argument of the command line) then go back to step (3) and try a different prefix.

For this example, it took almost 1.5 million hashes with different variations of prefixes to find a hash that starts with 20 zero bits.

The string that we came up with is hl04. This is presented as the proof of work.

Output

Your program will print results to the standard output (stdout) in a standard header format as used by mail headers or HTTP. This is one name-value item per line with each line containing the header name, a colon, one or more spaces, and the value.

The headers are:

File:
The name of the file.
Initial-hash:
The SHA–256 hash of the file.
Proof-of-work:
The printable string that is your proof of work.
Hash:
The SHA–256 hash of the proof of work concatenated with the original hash string.
Leading-bits:
The number of leading 0 bits in the hash you computed. It should be greater than or equal to the number requested.
Iterations
The number of different proof-of-work prefixes you had to try before you found one that works.
Compute time:
How long this process took, in seconds (including decimal seconds if appropriate).

Testing your proof of work

You can easily test your program’s result against the data produced by the openssl command

Run your pow-create command:

$ ./pow-create 20 walrus.txt
Initial-hash: 66efa274991ef4ab1ed1b8...28a9002a334ec3023039945
Proof-of-work: hl04
Hash: 000002b2311ce58427ab7c1bfd0cb1...3d948c1c603a524dc11fb28
Leading-bits: 22
Compute-time: 1.75376

The above output is trimmed a bit. The only header we care about here is the Proof-of-work value, hl04.

We can check our hash with the one openssl produces by running the openssl command with the sha256 argument. This value should match the Initial-hash header:

$ openssl sha256 <walrus.txt
66efa274991ef4ab1ed1b89c06c2c8270bb73ffdc28a9002a334ec3023039945

Then we prefix the proof of work to that hash string and find the sha–256 hash of this proof of work string concatenated with the hex string output of the original hash:

$ echo -n 'hl0466efa27499...9002a334ec3023039945'|openssl sha256
000002b2311ce58427ab7c1bfd0cb1679906b24343d948c1c603a524dc11fb28

We can see that the resulting hash starts with five zeros followed by a 2. We have a hash that has 22 leading zero bits, which is at least as good as the 20 we wanted, so the proof of work is valid.

Hints

This is a super-short assignment. You won’t write a SHA–256 hash function. In python, I’m guessing you can use hashlib. In other languages, you may need to find source code that computes a hash.

If you’re using source code for a SHA–256 function, do NOT submit multiple files for an entire crypto library package. The code for computing a SHA–256 hash is small. Submit only the source file you need for that function and cite your where you got the source in your comments.

If you are using a hash function that needs to be compiled, your Makefile needs to have instructions on how to build all the code.

Make sure your hash function works on the iLab systems.

Test that your hash function produces the same results as openssl. You need this for testing prefixes. As with writing cryptography functions, the recipient needs to be able to verify your work with their own code.

I’m not telling you how to find prefixes. There are various ways you can do this and you should figure this out. Keep your prefixes as printable 7-bit ASCII text (e.g., no extended characters such as ü, ñ, or é) so they can be presented in the headers. This means no whitespace characters. To make testing easier, do not use quote characters as part of the string either.

Watch your workload

As shown in the table above, finding long prefixes of zeros can take an extremely long time – it gets exponentially more difficult. That is why bitcoin miners use data centers filled with custom ASIC processors that try trillions of hashes per second.

You might want to set a threshold on the number of prefixes you try so your program won’t run for an unbounded time. For example, you might set it to a billion or so, especially on a Rutgers system. For your PC, you can set it to a few tens of billions.

Test your code with small difficulty levels. If you can generate proof of work strings ranging to a difficulty of, say, 20 bits, then your code will likely work fine for longer amounts of leading zeros.

You don’t want to cause a strain on computers if you’re using shared Rutgers systems … or get your account locked because you used up your CPU quota.

Making it real

If you were going to use this type of proof of work system for real, you might make a few changes. You would not bother turning the hash into a text string but rather hash the 256-bit binary hash. Bitcoin hashes the block header, which includes the proof of work value, the Merkle tree hash for transactions in the block, and a few other fields.

We’re using the text representation of the hash only to make testing and printing more convenient. It does not diminish the complexity of the work and, in fact, makes the hashing somewhat longer since we are hashing strings that are 64 bytes (512 bits) plus a short but variable-size proof-of-work prefix string.

The header for the assignment prints data that is not necessary. You will not care about the hash values since anyone can recompute them. The only important fields are the proof of work value and the number of leading zero bits that that the result promises to have (and even this latter value can be made optional since you can compute the hash and reject shorter results).

Depending on the application, you would likely use longer difficulty values that would make it not feasible for an attacker to recompute. If this is used for an application such as email (e.g., hashcash), a value that can be computed within a few seconds is fine to discourage spammers). If this is used for a blockchain to support distributed logging, for instance, you’d use a much longer value. Bitcoin, for instance, typically requires around 74 leading zeros in its hash.

Your assignment: part 2

The second part of the assignment is a verifier. It is called pow-check:

pow-check powheader file

The command is provided a file the headers (generated by the pow-create command and a file with the original message. It validates the headers against the file.

These are the tests it performs:

  • It checks the value of the Initial-hash in the header. This is the SHA–256 hash of the message.

  • It computes the hash of the Proof-of-work string in the header concatenated with that initial hash string. This value should match the Hash header

  • Finally, it checks that the number in the Leading-bits header exactly matches the number of leading 0 bits in that hash header.

The result of pow-check will be a pass or a fail. If passed, simply print a pass message. If any tests failed, specify which of these tests failed.

Sample output

You can easily do your own tests using openssl but I’ve supplied you with sample files that contain header output for various input files. You should produce headers similar to these and your pow-check program should successfully validate all these headers.

What to submit

Place your source code into a single zip file. If code needs to be compiled, please include a Makefile that will create the necessary executables.

We don’t want to figure out how to run your program.

We expect to:

  1. unzip your submission

  2. run make if there’s a Makefile

  3. Set the mode of the programs to executable: chmod u+x pow-create pow-check

  4. Run the commands as:

    ./pow-create #bits samplefile >sampleheader
    ./pow-check sampleheader samplefile

As with the previous assignment, if you are using python, you can submit either:

A. pow-create and pow-check scripts that run the program or

B. (preferably) programs named pow-create and pow-check that that starts like:

#!/usr/bin/python3   
print('Hello, world!') ...

If you are using java, you will have a makefile that compiles the class files and the pow-create and pow-check programs will be scripts that runs the appropriate java command with the arguments. The file pow-create will contain something like this:

#!/bin/bash
CLASSPATH=. java PowCreate "$@"

Test your scripts on an iLab machine to make sure they work prior to submitting.

Also test that your programs gracefully handle any invalid input thrown at them. They should not crash under any circumstaces.

Last modified December 8, 2020.
recycled pixels