Tutorials‎ > ‎

Week 3: Prime numbers and the Sieve of Eratosthenes


An Introduction to Prime Numbers


What are they?

A number (greater than 1) is a prime number if has no positive divisors (factors) except for 1 and itself.  For example, 3 is prime because its divisors are 1 and 3 only.  But 4 is not a prime, because it's divisors are 1, 2 and 4.
    • The first few primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, etc.
    • 2 is the only even prime number (because all other even numbers are divisible by 2)
    • There are infinitely many prime numbers!
    • Composite numbers -- numbers which are not prime themselves -- can be split into a product of primes in just one way (if we don't worry about the order of multiplication).  For example:
      • 4 = 2 x 2
      • 6 = 2 x 3
      • 8 = 2 x 2 x 2
      • 9 = 3 x 3
      • 10 = 2 x 5
      • 12 = 2 x 2 x 3
      • 14 = 2 x 7
      • 15 = 3 x 5
      • etc.

Why are they important?

Prime numbers can be thought of as the "building blocks" of the whole numbers. Every number is built out of prime factors, just as every molecule is built out of atoms. The "Fundamental Theorem of Arithmetic", established by Euclid, says that:

Every integer greater than 1 is either prime itself or is the product of prime numbers. Although the order of the primes in the product is arbitrary, the primes themselves are not.

We have already used this theorem: In the first mini-challenge, we saw that the total number of divisors is related to the (unique) prime factorization in a simple way.

Why are they mysterious?

Prime numbers are rather mysterious, and they have intrigued mathematicians through the ages. Here are some aspects of their mystique:

  • Many people have tried and failed to find a simple formula for generating all of the prime numbers. Most mathematicians do not expect such a formula to exist. But they are still intrigued by formulae which can generate some of the primes, for example:

    1) In 1772 Euler noticed that the quadratic polynomial P(n) = n2 - n + 41 is prime for all positive integers n less than 41 ... but P(41) is equal to 412, so P(41) is not prime.

    2) The Fermat numbers,

    are prime for n = 0, 1, 2, 3, and 4. Pierre de Fermat thought that perhaps they may be prime for all n. But, sadly, for n=5 the Fermat number is divisible by 641 (so it's not prime). In fact, it is thought (though not proved) that Fermat numbers are composite for any n greater than 4. 
  • On the large scale, we know roughly how the primes are distributed. Mathematicians have introduced a prime-counting function, written \scriptstyle\pi(x), whose value is equal to the number of primes less than or equal to x. It is known that the number of primes less than x is roughly equal to x / log(x), when x is large. But on the small scale, the primes seem to be scattered somewhat "randomly". Efforts to understood their distribution have  created unexpected links with other areas of mathematics (see e.g. the Riemann hypothesis).
  • There are some simple conjectures about primes which remain unproven. For example:
    • the Goldbach conjecture suggests that every even integer greater than 2 can be expressed as the sum of two prime numbers.
    • Mersenne primes are prime numbers which can be written in the form 2n - 1. But are there an infinite number of Mersenne primes? No-one knows for sure!
  • It seems that there is no "easy" way to determine whether or not a given number is prime. The larger the number, the longer it takes to test its primality. In fact, security in the digital age rests on the difficulty of factorizing some very large numbers.

Computing the Prime Numbers


Suppose we want to compute all the prime numbers less than N. How can we do it? Well, a simple but effective method was first introduced by Eratosthenes, in the third century BC.

The Sieve of Eratosthenes

First, write out the numbers from 1 to N. You can write them in a line, or in a square like this (for the numbers 1 to 100):

The number square
First, strike out ("cross out") 1, because 1 is not a prime. Then, strike out all the even numbers except for 2 itself ... because all even numbers greater than 2 divide by two, so cannot be prime. Next, find the smallest number which has not been struck out, i.e., 3.  Now strike out every multiple of 3 (except 3 itself), i.e. every third number. Moving on, we find that 4 has been eliminated, but 5 has not. So we strike out all multiples of 5 (except 5 itself), and move on again, repeating in this fashion. Continue until you have eliminated all the composite numbers.

The numbers that remain are the primes! 

You may have a grid that looks something like this:

The Sieve of Erathosthenes on the Number Square
This grid shows that 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 are prime.

You may have noticed that all composite numbers in the grid above (the crossed-out numbers) are divisible by 2, 3, 5 or 7. These are the primes which are less than the square root of 100, i.e. 10. In other words, it was only necessary to strike out the powers of 2, 3, 5 and 7 to complete the task for N=100. Can you see why this is? (Hint: The smallest composite number that is not divisible by 2, 3, 5 or 7 is 11 x 11 = 121).

Writing an algorithm

Erathosthenes' sieve works by repeating a set of simple steps: it is one of the very first examples of an algorithm.  Computers are excellent at implementing algorithms, as they work extremely fast and do not seem to get bored! Let's consider how we could make the Sieve of Erathosthenes work in a computer code. First, we could write the steps in the algorithm out in words:
  1. Set the size of the sieve, n
  2. Make a list of all the numbers, 0 up to n.  (We will use the numbers 0 to n, rather than 2 to n, so that the list index is equal to the number, for convenience).
  3. Strike out the multiples of 2 (except 2 itself)
  4. Find the next smallest number which is not struck out (i.e. p = 3)
  5. Strike out the multiples of p (except p itself)
  6. Repeat from step 4 until p is larger than the square root of n.
  7. Build a list of primes from the remaining numbers.
Now, we can try coding these steps in Python.

Let's start with steps 1 to 3. We want to strike out the multiples of 2 by setting them to zero. But how can we keep track of whether a number has been struck out or not?  One way is to use a list (a, say) with n+1 elements, representing the status of numbers 0 to n. We will strike out a number (i, say) by setting a[i] equal to 0. Let's try writing some lines of code to do this:

n = 10000            # Set the size of the sieve
a = [1]*(n+1)        # list of length n+1, every element equal to 1 (not crossed out)
p = 2                # 2 is the first prime
j = 2*p              # This is the first multiple of p
while j <= n:        # a loop which continues until we exceed the size of the sieve
    a[j] = 0         # "cross out" the multiple by setting a[j] equal to zero
    j += p           # move on to the next multiple of p
print a[:100]        # check the output

Do you understand the steps here? Here are some observations:
  • [1]*(n+1) constructs a new list from n+1 copies of the short list [1]
  • We're using a different kind of loop, using "while" rather than "for". "while" repeats the indented commands for as long as the given condition remains true (in this case, while j is less than or equal to n).  "While" loops are very useful, but beware: it is very easy to create an infinite loop! For example, here, if j never exceeds n (for example, if we forget to increase j) then the loop will run forever ... or at least, until we terminate the program in frustration! For more about the "while" loop, see here.
  • j += p   This means, add p to j, and assign the new value to j.  It is equivalent to writing j = j + p.
  • a[:100]  The list "a" is too large to print to the screen. But we can examine the first part of it, by taking a slice of the list. Lists can be sliced in various ways. This command returns the first 100 elements in the list.  To get the last 100 elements, try a[-100:]
Try entering these lines in a Python module, save the file, and run it (for instructions, see Tutorial 1). If it doesn't work, you'll see some kind of error message -- can you fix the code? (First, check that the indenting has been copied correctly).  If it works, then you should see a list of 1s and 0s, with all the multiples of two set equal to 0.

The next task is to implement step 4: moving on through the list to find the next prime number. This could also be achieved with another "while" loop:

p += 1     # Step forward once
while a[p] == 0:
    p += 1   # Keep stepping forward until we reach the end of the loop

These lines are a little bit dangerous: what happens if p increases until it exceeds n? Then we will fall off the end of the list, and the program will "throw an error". This error will crash your program (unless it is "caught")!  To stop this happening, we we need to think a bit more carefully about how to terminate the program once it's job is done. In other words, how to stop once p exceeds the square root of n.

To make the whole algorithm work, we need to iterate the process of crossing out prime multiples, and moving forward to get the next prime p. But we need to stop when the prime counter p exceeds the square root of n. Can you work out how to do this?

Try writing a program to implement Eratosthenes' Sieve algorithm in full. If you can do this, congratulations -- you are well on your way to becoming a mathematical programmer! If not, don't worry, take a look at the first Python file (sieve1.py) attached to this page. Can you see how it works? Try it!

Extending your program

Once you have a working algorithm for finding all the prime numbers less than N, using the Sieve of Eratosthenes, you can use it in many ways. If you are feeling confident, you may like to skip to the Exercises below.  On the other hand, you may be more interested in improving your program in other ways. For example, you could:

1.  Improve the program so that it asks the user to enter the size of the sieve.  You could use the following line:

str = raw_input("What is the size of the sieve?")
n = int(str)

This is OK, as long as the user enters an whole number greater than 2. But if they enter "what do you mean?", the program will throw an error, and crash! So, it would be a good idea to check the user's input, to make sure it is valid for the program. If not, you can give the user some gentle instructions.

2.  Test the efficiency of the sieve algorithm. In other words, how long does it take to run?  Of course, this depends on the size of the sieve, and the speed/memory of your computer. I would expect the time taken by the algorithm to increase (very roughly) in proportion to n.

3.  Use the list of primes to look for factors of, e.g., the Mersenne numbers 2n - 1, and the Fermat numbers.

If you are feeling confident, try making enhancements 1, 2 and 3 for yourself. If you are not sure how to get started, take the second code at the foot of the page, sieve2.py,. This gives some example Python code. Take a look, to see if you can follow the logic. (If it doesn't work, let me know: pi3challenge@sheffield.ac.uk).

Graphics and animations

The Sieve of Eratosthenes' has been studied through the ages -- so we are in good company! On the Wikipedia page, you will find a rather nice animation showing how the sieve filters out composite numbers, in a step-by-step manner.

Next week we will write a Python program to illustrate how we may apply the sieve to the first 10000 numbers. It will look something like this:

Illustrating the Sieve of Eratosthenes with a Python program

Exercises

Using a list of primes that you found with the Sieve of Eratosthenes, can you answer the following questions:
  1. What is the 2013th prime number?
  2. Which of these are prime numbers? (a) 997267 (b) 997269 (c) 997271 (d) 997273.
  3. How many primes are there under 10000? Under 100000? Under 1000000? Check your answers in the table given here.  
  4. What are the prime factors of the fifth Fermat number, 2^(25) + 1 = 232 + 1 = 4294967297 ?
  5. Find pairs of consecutive primes which differ by (a) 20, (b) 30, (c) 40.
  6. What is the maximum gap between consecutive primes under 1 million?
  7. Find a pair of primes whose sum is 123456
  8. Find the first 9 primes which are also in the Fibonacci sequence
ċ
sieve1.py
(1k)
Sam Dolan,
11 Apr 2013, 12:40
ċ
sieve2.py
(2k)
Sam Dolan,
11 Apr 2013, 15:07