Chapter 2 Dynamic Programming
2.1 Introduction
Dynamic programming makes computationally demanding problems manageable by dividing them into a set of subproblems. On its surface, this might sound like a divide-and-conquer approach, but it is in fact different. D&C solves disjoint subproblems recursively. Each subproblem must be calculated from scratch, and their results are combined to reach a final solution. This results in more work than necessary. DP approaches, on the other hand, solve overlapping subproblems and save their results for later use; hence, each subproblem needs to be calculated just once, and overlapping subproblems can inform each other to lessen the computational burden.
DP alogorithms are designed as follows (from Introduction to Algorithms, 3rd ed.):
- Characterize the structure of an optimal solution
- Recusively define the value of an optimal solution.
- Compute the value of an optimal solution, typically in a bottom-up fashion.
- Construct an optimal solution from computed information.
2.2 Rod cutting
Say we have a silver rod, and a piece of length i would net us \(p_i\) dollars, such that we have the following price table:
set.seed(12)
N <- 25
C <- 50
l <- seq_len(N)
p <- sort(sample(seq_len(C),N,replace=TRUE),decreasing=FALSE)
matrix(p,nrow=1,dimnames=list('price',l))
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## price 2 4 5 5 8 8 13 16 18 18 24 26 26 27 27 28 34 38 39 42 43 44 46 46 48
We’d like to maximize our profit, so we need to figure out a way to cut our rod such that our pieces total to the largest value. We can do this recursively:
cut_recursive <- function(prices,length){
if (length == 0) return(0)
q <- -Inf
for (i in seq_len(length)){
#cat(sprintf('len=%s,val=%s\n',length-i,cut_recursive(prices,length-i)))
q <- max(q,prices[i] + cut_recursive(prices,length-i))
}
return(q)
}
cut_recursive(p,5)
## [1] 10
cut_recursive(p,15)
## [1] 32
cut_recursive(p,20)
## [1] 42
cut_aux <- function(prices,length){
if (R[length + 1] >= 0) return(R[length + 1])
if (length == 0){
q <- 0
}else{
q <- -Inf
for (i in seq_len(length)){
cat(sprintf('len=%s,val=%s\n',length-i,cut_aux(prices,length-i)))
q <- max(q,prices[i] + cut_aux(prices,length-i))
}
}
R[length+1] <<- q
return(q)
}
cut_memoized <- function(prices,length){
R <<- rep(-Inf,length + 1)
return(cut_aux(prices,length))
}
cut_memoized(p,5)
## len=0,val=0
## len=1,val=2
## len=0,val=0
## len=2,val=4
## len=1,val=2
## len=0,val=0
## len=3,val=6
## len=2,val=4
## len=1,val=2
## len=0,val=0
## len=4,val=8
## len=3,val=6
## len=2,val=4
## len=1,val=2
## len=0,val=0
## [1] 10
cut_bottomup <- function(prices,length){
r <- c(0,rep(-Inf,length))
for (i in seq_len(length)){
q <- -Inf
for (j in seq_len(i))
q <- max(q,prices[j] + r[i-j+1])
r[i+1] <- q
}
return(r[length+1])
}
cut_bottomup(p,10)
## [1] 20
cut_bottomup(p,15)
## [1] 32
cut_bottomup(p,20)
## [1] 42
2.3 Fibonacci rabbits
Let’s try and write a script that can solve a potentially computationally burdonsome problem – a problem involving reproducing rabbits. Say we start with 1 baby rabbit (age 0) at month 1. When the rabbit reaches 1 month of age, it can reproduce, producing a new baby rabbit the following month (month 3). On month 4, the baby rabbit can now reproduce, but our original rabbit will also reproduce, and so on. The rules are therefore
- Rabbits age each month
- Baby (age 0) rabbits cannot reproduce
And here is a diagram showing the process:
Now focus on the number of rabbits for each month: 1, 1, 2, 3, 5, 7. It’s the fibonacci sequence, where the current months total is a sum of the previous month’s total. We can therefore make a function that can recursively calculate the number of rabbits, using the finonacci sequence, only requiring the number of months the process will span across.
A quick aside: recursive algorithms are hard. They take some work to get a hang of them. I would not worry about either trying to write recursive algorithms or completely understanding how the code below works. The point of showing them is that there’s often a natural way to tackle a programming problem, but it’s not necessarily always the best way.
fib <- function(n){
if (n==1 || n==2){
return(1)
}else{
return(fib(n-1) + fib(n-2))
}
}
fib(5)
## [1] 5
for (n in seq_len(25)) cat(fib(n),' ')
## 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025
Let’s change the problem a little bit. Let’s assume now that the rabbits can die after k months. For \(k=3\), we’d have the following process:
We can write another recurssive algorithm to tackle this:
lifespan_inner <- function(n,y,Y){
if (n==1){
return(1)
}else if (y==Y){
return(lifespan_inner(n-1,y-1,Y))
}else if(y==1){
lifespan_inner(n-1,Y,Y)
}else{
return(lifespan_inner(n-1,y-1,Y) + lifespan_inner(n-1,Y,Y))
}
}
lifespan <- function(n,y){
return(lifespan_inner(n,y,y))
}
for (n in seq_len(25)) cat(lifespan(n,3),' ')
## 1 1 2 2 3 4 5 7 9 12 16 21 28 37 49 65 86 114 151 200 265 351 465 616 816
But look how much time it takes if we ramp up the number of months to 60:
t1 <- Sys.time()
for (n in seq_len(60)) cat(lifespan(n,3),' ')
t2 <- Sys.time()
cat('Time elapsed:',round(t2-t1,1),'minutes.')
## Time elapsed: 5.3 minutes.
It’s slow!
Now we’ll use a dynamic programming approach. You’re going to need dynamic programming later in the course for genomic sequence alignment, so it’s worth exploring the type of speedup one can obtain with a quite intuitive method, particularly when aimed at computationally demanding tasks often seen in genomics.
Again, dynamic programming saves a ton of time and resources by sweeping through a problem as a set of smaller subproblems, while storing the results of each subproblem as you go. Think about the rabbit flow chart. If we wanted to know the number of rabbits present on month 1000, we’d have to add months 999 and 998 together, which require information from months 996 through 998, and so on. A recursive algorithm would calculate the result of 999 independently of 998, and then add them together. Dynamic programming, on the other hand, would have the results from those previously months stored, simply requiring us to look them up.
The game here involves the following:
- Make a \(n \times y\) matrix M, where n is the number of months and y is the rabbit’s lifespan.
- Row 1 will represent month 1, column 1 will represent baby rabbits, and column y will represent the final month of life for an adult rabbit.
- Each subsequent row will be a running tally of the number of rabbits in each age group. Because each month is updated sequentially, you only need the information of a previous row (month) to update a current row (month).
dynprog <- function(m,y,p=FALSE){
mat <- matrix(0,m,y)
mat[1,1] <- 1
for (i in 2:m){
y1 <- mat[i-1,]
y2 <- mat[i,]
y2[1] <- sum(y1[-1])
y2[-1] <- y1[-y]
mat[i-1,] <- y1
mat[i,] <- y2
if (p){
cat(sprintf('y%s:\t%s',i,paste0(mat[i,],collapse='\t')))
line <- readline('')
}
}
return(rowSums(mat))
}
Here are some example answers to check. Note that there may be some variability given the max integer number in R, which is set in the options. If you get the right answer for smaller parameterizations, then your code is correct.
dynprog(25,5,TRUE)
## y2: 0 1 0 0 0
## y3: 1 0 1 0 0
## y4: 1 1 0 1 0
## y5: 2 1 1 0 1
## y6: 3 2 1 1 0
## y7: 4 3 2 1 1
## y8: 7 4 3 2 1
## y9: 10 7 4 3 2
## y10: 16 10 7 4 3
## y11: 24 16 10 7 4
## y12: 37 24 16 10 7
## y13: 57 37 24 16 10
## y14: 87 57 37 24 16
## y15: 134 87 57 37 24
## y16: 205 134 87 57 37
## y17: 315 205 134 87 57
## y18: 483 315 205 134 87
## y19: 741 483 315 205 134
## y20: 1137 741 483 315 205
## y21: 1744 1137 741 483 315
## y22: 2676 1744 1137 741 483
## y23: 4105 2676 1744 1137 741
## y24: 6298 4105 2676 1744 1137
## y25: 9662 6298 4105 2676 1744
## [1] 1 1 2 3 5 7 11 17 26 40 61 94
## [13] 144 221 339 520 798 1224 1878 2881 4420 6781 10403 15960
## [25] 24485
dynprog(50,5)[50]
## [1] 1085554510
dynprog(70,6)[70]
## [1] 2.685139e+13