Solving a Weighted Set Covering Problem for Improving Algorithms for Cutting Stock Problems with Setup Costs by Solution Merging

Many practical applications of the cutting stock problem (CSP) involve additional costs for setting up machine configurations. In this paper we describe a post-processing method that can improve solutions in general but works especially well if additional setup costs are considered. We formalize a general cutting stock problem and a solution merging problem that can be used as a post-processing step. To solve the solution merging problem, we propose an integer linear programming (ILP) model, a greedy approach, a PILOT method, and a beam search. We apply the approaches to different real-world problems and compare their results, which show that the post-processing improves the previous best solution in up to 50% of the instances.


Introduction
One of the most natural tasks in many production processes across various economic areas is processing raw materials to obtain various kinds of goods. Often, such a task involves cutting parts of predefined sizes out of the raw material. Deciding how to cut the raw materials into parts such that the unusable waste is minimized clearly has an economic incentive. In simple cases this task has been, and still is, solved by humans, but especially when the number of needed parts and the number of their different types are large, it can be hard to find satisfactory solutions manually. Therefore, many companies have started to use computer-based solutions for calculating good cutting patterns. This lightens their employees' workloads, and the algorithms often find better cutting patterns than a human would, thereby saving costs.
In the literature, such problems are called cutting stock problems. They deal with cutting small items out of larger raw materials, given a specified demand for each item size. Industries that rely heavily on solving such problems include the paper, film, metal, glass, and wood industries. Cutting stock problems are widely studied in the literature: both the classical problems and many highly specialized variants have been considered, and sophisticated solution approaches for them have been developed.
Many of those solution approaches generate many cutting patterns during their execution. For example, so-called improvement heuristics try to improve a given solution by exchanging some of its cutting patterns for other cutting patterns. This naturally leads to a pool of patterns which may be much larger than the set of patterns used in the final solution. The main idea of this thesis is to exploit this pool of potentially promising cutting patterns and to find a good solution using only those patterns in a postprocessing step. This idea follows the very general concept of solution merging, which takes multiple solutions of a problem and merges them into one superior solution. For more details on the concept of solution merging see [BPRR11]. The advantage of this approach for our problem is that selecting the best combination of patterns from the pattern pool does not require knowledge of the restrictions on individual patterns, which depend on the specific problem variant under consideration. Therefore, the postprocessing is independent of the problem variant and the pattern restrictions, except for restrictions that concern combinations of patterns. This implies that solution approaches for the postprocessing can be applied to a wide variety of cutting stock problem variants.
In this thesis we will propose a general cutting stock model which covers many cutting stock problem formulations and variants. Then we will formulate a set covering problem which tries to find the best combination of patterns from a given pattern pool. Furthermore, we will propose four different approaches for solving the set covering problem and one hybrid approach which uses a set covering approach in combination with a construction approach for solving the underlying cutting stock problem.
This thesis was developed as part of a project financed by LodeStar Technology Ges.m.b.H. In this project, an approach for solving a two-dimensional variant of the cutting stock problem was developed by Dusberger and Raidl [DR14,DR15,DR17]. The implementation of this approach was integrated into the software package by LodeStar Technology and is used by many clients from different industries, especially the wood cutting industry, to optimize their cutting patterns. A machine used for executing such cutting patterns in the wood cutting industry is shown in Figure 1.1. Due to this tight relationship with the industry, we have a set of real-world instances for the given two-dimensional variant of the cutting stock problem and can use the algorithm by Dusberger and Raidl to generate pattern sets which serve as instances for the set covering problem.

One weakness of the algorithm by Dusberger and Raidl is its limited ability to take so-called pattern setup costs into account. Such costs occur in practice if a human has to adjust the cutting machine whenever it has to cut a structurally different pattern. Because we want to minimize this human workload, we assign costs to each change of pattern, which effectively assigns costs to each structurally different pattern. Another situation that can be modeled with pattern setup costs arises when raw materials can be stacked and cut simultaneously. In this case we assign costs to the amount of time a machine needs for a cutting plan, since minimizing this time leads to a more efficient production process. Therefore, we want patterns with a high multiplicity, so that the raw materials can be stacked when cutting those patterns. In the end, this also leads to assigning costs to each structurally different pattern, although there may be a maximum stack size; in that case, the costs have to be multiplied by the number of stacks needed for the pattern.
As already mentioned, the approach by Dusberger and Raidl does not work well with pattern setup costs. This was the main reason for us to solve this set covering problem in a postprocessing step, which can improve the solution quality by focusing on large stacks of good patterns. For testing our approaches, we use the pattern sets generated by the algorithm of Dusberger and Raidl applied to different real-world instances. As the test results will show, the postprocessing is able to improve the solution for over 50% of the instances if pattern setup costs dominate the other costs. With lower or no pattern setup costs, the postprocessing still finds improvements for many instances.
We also propose integrating our hybrid approach as a neighborhood into the algorithm by Dusberger and Raidl, such that it is used not just as a postprocessing step but regularly within the iterations of the improvement algorithm. This helps to guide the search towards solutions with low pattern setup costs and can improve the performance of the algorithm, especially for instances with a strong focus on pattern setup costs.

Related Work
There is much literature in the area of cutting stock problems. Due to the high number of problem variants, Dyckhoff [Dyc90] developed a typology to categorize cutting stock problems, which Wäscher et al. [WHS07] later extended. We will look at this typology in Chapter 2 to show to which categories of cutting stock problems our set cover approach is applicable. The work by Sweeney and Paternoster [SP92] summarizes the research done in the field of cutting and packing problems, although it is already quite outdated. For a survey of two-dimensional cutting stock problems see [LMM02].

There are already approaches in the literature that are based on the idea of reusing previously generated patterns and combining them into potentially better solutions. Cui et al. [CZY15] developed a two-phase approach for solving the one-dimensional cutting stock problem with pattern setup costs. The first phase generates good patterns, and the second phase uses an integer linear programming (ILP) model to search for the best solution composed of those patterns. Their ILP model is similar to ours, although they consider only the one-dimensional case and a slightly less general problem; for example, their problem has no limited availabilities of raw materials.
Another two-phase approach for the one-dimensional cutting stock problem with setup costs was proposed by Förster and Wäscher [FW00]; its second phase tries to combine patterns of a given solution into one pattern which is then used multiple times. This approach of reducing the number of different patterns in a given solution is based on the work of Diegel et al. [DCVSN93].
There is a large body of research results and algorithms concerning the classical set covering problem [CTF00]. Our set covering problem is similar to the general weighted set covering problem, although the latter has no concept corresponding to setup costs. A theoretical analysis of the general weighted set covering problem was done by Yang and Leung [YL05].
The K-staged two-dimensional cutting stock problem, from which the problem considered in our tests is derived, was studied early on by Gilmore and Gomory [GG65]. To solve this problem they introduced the general technique of column generation, which today is used for many combinatorial problems in a large number of other application domains. For generating patterns for our variant with setup costs we use the algorithm proposed by Dusberger and Raidl [DR14,DR15,DR17].
We presented part of the approaches and results discussed in this thesis at the Sixteenth International Conference on Computer Aided Systems Theory and published the work in the conference proceedings [KR18].

Structure of the Work
In the following chapter, we explore the area of cutting stock problems and present some basic problem types. In Chapter 3 we introduce a general cutting stock problem formulation and the related cutting stock set cover problem, which are the main problem variants we consider. Chapter 4 presents four different solution approaches for the cutting stock set cover problem and one hybrid solution approach for the general cutting stock problem. The approach by Dusberger and Raidl is described in detail in Chapter 5. Chapter 6 presents the results of the approaches and evaluates the performance when a neighborhood based on our hybrid approach is added to the algorithm by Dusberger and Raidl. Finally, we conclude the thesis in Chapter 7 and present some ideas for future work.

Cutting Stock Problems
In many economic areas a manufacturer has to cut objects of specific sizes out of raw materials, as these objects are needed in the production process. There is an economic incentive to use as few raw materials as possible or, equivalently, to leave as little waste as possible after cutting the objects out. Therefore, planning how to cut the raw materials is crucial, and one can consider the optimization problem of finding a cutting plan that minimizes the amount of used raw materials. Such optimization problems are called cutting stock problems, and because of their wide variety of applications there exist many variants of them and a lot of research has been done in this area.
To classify the different types and varieties of cutting and packing problems, Dyckhoff [Dyc90] formulated a typology for cutting and packing problems in 1990, which was later extended by Wäscher et al. [WHS07]. Note that both typologies also include packing problems, which can frequently be modeled in the same way as cutting problems but have other applications. Cutting problems concern cutting some parts out of bigger parts, and packing problems concern packing items into bigger items, but from a modeling perspective both are strongly related. All problems of this type have in common that a set of large objects (input, supply) and a set of small items (output, demand) are given, and the small items have to be grouped, possibly into multiple groups, to fit onto or into the large objects. Dyckhoff's typology is based on four characteristics of the problems:
• The dimensionality of the objects and the items. One- and two-dimensional problems are the most common, although there are problem formulations for three or, in general, N dimensions.
• Kind of assignment: There are two possibilities. Either all given items have to be fit onto a selection of the given objects, or all given objects have to be filled with a selection of the given items.
• Assortment of large objects: There are three possibilities. Either there is only one large object, multiple objects all having the same shape, or multiple objects with different shapes.
• Assortment of small items:
  - few items with different shapes
  - many items with many different shapes
  - many items with few different shapes
  - all items have the same shape
The extension of Wäscher et al. distinguishes five categories:
• The dimensionality, as in Dyckhoff's typology.
• Kind of assignment, as in Dyckhoff's typology. We call the first case, in which all items and a subset of the objects have to be used, input minimization, and the second case, in which all objects and a subset of the items have to be used, output maximization.
• Assortment of small items:
  - identical: all items have the same shape
  - weakly heterogeneous: the items can be grouped into a small number of classes of identical shape
  - strongly heterogeneous: only few items have the same shape
• Assortment of large objects:
  - One large object:
    * the size of the object is fixed
    * the size of the object is variable in at least one dimension; this occurs mostly with input minimization, where the size of the single object has to be minimized
  - Several large objects (here only fixed-size objects are considered, in contrast to the case of one large object, where a single object of variable size is allowed):
    * identical
    * weakly heterogeneous
    * strongly heterogeneous
• Shape of small items:
  - regular small items: rectangles, circles, boxes, cylinders, balls, etc.
  - irregular small items
In this work we focus on problems with input minimization as the kind of assignment. In the case of input minimization, identical small items are normally not considered, since this would strongly reduce the problem and in many cases make it trivial to solve. Therefore, we only consider the weakly heterogeneous and strongly heterogeneous types for the assortment of small items. As the chapter title suggests, we will focus mainly on different types of cutting stock problems, but since bin packing problems are closely related to cutting stock problems, we first present the basic bin packing problem.
Bin Packing Problem. Given a bin size $V$ and a set of items $I$ with sizes $a_i \in \mathbb{R}^+$ for each $i \in I$, find a minimal number of bins $B$ and a partition of $I$ into $B$ sets $I_1, \dots, I_B$ such that $\sum_{i \in I_j} a_i \le V$ for all $j = 1, \dots, B$.
This simple formulation of the bin packing problem can be classified in the typology of Wäscher et al. as follows: the dimensionality is one (the size); the kind of assignment is input minimization, as we already fixed before; the assortment of small items is strongly heterogeneous (every item possibly has a different size); the assortment of large objects consists of several identical large objects (the bins); and the small items are regular (one-dimensional, with a size).
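To make the definition concrete, the following minimal Python sketch checks whether a given partition is a feasible bin packing and, for illustration only, constructs one with the classic first-fit decreasing (FFD) heuristic. FFD is a well-known textbook heuristic and not part of the approach in this thesis; the function names and the dictionary-based item representation are assumptions.

```python
import math
from typing import Dict, List

def is_feasible_packing(sizes: Dict[str, float], bins: List[List[str]], capacity: float) -> bool:
    """True if `bins` partitions all items and no bin exceeds `capacity` (the bin size V)."""
    packed = [item for b in bins for item in b]
    if sorted(packed) != sorted(sizes):  # every item must appear exactly once
        return False
    return all(sum(sizes[i] for i in b) <= capacity for b in bins)

def first_fit_decreasing(sizes: Dict[str, float], capacity: float) -> List[List[str]]:
    """Classic FFD: place each item, largest first, into the first bin with enough room."""
    bins: List[List[str]] = []
    loads: List[float] = []
    for item in sorted(sizes, key=lambda i: sizes[i], reverse=True):
        for j, load in enumerate(loads):
            if load + sizes[item] <= capacity:
                bins[j].append(item)
                loads[j] += sizes[item]
                break
        else:  # no open bin has room: open a new one
            bins.append([item])
            loads.append(sizes[item])
    return bins

# hypothetical instance with bin size V = 10
sizes = {"a": 4, "b": 8, "c": 1, "d": 4, "e": 2, "f": 1}
packing = first_fit_decreasing(sizes, capacity=10)
lower_bound = math.ceil(sum(sizes.values()) / 10)  # no packing can use fewer bins
```

Here FFD uses two bins, which matches the trivial lower bound $\lceil \sum_i a_i / V \rceil = 2$, so this packing happens to be optimal; in general FFD gives no such guarantee.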
In the following we focus on cutting stock problems, i.e., problems in which the kind of assignment is input minimization and the assortment of small items is weakly heterogeneous. The one-dimensional case has applications in, among others, the paper, film, and metal industries. In those applications, large rolls of some material are produced and then have to be cut into smaller rolls of different widths, depending on the demand. Figure 2.1 shows a roll slitting machine which can cut a paper roll into smaller paper rolls.
For two-dimensional cutting stock problems, application areas are, for example, the wood and glass industries. A typical problem there is: given a number of rectangular sheets of material, how can smaller rectangular pieces be cut out of them with minimal waste? One could also consider non-rectangular sheets or pieces, although most of the existing literature on two-dimensional cutting stock problems considers only rectangles.
In the following we present some important standard problems.

One-dimensional Cutting Stock Problems
In the context of one-dimensional cutting stock problems we will call the large objects the input rods and the small items the pieces. The most basic one-dimensional problem is stated as follows.
One-dimensional cutting stock problem (1DCSP). Let $T$ be a set of different input rod types, and let $L_t \in \mathbb{R}^+$ be the length and $c_t \in \mathbb{R}^+$ the cost of the input rod of type $t \in T$. Furthermore, let $E$ be a set of pieces and, for each $i \in E$, let $l_i \in \mathbb{R}^+$ be the length and $d_i \in \mathbb{N} \setminus \{0\}$ the demand of the piece. A solution $S$ to the problem is a set of patterns $\mathcal{P}_S$ together with an amount $a^S_P$ for each pattern $P \in \mathcal{P}_S$. Furthermore, each pattern $P \in \mathcal{P}_S$ is associated with an input rod type $t_P \in T$. Each pattern $P \in \mathcal{P}_S$ can be represented by an element vector $(e^P_i)_{i \in E} \in \mathbb{N}^{|E|}$, where $e^P_i$ is the number of pieces of length $l_i$ cut out of the input rod. A solution is feasible if it covers all pieces, i.e.,
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i = d_i \quad \forall i \in E,$$
and if all patterns are valid, i.e.,
$$\sum_{i \in E} e^P_i\, l_i \le L_{t_P} \quad \forall P \in \mathcal{P}_S.$$
The problem is to find a feasible solution $S$ which minimizes the costs of the used input rods $c(S) := \sum_{P \in \mathcal{P}_S} a^S_P\, c_{t_P}$.
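The feasibility conditions and the cost of a 1DCSP solution can be checked mechanically. The Python sketch below is one possible encoding, assuming a solution is stored as a list of (pattern, amount) pairs; the class and function names are hypothetical. The demand check requires exact coverage, following the remark in Chapter 3 that classical cutting stock problems do not allow overproduction.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Pattern:
    rod_type: str           # t_P: input rod type the pattern is cut from
    counts: Dict[str, int]  # element vector e^P: piece -> copies cut from one rod

Solution = List[Tuple[Pattern, int]]  # pairs (P, a^S_P)

def pattern_is_valid(p: Pattern, rod_len: Dict[str, float], piece_len: Dict[str, float]) -> bool:
    """All pieces of the pattern must fit on one rod of type t_P."""
    return sum(n * piece_len[i] for i, n in p.counts.items()) <= rod_len[p.rod_type]

def produced(sol: Solution) -> Dict[str, int]:
    """Total number of copies of each piece cut by the solution."""
    total: Dict[str, int] = {}
    for p, a in sol:
        for i, n in p.counts.items():
            total[i] = total.get(i, 0) + a * n
    return total

def is_feasible(sol: Solution, demand: Dict[str, int],
                rod_len: Dict[str, float], piece_len: Dict[str, float]) -> bool:
    made = produced(sol)
    demand_met = all(made.get(i, 0) == d for i, d in demand.items())  # exact coverage
    return demand_met and all(pattern_is_valid(p, rod_len, piece_len) for p, _ in sol)

def cost(sol: Solution, rod_cost: Dict[str, float]) -> float:
    """c(S): each used rod is paid with the cost of its type."""
    return sum(a * rod_cost[p.rod_type] for p, a in sol)

# hypothetical instance: one rod type of length 10 and cost 3
rod_len, rod_cost = {"R10": 10.0}, {"R10": 3.0}
piece_len, demand = {"x": 3.0, "y": 4.0}, {"x": 7, "y": 2}
p1 = Pattern("R10", {"x": 2, "y": 1})  # 2*3 + 4 = 10 <= 10
p2 = Pattern("R10", {"x": 3})          # 3*3 = 9 <= 10
sol = [(p1, 2), (p2, 1)]
```

Using `p1` twice and `p2` once cuts exactly 7 pieces `x` and 2 pieces `y` from 3 rods, so `cost(sol, rod_cost)` is 9.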

Remark 2.1.1. In the formulation of the 1DCSP we implicitly assumed that the number of available input rods of each length is unlimited.
This basic problem was one of the first problems introduced in the context of cutting stock problems. A classical approach for solving it is column generation, introduced by Gilmore and Gomory [GG61]. The problem was one of the first applications of column generation and is still a classical example of its use.
Many authors formulate classical one-dimensional cutting stock problems in which all input rods have the same length. Since this is just the restriction $|T| = 1$ in our problem formulation, we use the more general formulation as our base problem. In the typology of Wäscher et al. [WHS07], the problem as we defined it is called the one-dimensional multiple stock size cutting stock problem (1DMSSCSP), which is distinguished from the one-dimensional single stock size cutting stock problem (1DSSSCSP). As we will mostly operate on multiple stock sizes, we will not use this extended terminology.
There are many extensions of the 1DCSP, many of them originating from different real-world applications. In this work we focus on one special kind of extension, called setup costs. Modern cutting machines often need significant time to switch between two different patterns; therefore, it is desirable to minimize the number of different patterns so that the costs associated with these switching times are minimized. This introduces a second objective and thus, in general, a multi-objective optimization problem. One variant which considers this situation is the one-dimensional cutting stock problem with pattern reduction (1DCSPPR) [FW00]. It only considers input rods of the same length $L$ and searches for a solution with a minimal number of used input rods and, among all those solutions, for one with a minimal number of different patterns. Another approach is to use cost weights measured in a currency, instead of the number of different patterns, as the second-level objective. This combined objective was considered, among others, by Mobasher and Ekici [ME13], but it has received less attention in the literature than the 1DCSPPR. Since there are different names for this problem in the literature, we call it the one-dimensional cutting stock problem with setup costs (1DCSPSC).
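The difference between the lexicographic objective of the 1DCSPPR and a weighted objective in the spirit of the 1DCSPSC can be seen on two hypothetical solutions. In this Python sketch a solution is summarized only by the amounts of its distinct patterns (all rods have the same length), and the rod cost and per-pattern setup cost are assumed values chosen for illustration.

```python
from typing import List, Tuple

def lexicographic_key(amounts: List[int]) -> Tuple[int, int]:
    """1DCSPPR: first minimize the number of rods, then the number of distinct patterns."""
    used = [a for a in amounts if a > 0]
    return (sum(used), len(used))

def weighted_cost(amounts: List[int], rod_cost: float, setup_cost: float) -> float:
    """Weighted objective: material costs plus one setup cost per distinct pattern."""
    used = [a for a in amounts if a > 0]
    return rod_cost * sum(used) + setup_cost * len(used)

sol_a = [4, 3, 2, 1]  # 10 rods, 4 distinct patterns
sol_b = [6, 5]        # 11 rods, 2 distinct patterns

best_lex = min([sol_a, sol_b], key=lexicographic_key)
best_weighted = min([sol_a, sol_b], key=lambda s: weighted_cost(s, rod_cost=1.0, setup_cost=2.0))
```

The lexicographic objective prefers `sol_a` because it uses fewer rods, whereas with a setup cost of 2 per pattern the weighted objective prefers `sol_b` (cost 15 instead of 18). Which solution is better therefore depends on how setup costs are modeled.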
We want to mention that there are many more variants of the One-Dimensional Cutting Stock Problem and what we presented in this section are just a few selected variants which are important for our approach later on.

Two-dimensional Cutting Stock Problems
For two-dimensional cutting stock problems we call the large objects sheets and the small items pieces or elements. There are many categories of two-dimensional cutting stock problems, one of the most basic distinctions being regularity: in the two-dimensional regular cutting stock problem (2DRCSP) the pieces are rectangles, whereas in the two-dimensional irregular cutting stock problem (2DICSP) they may be arbitrary polygons. Note that the 2DICSP is sometimes also called the nesting problem in the literature. It is easy to see that the 2DICSP is a generalization of the 2DRCSP, although in the literature the 2DRCSP is considered far more often than the 2DICSP. This is probably because the regular version is easier to solve and maybe also because many problems occurring in practice naturally contain only rectangular pieces. Note that one could generalize the 2DICSP even further by allowing holes in the polygons or even pieces of other shapes which are not polygons. Allowing holes would increase the complexity of the problem even more, since items could then also be placed within the holes of other items. Relaxing the condition that all items are polygons would lead to problems which are hard to even formulate and even harder to solve. Furthermore, one can argue that all geometrical shapes can be approximated by polygons, since we allow an arbitrary number of sides. One of the few works which consider the 2DICSP is the work of Albano and Sapuppo [AS80], who presented heuristics to solve the problem.
Next we will further investigate some categories of two-dimensional regular cutting stock problems. A common restriction in this class of problems is that each side of each piece is aligned parallel to one of the sides of the containing sheet. We call the 2DRCSP together with this restriction the two-dimensional orthogonal cutting stock problem (2DOCSP). Although De Cani [DC78] showed that this restriction may lead to substantially worse solutions than non-orthogonal solutions for some problem instances, most literature is only concerned with the orthogonal case of the problem.
A further common restriction of the 2DOCSP is the use of so-called guillotine cuts. The main idea behind this restriction is that many cutting machines can only cut through the whole material. Therefore, a guillotine cut is a straight cut which goes from one end of the sheet to the opposite end. Note that guillotine cuts may be performed in multiple stages, e.g., first cutting the sheet vertically and then cutting the resulting components horizontally, where each component can be cut individually.
An example pattern is shown in Figure 2.2. In the first stage the red horizontal lines are cut. Then the resulting components are cut along the blue vertical lines, and finally the green horizontal lines are cut. Note that every cut goes from one side of its component to the opposite side, so all cuts are guillotine cuts. An example of a pattern which cannot be cut by guillotine cuts only is illustrated in Figure 2.3. This can easily be seen from the fact that, except for the waste on the right side, nothing can be cut off with a guillotine cut without cutting through an output piece. We call the problem restricted to guillotine cuts only the two-dimensional guillotine cutting stock problem (2DGCSP). If we consider only guillotine cuts, we can also consider the number of cutting stages of the patterns. For example, the pattern in Figure 2.2 has three cutting stages: the red lines are cut in the first stage, the blue ones in the second, and the green ones in the third.
One can now restrict the number of cutting stages in the patterns to at most $K$ for some $K > 0$. This problem is then called the K-staged two-dimensional cutting stock problem (K2DCSP) [GG65]. This restriction may be imposed because of a simpler solution representation or because of practical limitations.
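The notion of cutting stages can be made precise with a cut tree. The Python sketch below uses a hypothetical encoding: a leaf (a string) is a piece or waste, and a list is a component that is split by parallel guillotine cuts into its children, with the cut orientation alternating from one level to the next. Under this encoding, and assuming every list has at least two children, the number of stages is the height of the tree.

```python
from typing import List, Union

# Hypothetical encoding of a guillotine pattern (not the representation used in this thesis):
# a str is a piece or "waste"; a list is a component cut into its children by parallel cuts,
# where the orientation alternates between consecutive levels.
CutTree = Union[str, List["CutTree"]]

def num_stages(node: CutTree) -> int:
    """Number of cutting stages = height of the cut tree."""
    if isinstance(node, str):
        return 0
    return 1 + max((num_stages(child) for child in node), default=0)

def is_k_staged(node: CutTree, k: int) -> bool:
    """True if the pattern can be cut in at most k stages."""
    return num_stages(node) <= k

# stage 1 splits the sheet into two strips, stage 2 splits each strip,
# stage 3 splits one part of the first strip into two pieces "B"
pattern = [["A", ["B", "B"]], ["C", "waste"]]
```

For this example `num_stages(pattern)` is 3, so the pattern is feasible for the K2DCSP with $K = 3$ but not with $K = 2$.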
Up until now, all two-dimensional problems presented here use only one sheet type with a given width $W$ and height $H$, but in practice many applications use sheets of different sizes. Therefore, we introduce the following extension of the K2DCSP. A similar extension could be formulated for all two-dimensional problems presented so far.
K-staged two-dimensional cutting stock problem with variable sheet size (K2DCSPV). Given a set of sheet types $T$ with widths $W_t \in \mathbb{R}^+$, heights $H_t \in \mathbb{R}^+$, available quantities $q_t \in \mathbb{N} \cup \{\infty\}$, and costs $c_t \in \mathbb{R}^+$ for $t \in T$. Furthermore, let $E$ be a set of different element types. Each element type $i \in E$ has a width $w_i \in \mathbb{R}^+$, a height $h_i \in \mathbb{R}^+$, and a demand $d_i \in \mathbb{N} \setminus \{0\}$. A solution $S$ to the problem is a set of patterns $\mathcal{P}_S$ together with an amount $a^S_P$ for each pattern $P \in \mathcal{P}_S$. Furthermore, each pattern $P \in \mathcal{P}_S$ is associated with a sheet type $t_P$. Each pattern describes how to cut elements out of the associated sheet type using only guillotine cuts. We can associate with each pattern $P \in \mathcal{P}_S$ an element vector $(e^P_i)_{i \in E} \in \mathbb{N}^{|E|}$ which describes how often the $i$-th element occurs in the pattern $P$. A solution is feasible if all element demands are satisfied, i.e.,
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i = d_i \quad \forall i \in E,$$
and the available sheet quantities are not exceeded, i.e.,
$$\sum_{P \in \mathcal{P}_S:\, t_P = t} a^S_P \le q_t \quad \forall t \in T.$$
The problem is to find a feasible solution which minimizes the costs
$$\sum_{P \in \mathcal{P}_S} a^S_P\, c_{t_P}. \tag{2.1}$$
As in the one-dimensional case, we can again consider the variant of the K2DCSPV with pattern setup costs. As input, we get an instance of the K2DCSPV together with stacking/setup costs $c^S_t$ for each sheet type $t$ and a maximum stack size $s_{\max} \in \mathbb{N} \cup \{\infty\}$. The new objective is the old objective (2.1) plus the pattern setup costs,
$$\sum_{P \in \mathcal{P}_S} a^S_P\, c_{t_P} + \sum_{P \in \mathcal{P}_S} \left\lceil \frac{a^S_P}{s_{\max}} \right\rceil c^S_{t_P}, \tag{2.2}$$
where in the case of $s_{\max} = \infty$ the expression $\lceil a^S_P / s_{\max} \rceil$ is defined as 1 if $a^S_P > 0$ and as 0 otherwise.
The K-staged two-dimensional cutting stock problem with variable sheet size and pattern setup costs (K2DCSPVSC) is then defined by replacing the objective (2.1) by (2.2).
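As a small worked example with hypothetical numbers, consider a single sheet type $t$ with material cost $c_t = 5$ and setup cost $c^S_t = 20$, and a solution consisting of two patterns used $6$ and $3$ times. The material costs plus one setup cost per stack of each pattern give:

```latex
\begin{aligned}
s_{\max} = 4:      &\quad (6+3)\cdot 5 + \left(\lceil 6/4 \rceil + \lceil 3/4 \rceil\right)\cdot 20 = 45 + 3 \cdot 20 = 105,\\
s_{\max} = \infty: &\quad (6+3)\cdot 5 + (1 + 1)\cdot 20 = 45 + 40 = 85.
\end{aligned}
```

With a finite stack size, the first pattern needs two stacks, which is exactly what makes high multiplicities beyond $s_{\max}$ more expensive.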

The Cutting Stock Set Cover Problem
Most methods from the literature for solving a cutting stock problem, especially improvement heuristics, generate many more patterns during their execution than the ones used in the final solution. The idea of the cutting stock set cover problem is to make use of this variety of generated patterns and find the best solution consisting of some of those patterns. The advantage of this method is that the problem is not concerned with the concrete structure of the patterns; it is only interested in the number of items on each pattern. Therefore, the problem-specific constraints for the pattern construction do not need to be considered, except for those constraints which operate on multiple patterns or on how they can be combined.
Before we can formulate the cutting stock set cover problem we formulate a general variant of the cutting stock problem which we will use as the base problem.
General cutting stock problem (GCSP). Let $E$ be a set of elements, $(d_i)_{i \in E} \in \mathbb{N}^{|E|}$ a demand vector, and $s_{\max} \in \mathbb{N} \cup \{\infty\}$ the maximum stack size. Further, let $T$ be a set of stock materials and $q_t \in \mathbb{N} \cup \{\infty\}$ the maximal available quantity of each stock material $t \in T$.
A solution is represented by a multiset of patterns. In general, a pattern is a structured collection of elements in $E$ which satisfies some problem-specific constraints. The structure of a pattern is also problem-specific, but irrelevant for this approach. We can associate with each pattern $P$ an element vector $(e^P_i)_{i \in E} \in \mathbb{N}^{|E|}$ which describes how often element $i$ is contained in pattern $P$. Furthermore, each pattern $P$ is associated with a stock material $t_P \in T$ out of which it is cut. A pattern $P$ has associated problem-specific production costs $c^P_P \ge 0$ and stacking costs $c^S_P \ge 0$. Note that the set of possible patterns and their production and stacking costs are not part of the problem instance, since, depending on the problem, the possible patterns are described by problem-specific structural rules and constraints, and the costs are described by formulas which may depend on the structure of the pattern and on the used stock materials and elements.
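We define a solution $S$ as a set of feasible patterns $\mathcal{P}_S$ together with an amount vector $(a^S_P)_{P \in \mathcal{P}_S} \in \mathbb{N}^{|\mathcal{P}_S|}$. The goal is to find an optimal solution $S$ which satisfies the demand constraints
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i \ge d_i \quad \forall i \in E, \tag{3.1}$$
and the stock material availability constraints
$$\sum_{P \in \mathcal{P}_S:\, t_P = t} a^S_P \le q_t \quad \forall t \in T, \tag{3.2}$$
and minimizes the total costs
$$c(S) := \sum_{P \in \mathcal{P}_S} \left( a^S_P\, c^P_P + \left\lceil \frac{a^S_P}{s_{\max}} \right\rceil c^S_P \right), \tag{3.3}$$
where, as in Chapter 2, $\lceil a^S_P / s_{\max} \rceil$ is read as 1 if $a^S_P > 0$ and as 0 otherwise when $s_{\max} = \infty$. We further consider the problem variant general cutting stock problem with exact demands (GCSPE), in which demands must be satisfied exactly, which means we replace condition (3.1) by
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i = d_i \quad \forall i \in E. \tag{3.4}$$
If we compare the GCSP with the typology of Wäscher et al. (see Chapter 2), we see that many problem types fit into the framework of the GCSP. The only restrictions are that the kind of assignment is input minimization and that the dimensions of the large objects (the stock materials) are fixed. All problems presented in Chapter 2 fit into the framework. Moreover, many problem variants and extensions, especially formulations closer to real-world situations, deal with special constraints for the pattern construction. All those restrictions on the pattern structure can be handled in the GCSP, since it makes no assumptions on the problem-specific constraints for the pattern structure.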
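Because the GCSP only looks at element vectors, stock materials, and the two cost components of each pattern, checking a solution against the demand constraints (3.1) or (3.4), the availability constraints (3.2), and computing the costs (3.3) takes only a few lines. The following Python sketch is a minimal illustration under the assumption that a solution is stored as (pattern, amount) pairs; the class `GPattern` and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class GPattern:
    stock: str                # t_P: stock material the pattern is cut from
    elements: Dict[str, int]  # element vector e^P
    prod_cost: float          # production costs c^P_P
    stack_cost: float         # stacking costs c^S_P

Solution = List[Tuple[GPattern, int]]  # pairs (P, a^S_P)

def satisfies(sol: Solution, demand: Dict[str, int], avail: Dict[str, int],
              exact: bool = False) -> bool:
    """Demand constraints (3.1), or (3.4) if exact, plus availability constraints (3.2)."""
    made: Dict[str, int] = {}
    used: Dict[str, int] = {}
    for p, a in sol:
        used[p.stock] = used.get(p.stock, 0) + a
        for i, n in p.elements.items():
            made[i] = made.get(i, 0) + a * n
    if exact:  # GCSPE: no overproduction allowed
        demand_ok = all(made.get(i, 0) == d for i, d in demand.items())
    else:      # GCSP: overproduction allowed
        demand_ok = all(made.get(i, 0) >= d for i, d in demand.items())
    stock_ok = all(u <= avail[t] for t, u in used.items())
    return demand_ok and stock_ok

def total_cost(sol: Solution, s_max: float = math.inf) -> float:
    """Costs (3.3); with s_max = inf every used pattern needs exactly one stack."""
    cost = 0.0
    for p, a in sol:
        stacks = (1 if a > 0 else 0) if s_max == math.inf else math.ceil(a / s_max)
        cost += a * p.prod_cost + stacks * p.stack_cost
    return cost

# hypothetical instance
p = GPattern("S", {"x": 2, "y": 1}, prod_cost=5.0, stack_cost=20.0)
demand = {"x": 4, "y": 2}
```

Using `p` twice meets the demand exactly and costs 30 with `s_max=3`; using it three times is feasible for the GCSP but not for the GCSPE, and with `s_max=2` it needs two stacks.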
Note that, unlike the GCSP and like the GCSPE, the classical cutting stock problems do not allow overproduction. Without setup costs, overproduction makes no sense, since the overproduced elements could simply be removed without increasing the costs; this is why the classical problems do not consider overproduction. As soon as we introduce pattern setup costs, however, overproduction can make sense, since it may be useful to overproduce some elements in order to be able to stack the patterns together. Removing the overproduced elements would then lead to more stacks and therefore higher costs.
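To illustrate why overproduction can pay off, consider a small hypothetical instance with demands $d_x = 3$ and $d_y = 1$, a pattern $P_1$ containing one $x$ and one $y$, a pattern $P_2$ containing one $x$, production costs $c^P_{P_1} = c^P_{P_2} = 1$, stacking costs $c^S_{P_1} = c^S_{P_2} = 10$, and $s_{\max} = 4$. In the GCSPE exactly one $y$ may be produced, which forces $a_{P_1} = 1$ and hence $a_{P_2} = 2$; the GCSP may instead use $P_1$ three times:

```latex
\begin{aligned}
\text{GCSPE: } & a_{P_1} = 1,\ a_{P_2} = 2: && c(S) = 3 \cdot 1 + \left(\lceil 1/4 \rceil + \lceil 2/4 \rceil\right) \cdot 10 = 23,\\
\text{GCSP: }  & a_{P_1} = 3,\ a_{P_2} = 0: && c(S) = 3 \cdot 1 + \lceil 3/4 \rceil \cdot 10 = 13.
\end{aligned}
```

The second solution produces two surplus copies of $y$ but needs only one stack, which more than compensates for the overproduction.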
Since the costs are non-negative, we can assume that patterns in an optimal solution are not empty. In fact, in most applications patterns always have positive costs, so an optimal solution never contains empty patterns. This assumption implies that the total amount of stock materials used is limited by $D := \sum_{i \in E} d_i$. Therefore, instead of $s_{\max} = \infty$ we can always equivalently use $s_{\max} = D$, and in the same way $q_t = D$ instead of $q_t = \infty$. To simplify the algorithms presented in this thesis, we assume from now on that $s_{\max} \in \mathbb{N}$ and $q_t \in \mathbb{N}$ for all $t \in T$.
Using the GCSP we can now formulate the cutting stock set cover problem.
Cutting stock set cover problem (CSSCP). Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$ be given as in the GCSP. Furthermore, let $\mathcal{P}$ be a given finite set of feasible patterns (e.g., collected from different heuristic solutions). The CSSCP asks for a solution $S$ to the underlying GCSP consisting of patterns in $\mathcal{P}$, i.e., $\mathcal{P}_S \subseteq \mathcal{P}$, which satisfies the conditions (3.1) and (3.2) and minimizes the costs $c(S)$ as defined in (3.3).
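For very small pattern pools, the CSSCP can be solved by plain enumeration, which is a useful reference when testing heuristics. The Python sketch below is not one of the approaches of this thesis; it simply tries every amount vector with $0 \le a_P \le D$ (using the bound $D = \sum_i d_i$ from the previous paragraph) and keeps the cheapest one satisfying (3.1) and (3.2). The dictionary keys and the function name are assumptions, and the running time grows exponentially with $|\mathcal{P}|$.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

def brute_force_csscp(patterns: List[dict], demand: Dict[str, int],
                      avail: Dict[str, int], s_max: int) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Return (cost, amounts) of a cheapest solution, or None if none exists."""
    D = sum(demand.values())  # upper bound on every amount a_P
    best = None
    for amounts in itertools.product(range(D + 1), repeat=len(patterns)):
        made = {i: sum(a * p["elements"].get(i, 0) for p, a in zip(patterns, amounts))
                for i in demand}
        if any(made[i] < d for i, d in demand.items()):  # demand constraints (3.1)
            continue
        used: Dict[str, int] = {}
        for p, a in zip(patterns, amounts):
            used[p["stock"]] = used.get(p["stock"], 0) + a
        if any(u > avail[t] for t, u in used.items()):   # availability constraints (3.2)
            continue
        cost = sum(a * p["prod"] + math.ceil(a / s_max) * p["stack"]  # costs (3.3)
                   for p, a in zip(patterns, amounts))
        if best is None or cost < best[0]:
            best = (cost, amounts)
    return best

# hypothetical pattern pool on a single stock material "S"
pool = [
    {"stock": "S", "elements": {"x": 2}, "prod": 1.0, "stack": 4.0},
    {"stock": "S", "elements": {"y": 1}, "prod": 1.0, "stack": 4.0},
    {"stock": "S", "elements": {"x": 1, "y": 1}, "prod": 1.5, "stack": 4.0},
]
result = brute_force_csscp(pool, {"x": 2, "y": 2}, {"S": 10}, s_max=2)
```

On this pool the optimum uses only the third pattern twice, at cost 7, because it covers both element types within a single stack.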
If we replace condition (3.1) by (3.4), we call the problem cutting stock set cover problem with exact demands (CSSCPE).

Solution Approaches
In this chapter we present four different approaches for solving the CSSCP: an integer linear program for solving the problem exactly, a greedy approach for quickly finding reasonable solutions, a PILOT method, and a beam search.

Integer Linear Programming Formulation
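We start by modeling the CSSCP as an integer linear program (ILP). In principle, the ILP solves the problem exactly, but in practice the approach does not scale well to large instances; if we impose a time limit, it may therefore return solutions with large optimality gaps. We use integer variables $a_P$ for the amount of each pattern $P \in \mathcal{P}$ and integer variables $s_P$ for the number of stacks of pattern $P$:
$$\min \sum_{P \in \mathcal{P}} \left( c^P_P\, a_P + c^S_P\, s_P \right)$$
subject to
$$\sum_{P \in \mathcal{P}} e^P_i\, a_P \ge d_i \quad \forall i \in E, \tag{4.1}$$
$$\sum_{P \in \mathcal{P}:\, t_P = t} a_P \le q_t \quad \forall t \in T, \tag{4.2}$$
$$a_P \le s_{\max}\, s_P \quad \forall P \in \mathcal{P}, \tag{4.3}$$
$$a_P,\ s_P \in \mathbb{N} \quad \forall P \in \mathcal{P}.$$
If we want to solve the CSSCPE, we replace constraint (4.1) by
$$\sum_{P \in \mathcal{P}} e^P_i\, a_P = d_i \quad \forall i \in E. \tag{4.4}$$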

Solution Approaches
The constraints (4.1) or (4.4) ensure that the demands get satisfied and the inequalities (4.2) guarantee that the maximal amounts for each stock material get respected. Furthermore, the constraints (4.3) couple the s P variables with the a P variables by ensuring that there are enough stacks, so that the maximal stack size s max gets not exceeded.
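For very small instances, the optimum can also be obtained without an ILP solver by enumerating all amount vectors, which is handy for validating the heuristics below. The Python sketch is a brute-force stand-in for the ILP, not the ILP itself: it assumes the cost form of the ILP objective, sets $s_P = \lceil a_P / s_{\max} \rceil$ (the cheapest choice satisfying (4.3)), and bounds each $a_P$ by $D$, which suffices when every pattern is non-empty. The instance data are hypothetical.

```python
from itertools import product
from math import ceil

def solve_exhaustively(patterns, sheet_of, q, demand, c_pat, c_setup, s_max, exact=False):
    """Optimal CSSCP (or CSSCPE if exact=True) solution of a tiny instance by enumeration."""
    names = list(patterns)
    D = sum(demand.values())          # no non-empty pattern is needed more than D times
    best_cost, best = float("inf"), None
    for amounts in product(range(D + 1), repeat=len(names)):
        made = dict.fromkeys(demand, 0)
        used = dict.fromkeys(q, 0)
        for p, a in zip(names, amounts):
            used[sheet_of[p]] += a
            for i, e in patterns[p].items():
                made[i] += a * e
        if any(used[t] > q[t] for t in q):                          # (4.2)
            continue
        if exact and any(made[i] != demand[i] for i in demand):     # (4.4)
            continue
        if not exact and any(made[i] < demand[i] for i in demand):  # (4.1)
            continue
        # (4.3): the cheapest admissible number of stacks is ceil(a_P / s_max)
        cost = sum(a * c_pat[p] + ceil(a / s_max) * c_setup[p]
                   for p, a in zip(names, amounts))
        if cost < best_cost:
            best_cost, best = cost, dict(zip(names, amounts))
    return best_cost, best

patterns = {"A": {"x": 2}, "B": {"x": 1}}
args = (patterns, {"A": "s", "B": "s"}, {"s": 100}, {"x": 19},
        {"A": 1, "B": 1}, {"A": 5, "B": 5}, 10)
print(solve_exhaustively(*args))              # CSSCP: overproduction allowed
print(solve_exhaustively(*args, exact=True))  # CSSCPE: exact demands
```

The enumeration is exponential in $|\mathcal{P}|$, so it only serves as ground truth on toy instances.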

Greedy Heuristic
Since the ILP model can only compute solutions for small instances in reasonable time, we need other approaches for larger instances. The idea of the following greedy construction heuristic is to rate each pattern depending on the currently unsatisfied demands and to greedily pick the best pattern as the next one.
We greedily add patterns to a partial solution until all demands are satisfied. Because of the greedy nature of the approach and the restricted pattern choices in the CSSCP/CSSCPE, it is easier to design a greedy heuristic for the CSSCP than for the CSSCPE. Therefore, we first present a greedy construction heuristic for the CSSCP and afterwards discuss how it can be modified to solve the CSSCPE.
Before we can describe the algorithm, we formalize the notion of a partial solution. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$ and $q_t$ for $t \in T$ be an instance of the GCSP. A partial solution $S$ for this instance is a multiset of feasible patterns, described by $\mathcal{P}_S$ together with an amount vector $(a^S_P)_{P \in \mathcal{P}_S}$, which satisfies (3.2) but not necessarily (3.1). Such an $S$ is a partial solution of the GCSPE if additionally
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i \le d_i \qquad \forall i \in E \qquad (4.5)$$
holds.
If we are further given a set $\mathcal{P}$ of feasible patterns forming an instance of CSSCP, a partial solution for that instance is a partial solution of the underlying GCSP which only consists of patterns in $\mathcal{P}$. Furthermore, for the CSSCPE a partial solution must additionally satisfy (4.5).
Note that the empty set is always a partial solution of all four problems (GCSP, GCSPE, CSSCP, and CSSCPE). Therefore, the greedy heuristic starts with the valid partial solution $S = \emptyset$ of the CSSCP and adds patterns to it until the partial solution is a feasible solution, i.e. until it satisfies (3.1).
In each iteration, we rate every pattern based on the current partial solution $S$ and add the pattern with the highest rating to $S$. To also take the stacking costs into account, we allow adding a pattern with a given amount at once. Thus, we greedily pick not only a pattern but also an amount for this pattern, and we therefore need to define a rating for each pair $(P, a) \in \mathcal{P} \times \mathbb{N}$ consisting of a pattern and an amount.
For this rating we use a problem-dependent volume value $v_i \in \mathbb{R}_+$ which represents the volume of element $i$ or, more generally, how difficult it is to add element $i$ to a pattern. In most applications we can use an actual volume (length in one dimension, area in two dimensions, volume in three dimensions, ...), but in principle any values could be used here, even values which change dynamically during the execution of the algorithm.
Using the volume values $v_i$, we define the rating of a pattern and an amount as the sum of the volume values of all elements on the pattern that still cover unsatisfied demand, divided by the cost of the added patterns and of the newly needed pattern stacks. Formally, based on a partial solution $S$, we define the rating $r^S(P, a)$ of the pair $(P, a) \in \mathcal{P} \times \mathbb{N}$ by

$$r^S(P, a) := \frac{\sum_{i \in E} v_i \min\{a\, e^P_i,\ r^S_i\}}{a\, c^P_P + c^S_P \left( \left\lceil \frac{a^S_P + a}{s_{\max}} \right\rceil - \left\lceil \frac{a^S_P}{s_{\max}} \right\rceil \right)},$$

where $a^S_P := 0$ if $P \notin \mathcal{P}_S$, and the remaining demand $r^S_i$ of element $i$ is defined by

$$r^S_i := \max\Big\{0,\ d_i - \sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i\Big\}.$$

Pattern $P$ might already be used in $S$, i.e. $a^S_P > 0$, in which case we may be able to add some copies of it to $S$ without creating a new stack. Thus, to calculate the number of newly introduced stacks, we take the difference between the number of stacks of $P$ in the new solution after adding $P$ $a$ times and the number of stacks of $P$ in the old solution.
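The rating translates directly into code. The Python sketch below uses exact fractions so that ties are detected reliably; the dictionaries, element names, volumes, and costs are hypothetical, and, as in the text, it assumes a positive denominator.

```python
from fractions import Fraction

def remaining_demand(demand, solution, patterns):
    """r^S_i = max(0, d_i - sum_P a^S_P e^P_i)."""
    made = dict.fromkeys(demand, 0)
    for p, a in solution.items():
        for i, e in patterns[p].items():
            made[i] += a * e
    return {i: max(0, demand[i] - made[i]) for i in demand}

def stacks(amount, s_max):
    return -(-amount // s_max)          # ceil(amount / s_max) on integers

def rating(p, a, solution, patterns, demand, volume, c_pat, c_setup, s_max):
    """r^S(P, a): covered volume divided by pattern costs and new-stack costs."""
    r = remaining_demand(demand, solution, patterns)
    gain = sum(volume[i] * min(a * e, r[i]) for i, e in patterns[p].items())
    old = solution.get(p, 0)            # a^S_P (0 if P is not used yet)
    new_stacks = stacks(old + a, s_max) - stacks(old, s_max)
    return Fraction(gain, a * c_pat[p] + new_stacks * c_setup[p])

# Hypothetical data: pattern A holds two x and one y.
patterns = {"A": {"x": 2, "y": 1}}
ctx = (patterns, {"x": 5, "y": 1}, {"x": 3, "y": 4}, {"A": 1}, {"A": 5}, 10)
for a in range(1, 5):
    print(a, rating("A", a, {}, *ctx))
```

The printed ratings are 5/3, 16/7, 19/8, 19/9: they first rise, because more copies share one setup cost, and fall once $x$ is saturated, so $a = 3$ is the best amount here.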
If $c^P_P = 0$, the rating may not be well-defined, since the denominator may be 0. In this case adding that amount of patterns causes no costs, so we can add the maximal amount of these patterns that still causes no costs without comparing to any other ratings. From now on we therefore assume that the denominator is always larger than zero.
We also have to consider the remaining amount of the stock material, and therefore we are only interested in amounts $a \in A^S_P := \{n \in \mathbb{N} : n \le R^S_{t_P}\}$, where the remaining amount $R^S_t$ of stock material $t$ is defined by
$$R^S_t := q_t - \sum_{P \in \mathcal{P}_S:\, t_P = t} a^S_P.$$
What we want to find is a pair $(P_0, a_0) \in \mathcal{P} \times \mathbb{N}$ with $a_0 \in A^S_{P_0}$ whose rating is maximal, i.e.
$$r^S(P_0, a_0) = \max_{P \in \mathcal{P}}\ \max_{a \in A^S_P} r^S(P, a).$$
If one of the stock materials $t$ had an infinite available quantity $q_t$, then $A^S_P$ would contain all natural numbers for all patterns using this material. In this case there would be infinitely many possible pairs $(P, a)$, so we need to consider only the relevant ones to find the maximum. The following theorems establish properties of the rating function and help us develop an efficient algorithm for finding an optimal amount $a$ for a pattern $P$.
Theorem 4.2.1. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$ and $q_t$ for $t \in T$ together with $\mathcal{P}$ be an instance of CSSCP and $S$ a partial solution of this instance. Furthermore, let $P \in \mathcal{P}$ be a pattern with $A^S_P \neq \emptyset$, and let $r$ be the number of copies of $P$ in a not yet completely filled stack in $S$, i.e. $0 \le r < s_{\max}$ and $r \equiv a^S_P \pmod{s_{\max}}$. Then it holds that
$$\max_{a \in A^S_P} r^S(P, a) = \max_{a \in A^S_P,\ a \le s_{\max} - r} r^S(P, a).$$
Proof. We distinguish the cases $r = 0$ and $r > 0$. If $r = 0$ we have For $r > 0$ it holds and therefore we can calculate
The previous theorem shows that it suffices to search among amounts smaller than or equal to $s_{\max} - r$ to find one with a maximum rating. Next, we analyze the development of the sequence $(r_a)_{a=1}^{a_{\max}}$, where $r_a := r^S(P, a)$ for $a \in A^S_P$ and $a_{\max} := \min\{s_{\max} - r,\ R^S_{t_P}\}$.
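The following Python sketch checks Theorem 4.2.1, and the unimodality of $(r_a)$ established next in Theorem 4.2.3, on random single-pattern instances. It works directly with the remaining demands $r^S_i$, the pattern entries $e^P_i$, the volumes $v_i$, and $a^S_P$ (called `old`); the random ranges are arbitrary assumptions, and a passing check is evidence, not a proof.

```python
import random
from fractions import Fraction

def rating(a, e, rem, v, c_pat, c_setup, old, s_max):
    """r^S(P, a) for one pattern; old = a^S_P, rem = remaining demands r^S_i."""
    gain = sum(vi * min(a * ei, ri) for ei, ri, vi in zip(e, rem, v))
    new_stacks = -(-(old + a) // s_max) - (-(-old // s_max))
    return Fraction(gain, a * c_pat + new_stacks * c_setup)

def is_unimodal(seq):
    """True if seq never increases again after it has strictly decreased."""
    decreasing = False
    for x, y in zip(seq, seq[1:]):
        if y < x:
            decreasing = True
        elif y > x and decreasing:
            return False
    return True

rng = random.Random(1)
for _ in range(2000):
    m = rng.randint(1, 4)
    e = [rng.randint(0, 3) for _ in range(m)]
    rem = [rng.randint(0, 12) for _ in range(m)]
    v = [rng.randint(1, 5) for _ in range(m)]
    c_pat, c_setup = rng.randint(1, 4), rng.randint(0, 6)
    old, s_max, R = rng.randint(0, 15), rng.randint(1, 6), rng.randint(1, 30)
    r = old % s_max
    a_max = min(s_max - r, R)
    ratings = [rating(a, e, rem, v, c_pat, c_setup, old, s_max) for a in range(1, R + 1)]
    head = ratings[:a_max]
    assert max(ratings) == max(head)                        # Theorem 4.2.1
    assert is_unimodal(head)                                # Theorem 4.2.3
    if r > 0:
        assert all(x >= y for x, y in zip(head, head[1:]))  # a_0 = 1 when r > 0
print("all checks passed")
```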

Lemma 4.2.2. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$, $\mathcal{P}$, $S$, $P$, and $r$ be as in Theorem 4.2.1. Furthermore, let $1 \le a < a_{\max}$ be fixed. Using the three element index sets With that we can calculate The fact that
completes the proof.
Theorem 4.2.3. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$, $\mathcal{P}$, $S$, $P$, and $r$ be as in Theorem 4.2.1. Then there exists an amount $1 \le a_0 \le a_{\max}$ such that the sequence $r_1, \dots, r_{a_0}$ is monotonically increasing and the sequence $r_{a_0}, \dots, r_{a_{\max}}$ is monotonically decreasing. Furthermore, if $r > 0$ we can choose $a_0 = 1$.

Proof.
Let $a_0$ be the first element in $[1, a_{\max})$ for which $r_{a_0} > r_{a_0+1}$. If no such $a_0$ exists, we can choose $a_0 = a_{\max}$ and are done. We now prove by induction on $a$ that $r_{a+1} \le r_a$ holds for all $a_0 \le a < a_{\max}$.
Induction basis $a = a_0$: $r_{a_0+1} \le r_{a_0}$ holds by definition of $a_0$.
Induction step $a \to a + 1$: We consider the equivalent condition $0 \le S_{a+1}$ of Lemma 4.2.2.
Note that 1 for all $i$ regardless of the value of $a$. Furthermore, we have $I^{a+1}$. Therefore, we get The last inequality follows from the induction hypothesis together with Lemma 4.2.2, and therefore we get altogether $0 \le S_{a+1}$, which is equivalent to $r_{a+2} \le r_{a+1}$.
This finishes the proof of the monotonicity properties. It remains to prove that we can choose $a_0 = 1$ if $r > 0$, i.e. that in this case the values $r_1, \dots, r_{a_{\max}}$ are monotonically decreasing. Note that for $r > 0$ we have $\mathbb{1}_{r=0} = 0$ and therefore 1 for all elements $i$ and amounts $a < a_{\max}$. This implies that all terms of the sum $S_a$ are non-negative for all $a < a_{\max}$, and therefore $0 \le S_a$ for all $a < a_{\max}$. Lemma 4.2.2 now gives $r_{a+1} \le r_a$ for all $a < a_{\max}$, which is what we wanted to show.
Theorem 4.2.4. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$, $\mathcal{P}$, $S$, $P$, and $r$ be as in Theorem 4.2.1. Then there exists an amount $a_0$ in $A^{S,0}_P$ which has a maximum rating, i.e. $r^S(P, a_0) \ge r^S(P, a)$ for all $a \in A^S_P$, and which is the largest of all amounts with a maximum rating that are smaller than or equal to $a_{\max}$.
Proof. Let $a_0$ be the first element in $[1, a_{\max})$ for which $r_{a_0} > r_{a_0+1}$. If such an $a_0$ does not exist, then $a_0 := a_{\max}$ satisfies all properties; note that in this case $a_{\max}$ must have a maximum rating because of Theorem 4.2.1.
In the other case, if an $a_0 < a_{\max}$ with $r_{a_0} > r_{a_0+1}$ exists, we know by Theorem 4.2.3 together with Theorem 4.2.1 that $a_0$ must have a maximum rating and that it is the largest of all amounts with this property which are smaller than or equal to $a_{\max}$.
The only thing remaining to show is that $a_0 \in A^{S,0}_P$. If $a_0 = 1$, then we know by Lemma 4.2.2 that either $I^1_1$ or $I^1_2$ is not empty, since $r_2 < r_1$ implies that $0 < S_1$. If $I^1_1$ is not empty, then there exists an element $i$ with $e^P_i > r^S_i$, which implies On the other hand, if $I^1_2$ is not empty, then there exists an element $i$ with $e^P_i \le r^S_i < 2e^P_i$, which implies The only remaining case is $a_0 > 1$. In this case we know that $r_{a_0-1} \le r_{a_0}$ and $r_{a_0+1} < r_{a_0}$, and therefore we get that 2 is empty and that there may only be elements and we get
Theorem 4.2.5. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$, $\mathcal{P}$, $S$, $P$, $r$, and $A^{S,0}_P$ be as in Theorem 4.2.4. Furthermore, let $a_0$ be the largest amount in $A^S_P$ which has a maximum rating, i.e. $r^S(P, a_0) \ge r^S(P, a)$ for all $a \in A^S_P$. If $r > 0$ and $c^S_P > 0$, then $a_0 \in A^{S,0}_P$. If $r = 0$ and $c^S_P > 0$, then either $a_0 \in A^{S,0}_P$, or $a_0 = k \cdot s_{\max}$ for some $k \in \mathbb{N} \setminus \{0\}$ and $j \cdot s_{\max}$ has a maximum rating for all $1 \le j \le k$.
Proof. We consider the proof of Theorem 4.2.1. If $r > 0$ and $c^S_P > 0$, then $r^S(P, a_0) = r^S(P, s_{\max} - r)$ can only be true if Therefore, $a_0 \le a_{\max}$, which implies by Theorem 4.2.3 that $a_0 \in A^{S,0}_P$. On the other hand, if $r = 0$ and $c^S_P > 0$, then $r^S(P, a_0) = r^S(P, s_{\max})$ can only be true if which implies that $j \cdot s_{\max}$ has a maximum rating for all $1 \le j \le k$.
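Theorems 4.2.1-4.2.5 suggest a way of finding the largest amount with maximum rating. The Python sketch below is our own simplified reading, not a transcription of Algorithm 1. It assumes $c^P_P > 0$ and $v_i > 0$, takes as candidates the bounds $1$ and $a_{\max}$ plus $\lfloor r^S_i/e^P_i \rfloor$ and $\lceil r^S_i/e^P_i \rceil$ for every element with open demand (between two such breakpoints the rating is a ratio of two linear functions and hence monotone), and, for $r = 0$, walks along multiples of $s_{\max}$ as in Theorem 4.2.5. A brute-force search over all amounts serves as a cross-check.

```python
import random
from fractions import Fraction

def rating(a, e, rem, v, c_pat, c_setup, old, s_max):
    """r^S(P, a) for one pattern; old = a^S_P, rem = remaining demands r^S_i."""
    gain = sum(vi * min(a * ei, ri) for ei, ri, vi in zip(e, rem, v))
    new_stacks = -(-(old + a) // s_max) - (-(-old // s_max))
    return Fraction(gain, a * c_pat + new_stacks * c_setup)

def find_best_amount(e, rem, v, c_pat, c_setup, old, s_max, R):
    """Largest a in {1, ..., R} with maximum rating (assumes R >= 1, c_pat > 0, v_i > 0)."""
    def f(a):
        return rating(a, e, rem, v, c_pat, c_setup, old, s_max)
    open_items = [(ei, ri) for ei, ri in zip(e, rem) if ei > 0 and ri > 0]
    if not open_items:                  # every amount has rating 0
        return R
    if c_setup == 0:                    # rating is non-increasing in a
        return min(R, max(1, min(ri // ei for ei, ri in open_items)))
    r = old % s_max
    a_max = min(s_max - r, R)           # Theorem 4.2.1: the maximum lies in [1, a_max]
    cands = {1, a_max}
    for ei, ri in open_items:           # integers next to the breakpoints r_i / e_i
        cands.update((ri // ei, -(-ri // ei)))
    cands = [a for a in cands if 1 <= a <= a_max]
    best = max(f(a) for a in cands)
    a0 = max(a for a in cands if f(a) == best)
    if r == 0 and a0 == s_max:          # Theorem 4.2.5: further full stacks may tie
        k = 2
        while k * s_max <= R and f(k * s_max) == best:
            k += 1
        a0 = (k - 1) * s_max
    return a0

def brute_force(e, rem, v, c_pat, c_setup, old, s_max, R):
    vals = [rating(a, e, rem, v, c_pat, c_setup, old, s_max) for a in range(1, R + 1)]
    best = max(vals)
    return max(a for a, x in enumerate(vals, start=1) if x == best)

rng = random.Random(7)
for _ in range(3000):
    m = rng.randint(1, 4)
    args = ([rng.randint(0, 3) for _ in range(m)], [rng.randint(0, 12) for _ in range(m)],
            [rng.randint(1, 5) for _ in range(m)], rng.randint(1, 4), rng.randint(0, 6),
            rng.randint(0, 15), rng.randint(1, 6), rng.randint(1, 30))
    assert find_best_amount(*args) == brute_force(*args)
print("FindBestA sketch agrees with brute force")
```

The candidate set has at most $2m + 2$ elements for $m$ item types on the pattern, so each call evaluates the $O(m)$ rating $O(m)$ times, plus the walk along multiples of $s_{\max}$.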
Using Theorems 4.2.1-4.2.5, we can verify that Algorithm 1 correctly computes, for a given pattern $P$, the largest amount $a_0 \in A^S_P$ with a maximum rating. The running time of Algorithm 1 is in $O(m^2)$, where $m \le n$ is the number of item types on the input pattern $P$, i.e. of all $i \in E$ with $e^P_i > 0$. Note that line 18 of Algorithm 1 can be implemented efficiently by iterating through the relevant item types and checking for which $k$ they become saturated. Therefore, the bottlenecks of the algorithm are lines 16 and 18, which both need $O(m^2)$ time, since computing $r^S(P, a)$ needs $O(m)$ time and $|A^{S,0}_P| \le m$. The whole greedy approach is described in Algorithm 2.

Algorithm 1 FindBestA(S, P)
INPUT: An instance of CSSCP, a partial solution $S$, and a pattern $P$
OUTPUT: The largest $a_0 \in A^S_P$ with a maximum rating
1: if $c^S_P = 0$ then
2: if $a_0 = 0$ then
4: $a_0 \leftarrow 1$ ▷ Already the first pattern overproduces some elements
5: end if
6: if $a_0 > R^S_{t_P}$ then
7: return $a_0$
10: else
11: Compute $r$ as in Theorem 4.2.1.
Compute the set $A^{S,0}_P$
16: if $r = 0$ and $a_0 = s_{\max}$ then

Theorem 4.2.7. The worst-case running time of Algorithm 2 is in $O\Big(\big(n(\log_2(\max_{i \in E} d_i) + 1) + |T|(\log_2(\max_{t \in T} q_t) + 1)\big) \cdot |\mathcal{P}| \cdot n^2\Big)$.
Proof. If we can prove that the number of iterations of the while loop is in $O\big(n(\log_2(\max_{i \in E} d_i) + 1) + |T|(\log_2(\max_{t \in T} q_t) + 1)\big)$, we are done, since each iteration calls FindBestA, which is in $O(n^2)$, for at most $|\mathcal{P}|$ patterns. Next we prove the following statement: in each iteration of the while loop, when we add pattern $P_{best}$ with amount $a_{best}$, at least one of the following happens:
1. There exists an element $i$ whose remaining demand is non-zero and gets more than halved.

Algorithm 2 Greedy Construction Heuristic
for $P \in \mathcal{P}$ do
5: if $R^S_{t_P} > 0$ then
6: $a \leftarrow$ FindBestA(S, P)
7: if $r^S(P, a) > r_{best}$ then
8: $r_{best} \leftarrow r^S(P, a)$
9: $(P_{best}, a_{best}) \leftarrow (P, a)$

2. There exists a material type $t \in T$ whose remaining available amount is non-zero and gets more than halved.
3. A previously not completely filled stack in $S$ gets completely filled.
There are six different places in Algorithm 1 where the returned value, which becomes $a_{best}$, can be set for the last time:
1. Line 2: In this case the remaining demand of the item $i$ for which the minimum is achieved is reduced by $\lfloor r^S_i / e^P_i \rfloor e^P_i$, which is greater than $r^S_i / 2$ by Lemma 4.2.6 and the fact that $e^P_i \le r^S_i$, which must hold since $a_0 > 0$. Therefore, the remaining demand of item $i$ is more than halved.
2. Line 4: In this case the remaining demand of the item $i$ for which the minimum in line 2 was achieved is set to 0, which is also more than halved.
3. Line 7: In this case the remaining available amount of type $t_P$ is set to 0 and therefore more than halved.
4. Line 13: A previously not completely filled stack gets completely filled, or all of the remaining amount of material type $t_P$ is used.
5. Line 16: Either $a_0 = a_{\max}$, in which case the same happens as in the previous case; or $a_0 = \lfloor r^S_i / e^P_i \rfloor$ for some element type $i$, in which case the remaining demand of $i$ gets more than halved; or $a_0 = \lceil r^S_i / e^P_i \rceil$, in which case the remaining demand of $i$ is set to 0.
6. Line 19: If $(k+1) s_{\max} > R^S_{t_P}$, we know that the remaining available amount of type $t_P$ gets more than halved. On the other hand, if $r^S(P, s_{\max}) > r^S(P, (k+1) s_{\max})$, then at least one element type $i$ was not overproduced by adding the pattern $k s_{\max}$ times but gets overproduced by adding it $(k+1) s_{\max}$ times. Therefore, we know that $r^S_i \ge k s_{\max} e^P_i$ and $r^S_i < (k+1) s_{\max} e^P_i \le 2k s_{\max} e^P_i$, which implies that $k s_{\max} e^P_i > r^S_i / 2$. Therefore, the remaining demand of item type $i$ gets more than halved.
A natural number $n$ can be more than halved at most $\lfloor \log_2(n) \rfloor + 1$ times before it reaches 0. To see this, let $a_k$ be the natural number after $k$ steps of more than halving. Since $a_0 = n$, we get $a_k < n / 2^k$ for all $k \ge 1$. If $a_k \ge 1$, this implies $1 < n / 2^k$, i.e. $n > 2^k$ and therefore $k < \log_2(n)$. Hence $a_k = 0$ for every $k \ge 1$ with $k \ge \log_2(n)$, so at most $\lfloor \log_2(n) \rfloor + 1$ steps of more than halving are possible.
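The counting argument can be checked mechanically. In the Python sketch below, the worst case of "more than halving" replaces $n$ by the largest natural number strictly below $n/2$, namely $\lfloor (n-1)/2 \rfloor$, and the resulting number of steps is compared with the bound $\lfloor \log_2(n) \rfloor + 1$, which equals `n.bit_length()` for $n \ge 1$. The second function checks $\lfloor r/e \rfloor \cdot e > r/2$ for $1 \le e \le r$; reading this as the content of Lemma 4.2.6 used in case 1 is our assumption.

```python
def worst_case_halving_steps(n):
    """Steps until 0 if n always shrinks to the largest natural number below n / 2."""
    steps = 0
    while n > 0:
        n = (n - 1) // 2
        steps += 1
    return steps

def multiple_covers_more_than_half(r, e):
    """Check floor(r / e) * e > r / 2 (in integers: 2 * floor(r / e) * e > r)."""
    return 2 * (r // e) * e > r

for n in range(1, 10_001):
    # bit_length() equals floor(log2(n)) + 1 for n >= 1
    assert worst_case_halving_steps(n) <= n.bit_length()
assert all(multiple_covers_more_than_half(r, e)
           for r in range(1, 200) for e in range(1, r + 1))
print(worst_case_halving_steps(1000), (1000).bit_length())   # 9 10
```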
Using this fact, we see that case 1 can happen in at most $n(\log_2(\max_{i \in E} d_i) + 1)$ iterations and case 2 in at most $|T|(\log_2(\max_{t \in T} q_t) + 1)$ iterations.
Since case 3 can happen in at most half of the iterations, we are done.
Note that Theorem 4.2.7 only bounds the worst-case running time; in most cases the running time is much closer to $|\mathcal{P}| n^3$.
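To further reduce the running time, we can use the following fact.
Theorem 4.2.8. Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$, and $q_t$ for $t \in T$ together with $\mathcal{P}$ be an instance of CSSCP. Furthermore, let $S_j$ be the partial solution of iteration $j$ when running Algorithm 2 on this instance. Then for all $P \in \mathcal{P}$ and all iterations $j \le k$ it holds that
$$\max_{a \in A^{S_j}_P} r^{S_j}(P, a) \ \ge\ \max_{a \in A^{S_k}_P} r^{S_k}(P, a).$$
To prove Theorem 4.2.8 we use the following lemma.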

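Lemma 4.2.9. Let $x, z \ge 0$ and $y, w > 0$. If $\frac{x}{y} \ge \frac{x + z}{y + w}$, then $\frac{x}{y} \ge \frac{z}{w}$.
Proof. It simply follows by the following basic equivalent transformation:
$$\frac{x}{y} \ge \frac{x+z}{y+w} \iff x(y+w) \ge y(x+z) \iff xw \ge yz \iff \frac{x}{y} \ge \frac{z}{w}.$$
Proof of Theorem 4.2.8. It is enough to show the statement for $k = j + 1$, since everything else then follows by the transitivity of the inequality. We fix $j$ and $k = j + 1$. Since Algorithm 2 only adds patterns to a partial solution, we know that $a^{S_k}_P \ge a^{S_j}_P$ for all $P \in \mathcal{P}$. From that it directly follows that $r^{S_k}_i \le r^{S_j}_i$ for all $i \in E$ and $A^{S_k}_P \subseteq A^{S_j}_P$ for all $P \in \mathcal{P}$. Let $\tilde P$ be the pattern added to $S_k$ in iteration $j$, i.e. $S_k$ consists of all patterns in $S_j$ plus some amount of the pattern $\tilde P$.
We now distinguish the cases $P \neq \tilde P$ and $P = \tilde P$. In the first case we know that $a^{S_j}_P = a^{S_k}_P$, so the number of new stacks for any amount $a$ is the same for $S_j$ and $S_k$, while the numerator of the rating can only decrease. Therefore, $r^{S_k}(P, a) \le r^{S_j}(P, a)$ for all $a \in A^{S_k}_P$, which proves the claim in this case. The remaining case is $P = \tilde P$. Let $a_j$ be the amount of pattern $\tilde P$ which was added to $S_j$ in iteration $j$; then $a_j$ has a maximum rating for $S_j$. Furthermore, let $a_k$ be an amount with maximum rating for $S_k$. Since $a_j$ is maximal for $S_j$ and $a_j + a_k \in A^{S_j}_{\tilde P}$, we know that $r^{S_j}(\tilde P, a_j) \ge r^{S_j}(\tilde P, a_j + a_k)$. The numerator and the denominator of $r^{S_j}(\tilde P, a_j + a_k)$ are the sums of the numerators and of the denominators of $r^{S_j}(\tilde P, a_j)$ and $r^{S_k}(\tilde P, a_k)$, respectively.
We can now apply Lemma 4.2.9 and get that $r^{S_j}(\tilde P, a_j) \ge r^{S_k}(\tilde P, a_k)$.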
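A compact way to see Algorithm 2 and Theorem 4.2.8 at work is the following Python sketch. For brevity it finds the best amount by brute force over $A^S_P$ (keeping the largest amount among ties within a pattern) instead of calling FindBestA, assumes finite $q_t$ and $c^P_P > 0$, and uses a hypothetical toy instance. It records the rating of every chosen pair, so we can check that these ratings never increase.

```python
from fractions import Fraction

def greedy(patterns, sheet_of, q, demand, volume, c_pat, c_setup, s_max):
    """Greedy construction for CSSCP; returns (solution or None, chosen ratings)."""
    sol, history = {}, []

    def remaining():
        r = dict(demand)
        for p, a in sol.items():
            for i, e in patterns[p].items():
                r[i] = max(0, r[i] - a * e)
        return r

    def rating(p, a, r):
        old = sol.get(p, 0)
        new_stacks = -(-(old + a) // s_max) - (-(-old // s_max))
        gain = sum(volume[i] * min(a * e, r[i]) for i, e in patterns[p].items())
        return Fraction(gain, a * c_pat[p] + new_stacks * c_setup[p])

    while any(remaining().values()):
        r = remaining()
        left = {t: q[t] - sum(a for p, a in sol.items() if sheet_of[p] == t) for t in q}
        best = None                           # (rating, pattern, amount)
        for p in patterns:
            for a in range(1, left[sheet_of[p]] + 1):   # brute force over A^S_P
                x = rating(p, a, r)
                if best is None or x > best[0] or (x == best[0] and p == best[1]):
                    best = (x, p, a)          # ties within a pattern: keep larger a
        if best is None or best[0] == 0:
            return None, history              # no pattern reduces the open demand
        x, p, a = best
        sol[p] = sol.get(p, 0) + a
        history.append(x)
    return sol, history

patterns = {"A": {"x": 2, "y": 1}, "B": {"x": 1}, "C": {"y": 3}}
sol, history = greedy(patterns, dict.fromkeys(patterns, "s"), {"s": 40},
                      {"x": 7, "y": 9}, {"x": 2, "y": 3},
                      {"A": 3, "B": 1, "C": 2}, {"A": 4, "B": 4, "C": 4}, 5)
print(sol, [str(h) for h in history])
```

On this instance the chosen ratings are 27/10, 10/9, and 2/3, a non-increasing sequence as Theorem 4.2.8 predicts.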
Theorem 4.2.8 tells us that the ratings calculated in previous iterations can be used as upper bounds for the current rating of a pattern. Therefore, we store the most recently calculated rating $r^{latest}_P$ of each pattern. When iterating through all patterns in $\mathcal{P}$ in an iteration, we start with the patterns with the highest $r^{latest}_P$, and whenever we calculate a new rating for a pattern, we update its $r^{latest}_P$. If the best rating computed so far in the current iteration is higher than $r^{latest}_P$ for some pattern $P$, we do not have to recalculate the rating of $P$, since it cannot beat the currently best pattern. Consequently, in each iteration we only have to recalculate the ratings of a few promising patterns. In many practical cases this reduces the running time drastically, although the worst-case running time stays the same.
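The upper-bound trick can be sketched with a max-heap of stored ratings. In the Python code below, `latest` plays the role of $r^{latest}_P$ and `current_rating` is a hypothetical callback that recomputes the rating of a pattern for the current partial solution; the sketch assumes, as Theorem 4.2.8 guarantees, that fresh ratings never exceed the stored ones.

```python
import heapq

def select_best_lazily(latest, current_rating):
    """One greedy iteration using stored ratings as upper bounds (Theorem 4.2.8).

    latest: dict pattern -> most recently computed rating (updated in place).
    current_rating: function computing the rating of a pattern for the current S.
    Returns (best pattern, its rating, number of ratings actually recomputed).
    """
    heap = [(-bound, idx, p) for idx, (p, bound) in enumerate(latest.items())]
    heapq.heapify(heap)
    best_p, best_r, evaluations = None, float("-inf"), 0
    # Stop as soon as no stored upper bound can beat the best fresh rating.
    while heap and -heap[0][0] > best_r:
        _, _, p = heapq.heappop(heap)
        latest[p] = current_rating(p)
        evaluations += 1
        if latest[p] > best_r:
            best_p, best_r = p, latest[p]
    return best_p, best_r, evaluations

latest = {"A": 5.0, "B": 4.0, "C": 1.0}   # ratings from earlier iterations
now = {"A": 4.5, "B": 3.0, "C": 0.5}      # hypothetical current ratings
print(select_best_lazily(latest, now.__getitem__))   # ('A', 4.5, 1)
```

Here only pattern A is re-rated: its fresh rating 4.5 already exceeds the stored bound 4.0 of B, so B and C are skipped.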

Greedy Approach with Exact Demands
In this section we present how to modify the greedy approach to solve the CSSCPE, which requires exact demands.
The naive way is to modify Algorithm 1 such that the computed amount $a$ never overproduces any item. That means we change $a_{\max}$ to
$$a_{\max} := \min\Big\{ s_{\max} - r,\ R^S_{t_P},\ \min_{i \in E:\, e^P_i > 0} \big\lfloor r^S_i / e^P_i \big\rfloor \Big\}.$$
Although this would give us a correct algorithm for the CSSCPE, for many practical instances it would not be able to find a feasible solution. To improve the solution quality for exact demands, we therefore formulate a new variant of the cutting stock set cover problem.
Cutting stock subset cover problem for exact demands (CSSSCPE). Let $E$, $(d_i)_{i \in E}$, $s_{\max}$, $T$ and $q_t$ for $t \in T$ be given as in the GCSP. Furthermore, let $\mathcal{P}$ be a given finite set of feasible patterns (e.g. collected from different heuristic solutions). The CSSSCPE asks for a solution to the underlying GCSPE consisting of patterns derived from patterns in $\mathcal{P}$. By derived, we mean that for each pattern $P_0$ in the solution there exists a pattern $P \in \mathcal{P}$ such that $e^{P_0}_i \le e^P_i$ for all $i \in E$, i.e. $P_0$ can be derived from $P$ by deleting some elements from the pattern.
We can now modify our greedy algorithm in a simple way to efficiently solve the CSSSCPE: whenever we add a pattern in line 16 of Algorithm 2, we remove the unneeded elements from the pattern. Note that we also need to change the calculation of the amount $a$, since we can only increase $a$ until the next element on the pattern would be overproduced. So we can use the same calculation, but only allow values in $A^{S,0}_P$ which do not overproduce any element with open demand (elements without open demand are removed from the pattern anyway and are therefore not relevant for the stack calculation).
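One way to read the pattern modification for the CSSSCPE in code: drop every element without open demand from the pattern and cap the amount so that no remaining element of the derived pattern is overproduced. The Python sketch below illustrates that reading only; it ignores the problem-specific pattern constraints discussed next, and the function and element names are hypothetical.

```python
def derive_pattern(pattern, remaining):
    """Derived pattern P_0 (e^{P_0}_i <= e^P_i) without elements that have no open demand."""
    return {i: e for i, e in pattern.items() if e > 0 and remaining.get(i, 0) > 0}

def max_exact_amount(derived, remaining):
    """Largest amount of the derived pattern that overproduces no element (0 if empty)."""
    if not derived:
        return 0
    return min(remaining[i] // e for i, e in derived.items())

pattern = {"x": 2, "y": 1, "z": 4}
remaining = {"x": 7, "y": 2, "z": 0}
p0 = derive_pattern(pattern, remaining)
print(p0, max_exact_amount(p0, remaining))   # {'x': 2, 'y': 1} 2
```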

Note that the solution approaches for the CSSCP were completely independent of the problem-specific constraints on patterns. For the CSSSCPE, however, removing elements from a pattern must not violate such problem-specific constraints. Therefore, removing some elements from the best pattern $P$ may fail in the greedy approach. In that case we consider the second best pattern $P_2$ according to the greedy criterion; if removing unneeded elements from $P_2$ also fails, we consider the third best pattern, and so on, until we either find a pattern for which the removal works or stop the greedy algorithm with an infeasible partial solution.

PILOT-Approach
The pilot method (PILOT, preferred iterative look ahead technique) is a well-studied method to improve the performance of a construction heuristic. It was proposed by Duin and Voß [DV99]. Its idea is to use an embedded, simpler heuristic to complete a partial solution and to use the objective value of the completed solution to rate the partial solution.
In the classical PILOT approach, in each iteration every possible extension is applied to the partial solution and the result is completed by a given construction heuristic. Then the extension corresponding to the best obtained complete solution is applied. Algorithm 3 presents a generic classical PILOT approach.
Apply the extension $e$ with the best rating $r_e$ to $S$, i.e. $S \leftarrow S \cup \{e\}$
8: end while
To apply the classical PILOT approach to our problems CSSCP and CSSSCPE, we use the greedy heuristic presented in Section 4.2 as construction heuristic. Furthermore, as extensions of a partial solution we consider for each pattern $P \in \mathcal{P}$ only the pair $(P, a)$ with $a = $ FindBestA(S, P). That means we do not consider every pair $(P, a)$ as an extension, but for each pattern only the pair with the largest amount $a$ which has a maximum greedy rating for this pattern.
It is easy to see that the worst-case running time of the classical PILOT is the maximum number of extensions which have to be applied until a solution is complete, times the maximum number of extensions of a partial solution, times the worst-case running time of the construction heuristic. In our case, with $K := n(\log_2(\max_{i \in E} d_i) + 1) + |T|(\log_2(\max_{t \in T} q_t) + 1)$ as in the proof of Theorem 4.2.7, this leads to a worst-case running time of $O(K \cdot |\mathcal{P}| \cdot K |\mathcal{P}| n^2) = O(K^2 |\mathcal{P}|^2 n^2)$. Although, as already mentioned in Section 4.2, the worst-case running time is not reached in many practical applications, the classical PILOT is often too slow for large-scale real-world instances. Therefore, we also consider a parameterized version of PILOT which restricts the possible extensions in each iteration to the best $\beta$ extensions according to the greedy criterion. Note that this restriction only applies to the outer PILOT loop, not to the extension ratings within the greedy construction heuristic.
The modified version of PILOT applied to CSSCP is described in Algorithm 4.
compute for all patterns $P \in \mathcal{P}$ the amount $a_P \leftarrow$ FindBestA(S, P)
4: compute for all patterns $P \in \mathcal{P}$ the rating $r_P \leftarrow r^S(P, a_P)$
5: set $\mathcal{P}_0$ to the $\beta$ best patterns according to $r_P$
6: for $P \in \mathcal{P}_0$ do
7: copy $S$ to $S'$ and add pattern $P$ with amount $a_P$ to $S'$
8: complete $S'$ with Algorithm 2
9: store the objective $o_P$ of the complete solution
10: end for
11: set $P_0$ to the pattern with the best corresponding objective $o_P$
12: add $P_0$ with amount $a_{P_0}$ to $S$
13: end while
The worst-case running time of the restricted version is now $O\Big(\beta \cdot |\mathcal{P}| \cdot n^2 \cdot \big(n(\log_2(\max_{i \in E} d_i) + 1) + |T|(\log_2(\max_{t \in T} q_t) + 1)\big)^2\Big)$, which makes a big difference in practical applications, depending on $\beta$.
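The restricted PILOT idea can be illustrated on a deliberately tiny stand-in problem rather than on CSSCP itself. In the hypothetical Python sketch below, states are piece counts, an extension adds one piece, the embedded greedy picks the piece with the most newly covered length per price, and $\beta$ limits how many top-ranked extensions are rated by a greedy completion. The piece lengths, prices, and target are made up.

```python
from fractions import Fraction

# Hypothetical toy problem: cover a target length with pieces (length, price).
PIECES = {"A": (3, 30), "B": (4, 39)}
TARGET = 6
NAMES = list(PIECES)

def covered(state):
    return sum(n * PIECES[k][0] for k, n in zip(NAMES, state))

def cost(state):
    return sum(n * PIECES[k][1] for k, n in zip(NAMES, state))

def add(state, k):
    j = NAMES.index(k)
    return state[:j] + (state[j] + 1,) + state[j + 1:]

def ranked_extensions(state):
    """Extensions sorted by the greedy criterion: newly covered length per price."""
    rest = TARGET - covered(state)
    return sorted(NAMES, key=lambda k: Fraction(min(PIECES[k][0], rest), PIECES[k][1]),
                  reverse=True)

def greedy_complete(state):
    while covered(state) < TARGET:
        state = add(state, ranked_extensions(state)[0])
    return state

def pilot(beta):
    """Restricted PILOT: rate the beta best extensions by completing them greedily."""
    state = (0,) * len(NAMES)
    while covered(state) < TARGET:
        candidates = ranked_extensions(state)[:beta]
        best = min(candidates, key=lambda k: cost(greedy_complete(add(state, k))))
        state = add(state, best)
    return state

print(cost(greedy_complete((0, 0))), cost(pilot(1)), cost(pilot(2)))   # 69 69 60
```

With $\beta = 1$ the look-ahead only evaluates the greedy choice and reproduces the greedy result; with $\beta = 2$ it discovers that starting with the slightly worse-rated piece A leads to a cheaper solution.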
To solve the CSSSCPE, we can modify the PILOT approach in a similar way as the greedy approach in Section 4.2.1. First of all, we replace Algorithm 2 by the greedy algorithm which solves the CSSSCPE as described in Section 4.2.1. Furthermore, in line 7 we need to remove overproduced elements from the pattern before we add it to $S'$. If that fails, we skip the pattern $P$ and add the next best pattern to $\mathcal{P}_0$, beginning with the $(\beta+1)$-th pattern, if one exists. We also need to remove overproduced elements before we add the pattern $P_0$ to $S$ in line 12. If that fails, we consider as $P_0$ the next best pattern in $\mathcal{P}_0$ according to the objective values $o_P$.

Beam Search Approach
Beam search is a well-known extension of greedy construction heuristics that aims to improve the exploration of the search space [Low76, MCF+77]. Instead of storing only the best solution and following the best extension, it stores the $\beta$ best partial solutions for a parameter $\beta$.

Algorithm 5 Beam Search
INPUT: An instance of a discrete optimization problem
OUTPUT: A solution to the problem
1: Initialize the set of current partial solutions $\mathcal{S} \leftarrow \emptyset$
2: Add the empty partial solution $S = \emptyset$ to $\mathcal{S}$
3: while $\mathcal{S}$ is not empty do
4: for $S \in \mathcal{S}$ do
5: for $e$ possible extension of $S$ do
Compute a rating $r(S')$ for $S' \leftarrow S \cup \{e\}$
Store the $\beta$ best partial solutions $S'$, according to $r(S')$, in $\mathcal{S}$
10: If any $S'$ is complete, store the best objective $o_{best}$ and its solution
11: Remove all partial solutions from $\mathcal{S}$ which cannot be better than $o_{best}$
12: end while
13: return the best found complete solution
In each iteration, the beam search checks all possible extensions of all stored partial solutions, applies them, and keeps the $\beta$ best partial solutions in memory. Algorithm 5 presents the whole beam search approach in a generic manner.
To apply Algorithm 5 to our problem, we have to decide how to rate partial solutions. We propose to do this similarly to the rating of extensions, as the volume of all satisfied demands divided by the costs. For a partial solution $S$ we define the rating $r(S)$ by
$$r(S) := \frac{\sum_{i \in E} v_i \min\big\{d_i,\ \sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i\big\}}{c(S)}.$$
Furthermore, we use the same extensions as for the PILOT approach. That means for each pattern $P$ there is one extension, which adds the pattern with the largest amount that has a maximum rating, $a_P = $ FindBestA(S, P). Finally, to check whether a partial solution can still be better than the best objective found so far, we only have to compare the costs, since the costs can only increase when the partial solution is completed.
Using the beam search as formulated above for our problem leads to solution qualities that are unsatisfactory compared to the time spent in the beam search. The problem is that, since we compute different amounts $a$ for each pattern and since different patterns may have completely different sizes and costs (depending on the raw material used), some partial solutions already have many more item demands satisfied than other partial solutions in the same stage. Since the ratings of the patterns always decrease when more patterns are added (see Theorem 4.2.8), solutions which satisfy more demands have on average lower ratings than solutions which satisfy fewer demands (and also have lower costs). This biases the rating function towards partial solutions which satisfy few demands and have low costs, compared to solutions which satisfy more demands and have higher costs. This is also intuitively clear: the fewer demands are left, the harder it gets to find a good pattern which satisfies exactly those demands, and the harder it gets to form stacks.
But this bias is exactly what we want to avoid, since solutions with high stacks, i.e. solutions which are completed in few iterations because each iteration adds a high stack of patterns, are potentially the best solutions. Therefore, we propose a variant of the beam search approach which distinguishes partial solutions by levels. One can think of this variant as follows: normal beam search is a form of tree search where in each iteration we operate on the best $\beta$ nodes of the same tree level. In a normal tree an edge connects a node on level $n$ with a node on level $n + 1$, but in our case we want to allow edges of different lengths. That means an edge does not always connect a node on level $n$ with a node on level $n + 1$, but in general a node on level $n$ with a node on level $n + k$, and we then say that the length of the edge is $k$. Formally, we assign to each partial solution $S$ the level
$$\ell(S) := \left\lfloor \frac{c(S)}{c_{\text{unit}}} \right\rfloor,$$
where $c(S)$ are the costs of the partial solution $S$ and $c_{\text{unit}} > 0$ is the cost granularity of the tree. That means we choose a cost granularity and then group partial solutions into buckets based on their costs. We propose to use the smallest pattern cost $c_{\text{unit}} := \min_{P \in \mathcal{P}} c^P_P$. In the edge case that there are patterns without costs, we could also consider the stacking costs divided by the maximum stack size of those patterns. If there is a pattern which has neither costs nor stacking costs, we simply use the costs of the first pattern with positive costs; however, this does not occur in practice.
Algorithm 6 describes the whole beam search approach with levels for the CSSCP. Since we have a minimization problem and solutions are grouped by their objective values, it is enough to stop the beam search as soon as we have found the first complete solution and finished going through the bucket of this solution, since the best solution we can find lies in the first bucket that contains a complete solution. If the solutions in a bucket are sorted by cost, we can even stop the algorithm as soon as the first complete solution is encountered.
for $P \in \mathcal{P}$ do
10: compute the best amount $a_P \leftarrow$ FindBestA(S, P) for pattern $P$
11: create a copy $S'$ of $S$ and add $P$ with amount $a_P$ to $S'$
Using this, we get that the worst-case running time of the beam search algorithm is $O(D \cdot \beta \cdot |\mathcal{P}| \cdot n^2)$.
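The level-based variant can be sketched generically. The Python code below is our own simplified reading of Algorithm 6, not a transcription: partial solutions are bucketed by $\lfloor c(S)/c_{\text{unit}} \rfloor$, each level keeps only its $\beta$ best-rated members for expansion, and the search stops at the first level containing a complete solution. To avoid pruning complete solutions, it checks completeness before truncating a bucket. The toy problem (covering a length with priced pieces, with $c_{\text{unit}}$ the cheapest piece price) is hypothetical.

```python
import heapq
from collections import defaultdict

def level_beam_search(root, expand, cost, rating, is_complete, c_unit, beta):
    """Beam search that keeps the beta best partial solutions per cost level.

    The level of a partial solution S is floor(cost(S) / c_unit). The sketch
    assumes every extension raises the cost by at least c_unit, so children
    always end up on a strictly higher level than their parent.
    """
    buckets = defaultdict(list)
    buckets[cost(root) // c_unit].append(root)
    while buckets:
        level = min(buckets)
        bucket = buckets.pop(level)
        complete = [s for s in bucket if is_complete(s)]
        if complete:                    # later levels only contain costlier solutions
            return min(complete, key=cost)
        for s in heapq.nlargest(beta, bucket, key=rating):
            for child in expand(s):
                buckets[cost(child) // c_unit].append(child)
    return None

# Hypothetical toy problem: cover length 7 with pieces (length, price).
PIECES = [(2, 3), (5, 6)]

def cost(s):
    return sum(n * price for n, (_, price) in zip(s, PIECES))

def covered(s):
    return sum(n * length for n, (length, _) in zip(s, PIECES))

def expand(s):
    return [s[:j] + (s[j] + 1,) + s[j + 1:] for j in range(len(s))]

def rating(s):                          # covered length per cost, in the spirit of r(S)
    return covered(s) / cost(s) if cost(s) else 0.0

best = level_beam_search((0, 0), expand, cost, rating, lambda s: covered(s) >= 7,
                         c_unit=min(p for _, p in PIECES), beta=1)
print(best, cost(best))                 # (1, 1) 9
```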
To solve the CSSSCPE, we can modify the beam search algorithm in the same way as the greedy and the PILOT approach: when we add a pattern to a solution copy in line 11, we remove overproduced items from the pattern. If the removal fails, we skip the pattern $P$ and continue with the next pattern in $\mathcal{P}$.

Hybridizing With a Construction Heuristic
One of the main drawbacks of constructing a solution for the CSSCP is that at the end, when the partial solution is almost complete and only few demands are left, it is unlikely that $\mathcal{P}$ contains a pattern which suits the remaining demand. Therefore, the longer the construction algorithm runs, the poorer the quality of the best pattern in $\mathcal{P}$ gets, as was also shown by Theorem 4.2.8. To counteract this drawback, we can hybridize a construction heuristic for the CSSCP with a construction heuristic for the GCSP. Ideally, at the beginning the construction heuristic for the CSSCP finds good patterns which can be stacked frequently, and then, as the demand decreases, at some point the construction heuristic for the GCSP takes over and constructs new patterns which directly suit the remaining demand.
Clearly this approach does not solve the problem CSSCP since the construction heuristic for GCSP will construct patterns which are not in P. But the solution produced by this approach is still a valid solution for the GCSP. Therefore, the approach presented in this section is a construction approach to find a promising solution for the GCSP under the assumption that we already found a lot of good patterns P through other methods.
To formalize the approach, we assume that we are already given the set $\mathcal{P}$ of patterns and additionally a construction method constructPattern which constructs one new pattern according to the remaining demand. That is, the method gets as input an instance of GCSP and a partial solution $S$ of GCSP and returns a new pattern $P$, which is not necessarily in $\mathcal{P}$.
Using this construction method we can formalize a hybrid approach which is presented in Algorithm 7.
Note that the only difference to the greedy algorithm are lines 3 and 4: in each iteration of the hybrid, a new pattern is generated with constructPattern and added to the pattern set $\mathcal{P}$. Since this pattern is constructed directly for the remaining demands, it becomes the best pattern in $\mathcal{P}$ once the other patterns no longer suit the remaining demands well; from that point on, the generated patterns are used instead of the patterns given at the beginning of the algorithm.
The worst-case running time of the hybrid approach depends on the worst-case running time of constructPattern. If the latter is in $O(g(I))$ for some function $g(I)$, we get for the whole hybrid approach a worst-case running time of
$$O\Big(\big(n(\log(\max_{i \in E} d_i) + 1) + |T|(\log(\max_{t \in T} q_t) + 1)\big) \cdot \big(g(I) + |\mathcal{P}| \cdot n^2\big)\Big).$$
In particular, if constructPattern is in $O(|\mathcal{P}| \cdot n^2)$, the worst-case running time of the hybrid approach is the same as that of the greedy approach. For practical purposes, a fast construction method means that the hybrid needs almost the same time as the greedy, but gives us better solution qualities.
To solve the problem with exact demands, i.e. the GCSPE, we can use the same modifications as presented for the greedy in Section 4.2.1. This even works if constructPattern does not consider exact demands, but of course, if it does, the patterns it produces will be much more useful in the hybrid and will therefore increase the solution quality of the hybrid.
if $r^S(P, a) > r_{best}$ then
10: $r_{best} \leftarrow r^S(P, a)$
11: $(P_{best}, a_{best}) \leftarrow (P, a)$

Solving the K-staged Two-Dimensional Cutting Stock Problem with Variable Sheet Size
In this chapter we will present an approach for solving the K-staged two-dimensional cutting stock problem with variable sheet size, which was developed by Dusberger and Raidl [DR14, DR15, DR17]. We will use this approach to generate instances of CSSCP and CSSSCPE for which we then test our approaches. Furthermore, we present how to incorporate our hybrid construction heuristic, Algorithm 7, as a new neighborhood into the solving procedure of Dusberger and Raidl and test how this improves the algorithm.
As a reminder we repeat in the following the problem formulation of the K2DCSPV as presented in Chapter 2.
K-staged two-dimensional cutting stock problem with variable sheet size (K2DCSPV). We are given a set of sheet types $T$ with widths $W_t \in \mathbb{R}_+$, heights $H_t \in \mathbb{R}_+$, available quantities $q_t \in \mathbb{N} \cup \{\infty\}$, and costs $c_t \in \mathbb{R}_+$ for $t \in T$. Furthermore, let $E$ be a set of different element types. Each element type $i \in E$ has a width $w_i \in \mathbb{R}_+$, a height $h_i \in \mathbb{R}_+$, and a demand $d_i \in \mathbb{N} \setminus \{0\}$. A solution $S$ to the problem is a set of patterns $\mathcal{P}_S$ together with an amount $a^S_P$ for each pattern $P \in \mathcal{P}_S$. Furthermore, each pattern $P \in \mathcal{P}_S$ is associated with a sheet type $t_P$. Each pattern describes how to cut output elements out of the associated sheet type using only guillotine cuts. We associate with each pattern $P \in \mathcal{P}_S$ an element vector $(e^P_i)_{i \in E} \in \mathbb{N}^{|E|}$ which describes how often the $i$-th element occurs in pattern $P$. A solution is feasible if all element demands are satisfied, i.e.
$$\sum_{P \in \mathcal{P}_S} a^S_P\, e^P_i \ge d_i \qquad \forall i \in E,$$
and the available sheet quantities are not exceeded, i.e.
$$\sum_{P \in \mathcal{P}_S:\, t_P = t} a^S_P \le q_t \qquad \forall t \in T.$$
The problem is to find a feasible solution which minimizes the costs
$$c(S) = \sum_{P \in \mathcal{P}_S} a^S_P\, c_{t_P}.$$
The approach by Dusberger and Raidl is a variable neighborhood search (VNS) metaheuristic combined with very large neighborhood search (VLNS) techniques. Therefore, before we present the approach, we briefly introduce VNS and VLNS. Then we present the approach by Dusberger and Raidl.

Variable Neighborhood Search
Variable neighborhood search (VNS) is a common metaheuristic that has been used to solve many problems in the literature [HM99,HM03]. In contrast to the approaches we used in Chapter 2, it is an improvement heuristic: it needs a feasible solution to start with and then tries to improve this solution iteratively. Its main idea is to use multiple neighborhood structures to escape local optima of a single neighborhood structure. By a neighborhood structure we formally mean a function which maps a solution of the search space to a set of neighbors of the solution, i.e. a subset of the search space. Neighborhoods are often represented by a move operator, which describes how to change a solution to obtain its neighbors.
Example 5.1.1. For the traveling salesman problem (TSP), many neighborhood structures have been proposed in the literature. A simple move for the TSP is to move one city to another position in the tour. This gives a neighborhood structure in which the neighbors of a tour are all tours obtained by relocating a single city. Other moves are the so-called k-exchanges, which remove k edges and add k new edges such that the result is again a tour, different from the original one. For each k, this yields a neighborhood structure called the k-exchange neighborhood.
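The two TSP neighborhood structures of Example 5.1.1 can be made concrete with a short Python sketch. It assumes a tour is a list of city indices; the function names are hypothetical. Tours are treated as plain sequences, so a few neighbors may describe the same cycle.

```python
from typing import Iterator, List

Tour = List[int]


def relocate_neighbors(tour: Tour) -> Iterator[Tour]:
    """Yield all tours obtained by moving one city to another position."""
    n = len(tour)
    for i in range(n):
        rest = tour[:i] + tour[i + 1:]  # tour without the i-th city
        for j in range(n):
            if j != i:  # reinserting at position i would give the original tour
                yield rest[:j] + [tour[i]] + rest[j:]


def two_exchange_neighbors(tour: Tour) -> Iterator[Tour]:
    """Yield all tours obtained by a 2-exchange (remove two edges, reconnect)."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two edges share city tour[0]; nothing would change
            # Reversing tour[i+1..j] replaces the edges (tour[i], tour[i+1]) and
            # (tour[j], tour[j+1]) (indices taken cyclically) by two new edges.
            yield tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]


print(list(two_exchange_neighbors([0, 1, 2, 3])))  # [[0, 2, 1, 3], [0, 1, 3, 2]]
```

The relocation neighborhood has $O(n^2)$ members and the 2-exchange neighborhood as well, so both are small enough to be searched exhaustively by a VND.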
Given neighborhood structures $N_1, \dots, N_k$, we can iterate through them and search the neighborhood of the current solution for a neighbor with a better objective value. If we find such a neighbor, it becomes the current solution and we restart the procedure with neighborhood structure $N_1$. Once we have searched through all neighborhoods without finding a better neighbor, the current solution is a local optimum with respect to all neighborhood structures, and we return it. This procedure is called variable neighborhood descent (VND).
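The VND loop can be sketched in a few lines of Python. The sketch assumes each neighborhood structure is a function returning an iterable of neighbors and uses a best-improvement search within each neighborhood; a first-improvement search would fit the description equally well. The toy objective is hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")


def vnd(start: S,
        neighborhoods: List[Callable[[S], Iterable[S]]],
        cost: Callable[[S], float]) -> S:
    """Variable neighborhood descent with best-improvement search in each N_k."""
    current = start
    k = 0
    while k < len(neighborhoods):
        best = min(neighborhoods[k](current), key=cost, default=None)
        if best is not None and cost(best) < cost(current):
            current = best  # improvement found: restart with N_1
            k = 0
        else:
            k += 1  # current is locally optimal w.r.t. N_k: try the next one
    return current


# Toy example: minimize (x - 7)^2 over the integers with steps of 1 and 3.
def step1(x: int) -> List[int]:
    return [x - 1, x + 1]


def step3(x: int) -> List[int]:
    return [x - 3, x + 3]


print(vnd(0, [step1, step3], lambda x: (x - 7) ** 2))  # 7
```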
A VND procedure can be extended to a VNS procedure by additionally using so-called shaking neighborhood structures $\mathcal{N}_1, \dots, \mathcal{N}_\ell$. These are normally much larger than the neighborhood structures $N_1, \dots, N_k$ used by the VND and are not searched systematically. Instead, they are used to select a random neighbor and jump to it. This is done whenever the VND cannot improve the solution any further, and the VND is then applied again to the new solution. If there are multiple shaking neighborhood structures, the procedure moves on to the next one whenever the shaking move followed by the VND does not improve the current solution, and it restarts with the first shaking neighborhood structure whenever the solution is improved. Structurally, the shaking part is thus itself a VND-like loop wrapped around the VND, with the difference that moves are applied randomly instead of searching the neighborhoods systematically. The whole procedure is also called general variable neighborhood search (GVNS).
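The following Python sketch wraps the shaking loop around a VND, as described above. Shaking functions take the current solution and a random generator and return one random neighbor. The termination criterion (a fixed number of rounds) and the toy objective with a local minimum at $x = 3$ and a global minimum at $x = 20$ are assumptions for illustration.

```python
import random
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")


def vnd(x: S, neighborhoods: List[Callable[[S], Iterable[S]]],
        cost: Callable[[S], float]) -> S:
    k = 0
    while k < len(neighborhoods):
        best = min(neighborhoods[k](x), key=cost, default=None)
        if best is not None and cost(best) < cost(x):
            x, k = best, 0
        else:
            k += 1
    return x


def gvns(start: S,
         vnd_neighborhoods: List[Callable[[S], Iterable[S]]],
         shaking: List[Callable[[S, random.Random], S]],
         cost: Callable[[S], float],
         rounds: int,
         rng: random.Random) -> S:
    current = vnd(start, vnd_neighborhoods, cost)
    for _ in range(rounds):  # assumed termination criterion: fixed number of rounds
        l = 0
        while l < len(shaking):
            # Jump to a random neighbor in the l-th shaking neighborhood, then descend.
            candidate = vnd(shaking[l](current, rng), vnd_neighborhoods, cost)
            if cost(candidate) < cost(current):
                current, l = candidate, 0  # improvement: restart with the first one
            else:
                l += 1
    return current


def step(x: int) -> List[int]:
    return [x - 1, x + 1]


def shake(x: int, rng: random.Random) -> int:
    return x + rng.randint(-15, 15)


def bimodal(x: int) -> int:
    # Local minimum at x = 3 (cost 5), global minimum at x = 20 (cost 0).
    return min((x - 3) ** 2 + 5, (x - 20) ** 2)


print(vnd(0, [step], bimodal))  # 3
print(gvns(0, [step], [shake], bimodal, 200, random.Random(42)))
```

The VND alone gets stuck in the local minimum at 3, while the shaking step eventually jumps into the basin of the global minimum.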
A simplification of GVNS is the so-called reduced variable neighborhood search (RVNS), which only uses shaking neighborhoods and applies no VND after shaking. It selects a random neighbor; if this neighbor is better, it becomes the current solution and the search restarts with the first neighborhood structure, otherwise the next neighborhood structure is used. Algorithm 8 shows in detail how RVNS works. The approach by Dusberger and Raidl, which we present later in this chapter, can be seen as an RVNS which uses very large neighborhoods as neighborhood structures. In this approach, however, moves are not generated completely at random but are produced with the help of construction heuristics.
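A minimal RVNS loop in the spirit of Algorithm 8 might look as follows. It is a sketch, not the listing itself: it assumes a fixed iteration budget as termination criterion and wraps around to the first neighborhood after the last one; the random moves stand in for the ruin-and-recreate moves used by Dusberger and Raidl.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def rvns(start: S,
         neighborhoods: List[Callable[[S, random.Random], S]],
         cost: Callable[[S], float],
         iterations: int,
         rng: random.Random) -> S:
    """Reduced VNS: random moves only, no local search after shaking."""
    current = start
    k = 0
    for _ in range(iterations):  # assumed termination criterion
        candidate = neighborhoods[k](current, rng)  # random neighbor in N_k
        if cost(candidate) < cost(current):
            current, k = candidate, 0  # accept and restart with N_1
        else:
            k = (k + 1) % len(neighborhoods)  # assumption: wrap around after the last N_k
    return current


def small_step(x: int, rng: random.Random) -> int:
    return x + rng.choice([-1, 1])


def big_step(x: int, rng: random.Random) -> int:
    return x + rng.choice([-5, 5])


print(rvns(0, [small_step, big_step], lambda x: abs(x - 20), 2000, random.Random(1)))
```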

Very Large Neighborhood Search
The idea of very large neighborhood search is to use a neighborhood that is too large to enumerate all neighbors, but for which an efficient method exists to find the best neighbor, or at least to find a good neighbor heuristically [AEOP02,PR10].
There are many techniques for searching large neighborhoods efficiently. Exact approaches based on network flows, dynamic programming, or similar techniques are often used. Other kinds of large neighborhoods are variable-depth neighborhoods and ruin-and-recreate based neighborhoods. Since the large neighborhoods used in the approach of Dusberger and Raidl are all ruin-and-recreate based, we describe this kind of large neighborhood in more detail.
A move in a ruin-and-recreate neighborhood consists of applying a ruin method followed by a recreate method. The ruin method randomly removes parts of a solution or unassigns variables, depending on the structure of the solution. Given the resulting partial solution, the recreate method then applies a construction heuristic to repair it. If the removed parts of the solution were not optimal, the construction heuristic may find an improved solution, in which case the search continues from the improved solution. If the solution could not be improved, another ruin-and-recreate move is applied, until a move improves the solution or a termination criterion is satisfied.
Example 5.2.1. Continuing Example 5.1.1 for the TSP, we can define a ruin method that removes a random sub-path from the tour: the length of the sub-path is selected randomly from a predefined range, and then the first removed city is selected randomly.
To repair the partial tour, we can apply, for example, the best fit insertion heuristic, which inserts each removed city at the position in the partial tour where it fits best.
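Example 5.2.1 can be turned into a small ruin-and-recreate loop. The following Python sketch assumes Euclidean distances, removes a cyclic sub-path of random length, reinserts the removed cities one by one at the cheapest position (best fit insertion), and accepts only improving moves. All names and the sub-path length range are hypothetical choices; `math.dist` requires Python 3.8 or later.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def ruin_subpath(tour: List[int], min_len: int, max_len: int,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Remove a random cyclic sub-path; return (partial tour, removed cities)."""
    n = len(tour)
    length = min(rng.randint(min_len, max_len), n - 1)  # keep at least one city
    start = rng.randrange(n)  # position of the first removed city
    positions = {(start + j) % n for j in range(length)}
    removed = [tour[(start + j) % n] for j in range(length)]
    partial = [c for p, c in enumerate(tour) if p not in positions]
    return partial, removed


def best_insertion(partial: List[int], removed: List[int],
                   pts: Sequence[Point]) -> List[int]:
    """Insert each removed city where it increases the tour length the least."""
    tour = list(partial)
    for city in removed:
        best_pos, best_delta = 0, math.inf
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            delta = (math.dist(pts[a], pts[city]) + math.dist(pts[city], pts[b])
                     - math.dist(pts[a], pts[b]))
            if delta < best_delta:
                best_pos, best_delta = i + 1, delta
        tour.insert(best_pos, city)
    return tour


def ruin_and_recreate(tour: List[int], pts: Sequence[Point], iterations: int,
                      rng: random.Random) -> List[int]:
    best = list(tour)
    for _ in range(iterations):
        partial, removed = ruin_subpath(best, 1, max(1, len(best) // 3), rng)
        candidate = best_insertion(partial, removed, pts)
        if tour_length(candidate, pts) < tour_length(best, pts):
            best = candidate  # accept only improving moves
    return best


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tour_length(ruin_and_recreate([0, 2, 1, 3], square, 10, random.Random(0)), square))  # 4.0
```

Starting from the self-crossing tour on the unit square, a single ruin-and-recreate move already yields the optimal tour of length 4.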
If we consider not just one but many ruin-and-recreate based large neighborhoods, we can combine them in one algorithm using a variable neighborhood search technique. Given ruin-and-recreate based large neighborhoods $N_1, \dots, N_k$, we can use them in an RVNS framework as described in Algorithm 8 in Section 5.1. Note that ruin-and-recreate based large neighborhoods always generate random moves, which fits well into the RVNS framework.

Solution Representation
Before describing the algorithmic details for solving the K2DCSPV, we specify how a solution is represented in the approach by Dusberger and Raidl [DR14]. One important restriction, which changes the nature of the problem considerably, is that only guillotine cuts are allowed. Every guillotine cut divides a pattern into two parts. This fact can be used to represent a pattern naturally as a binary tree, where each inner node represents one cut. Such a discrete representation of a pattern greatly simplifies working with solutions. Furthermore, cuts which only trim off waste from a single element do not need to be part of the cutting tree. With this simplification, each leaf of the cutting tree can be associated with an element. Moreover, we do not need to store any cut positions: since every leaf node represents an element, the positions of the cuts can be calculated recursively, starting from the leaf nodes.

We can transform the binary tree into a general tree by storing parallel cuts on the same level. As a consequence, each level of the tree contains either only vertical cuts or only horizontal cuts.
Formally, a solution of the K2DCSPV is represented by a single tree, where the children of the root node are the root nodes of the cutting trees of the respective patterns. There are three types of inner nodes: the root node of the whole tree, vertical compounds, and horizontal compounds. Leaf nodes are always associated with an element type. All nodes except the root node, whether inner nodes or leaf nodes, store an amount $a \in \mathbb{N} \setminus \{0\}$. This greatly helps to scale the solution representation and the algorithms working on it, since duplicate patterns or subpatterns are stored only once with a higher amount. Therefore, an algorithm can, for example, improve a subpattern, and the change is automatically applied to all copies of this subpattern. See Figure 5.1 for an example of a cutting tree for a solution which consists of only one pattern.

As already mentioned, we do not need to specify where the cuts are placed, since the cutting tree already induces a minimal position for each cut. To calculate these positions, we calculate and store the width $w$ and the height $h$ of every node except the root node. Whenever the tree is modified, these values need to be updated.
The width and height of a leaf node are simply the width $w_i$ and height $h_i$ of the associated element type $i$. For a horizontal compound $C_h$ with children $C_1, \dots, C_\ell$, the width is

$$w(C_h) = \sum_{j=1}^{\ell} a(C_j)\, w(C_j),$$

where $w(C_j)$ is the width and $a(C_j)$ the amount of child $C_j$. The height of $C_h$ is simply the maximum of the heights of its children $C_1, \dots, C_\ell$.
For a vertical compound $C_v$ with children $C_1, \dots, C_\ell$, the calculations are analogous: the height is

$$h(C_v) = \sum_{j=1}^{\ell} a(C_j)\, h(C_j),$$

and the width is the maximum of the widths of the children. Based on the widths or heights of a compound's children, one can easily calculate where to place the actual cuts.
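A minimal Python sketch of the cutting tree with amounts and the bottom-up size computation described above. The node layout (`kind`, `amount`, `width`, `height`, `children`) is a hypothetical simplification, not the data structure of Dusberger and Raidl.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Node of a cutting tree (hypothetical layout)."""
    kind: str  # "leaf", "horizontal" or "vertical"
    amount: int = 1  # a(C): how often this subpattern is repeated
    width: float = 0.0
    height: float = 0.0
    children: List["Node"] = field(default_factory=list)


def leaf(w: float, h: float, amount: int = 1) -> Node:
    return Node("leaf", amount, w, h)


def update_size(node: Node) -> None:
    """Recompute width and height bottom-up after a modification."""
    if node.kind == "leaf":
        return  # a leaf keeps the size of its element type
    for child in node.children:
        update_size(child)
    if node.kind == "horizontal":  # children side by side
        node.width = sum(c.amount * c.width for c in node.children)
        node.height = max(c.height for c in node.children)
    else:  # vertical compound: children stacked on top of each other
        node.height = sum(c.amount * c.height for c in node.children)
        node.width = max(c.width for c in node.children)


row = Node("horizontal", amount=2, children=[leaf(2, 3, amount=2), leaf(1, 4)])
sheet = Node("vertical", children=[row, leaf(5, 1)])
update_size(sheet)
print(row.width, row.height, sheet.width, sheet.height)  # 5 4 5 9
```

Because the row has amount 2, it is stored once but contributes twice its height to the enclosing vertical compound.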

Waste Rectangles
In addition to the width and height, we can define so-called waste rectangles for all compounds of a cutting tree. A waste rectangle of a compound describes the largest rectangle which could be added as a child to the compound such that the sheet of the compound (the level 1 ancestor node of the compound node) still fits into the sheet type used for this sheet.
Formally, we first define a maximum width $w^{\max}_P$, a maximum height $h^{\max}_P$, a slack width $\bar{w}_P$, and a slack height $\bar{h}_P$ for each pattern $P$ (i.e. for every node except the root node) in the cutting tree. Note that each pattern $P$ with width $w_P$ and height $h_P$ is part of a sheet which is associated with a sheet type $t \in T$ with height $H_t$ and width $W_t$. The maximum width and height of a pattern describe the maximum size of the pattern such that the containing sheet pattern still fits into its sheet type of size $(H_t, W_t)$. The slack width and slack height of a pattern $P$ are simply the differences between the maximum and the actual width and height, i.e.

$$\bar{w}_P = w^{\max}_P - w_P, \qquad \bar{h}_P = h^{\max}_P - h_P.$$

We define the maximum height $h^{\max}_P$ and the maximum width $w^{\max}_P$ recursively, depending on the type of the containing parent compound $C$ and its slack height $\bar{h}_C$ and slack width $\bar{w}_C$. If $P$ is a level 1 node, i.e. a sheet pattern with sheet type $t$, then $w^{\max}_P = W_t$ and $h^{\max}_P = H_t$.

Objective value
As stated in the problem definition of the K2DCSPV, the main objective is to minimize the costs of a solution. To guide the search and to break ties between solutions with the same costs, we introduce an additional secondary objective. The idea is that if two solutions have the same costs and possibly use exactly the same sheet types, we want to favor the solution which has an almost empty pattern, since we hope to be able to remove such a pattern in a later improvement step and thereby reduce the costs. We cannot simply sum up the wastes: if both solutions contain all elements and use the same sheet types, the sum of wastes is the same. By squaring the waste ratios of the patterns, we favor solutions in which the waste is not distributed across multiple sheets but concentrated on few sheets or even a single sheet. We therefore introduce as secondary objective the sum of squared waste ratios

$$c_2(S) = \frac{1}{c(S)} \sum_{P \in \mathcal{P}_S} a^S_P \, c_{t_P} \, w(P)^2,$$

where $w(P)$ denotes the waste ratio of pattern $P$. For a pattern $P$, the waste ratio is defined by

$$w(P) = \frac{H_t W_t - \sum_{i \in E} e^P_i \, w_i h_i}{H_t W_t}, \tag{5.3}$$

where $H_t$ and $W_t$ are the dimensions of the used sheet type $t$ and $e^P_i$ is the element containment vector of pattern $P$. Note that we weight the squared waste ratios by the costs of the associated sheets and divide the whole sum by the cost of the solution, so that we obtain a value between 0 and 1. We want to maximize $c_2(S)$ or, equivalently, to minimize $1 - c_2(S)$, which also lies between 0 and 1. We can use this fact to scale the secondary objective such that it is always subordinate to the main objective, the costs of the solution. As scaling factor, we use the minimum cost of all sheet types. Therefore, we always prioritize a solution with one sheet less, regardless of the waste ratios.
All in all, we obtain the total objective

$$f(S) = c(S) + \min_{t \in T} c_t \cdot \bigl(1 - c_2(S)\bigr).$$
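The effect of the secondary objective can be checked numerically with a small Python sketch. It assumes that the waste ratio is the unused fraction of the sheet area and that squared waste ratios are weighted by the sheet costs times the pattern amounts, as described above; the dataclasses and the two toy solutions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SheetType:
    width: float
    height: float
    cost: float


@dataclass
class Pattern:
    sheet: SheetType
    amount: int  # a_P
    elements: Dict[int, int]  # element type i -> e^P_i


Sizes = Dict[int, Tuple[float, float]]  # element type -> (w_i, h_i)


def waste_ratio(p: Pattern, sizes: Sizes) -> float:
    sheet_area = p.sheet.width * p.sheet.height
    used = sum(cnt * sizes[i][0] * sizes[i][1] for i, cnt in p.elements.items())
    return (sheet_area - used) / sheet_area


def cost(patterns: List[Pattern]) -> float:
    return sum(p.amount * p.sheet.cost for p in patterns)


def c2(patterns: List[Pattern], sizes: Sizes) -> float:
    """Cost-weighted sum of squared waste ratios, normalized to [0, 1]."""
    weighted = sum(p.amount * p.sheet.cost * waste_ratio(p, sizes) ** 2 for p in patterns)
    return weighted / cost(patterns)


def objective(patterns: List[Pattern], sizes: Sizes, types: List[SheetType]) -> float:
    scale = min(t.cost for t in types)  # keeps the secondary term below one sheet
    return cost(patterns) + scale * (1.0 - c2(patterns, sizes))


sheet = SheetType(10, 10, 1.0)
sizes = {0: (5, 10), 1: (2.5, 10)}
concentrated = [Pattern(sheet, 1, {0: 2}), Pattern(sheet, 1, {1: 2})]  # waste 0 and 0.5
spread = [Pattern(sheet, 2, {0: 1, 1: 1})]  # waste 0.25 on both sheets
print(objective(concentrated, sizes, [sheet]), objective(spread, sizes, [sheet]))
# 2.875 2.9375
```

Both solutions use two sheets and the same elements, but the one that concentrates the waste on a single sheet receives the smaller total objective.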

Ruin Methods
In this section we will present different ruin methods used in the approach of Dusberger and Raidl [DR14]. They are then combined with different recreate methods, which we will present in the next section, and embedded in a variable neighborhood framework.
Each ruin method removes some nodes from the cutting tree; how these nodes are selected depends on the ruin method. All ruin methods have a parameter for the number of decrements $\delta$, or alternatively a percentage $\pi$, which specifies $\delta$ indirectly by multiplying $\pi$ with the total number of nodes that could be removed, taking the amounts of the nodes into account.

Ruin Random Subtree
This simple ruin method has two variants: it targets either sheet patterns or leaf nodes for removal. We call the first variant ruin random sheet and the second ruin random element. In both variants, the nodes to remove are selected uniformly at random from the pool of target nodes. A node can be selected multiple times, but not more often than its amount $a$. Each time a node is selected, its amount is reduced by one, and if it reaches zero, the node is removed from the pattern tree.
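The following Python sketch illustrates ruin random sheet and ruin random element on a flattened view in which each target node is mapped to its amount. Two details are assumptions: a node is drawn uniformly among the nodes that are still present (not among their individual copies), and $\delta$ is obtained from $\pi$ by rounding to the nearest integer, with at least one decrement.

```python
import random
from typing import Dict, Hashable


def decrements(pi: float, amounts: Dict[Hashable, int]) -> int:
    """delta from the percentage pi (rounding and minimum of 1 are assumptions)."""
    return max(1, round(pi * sum(amounts.values())))


def ruin_random(amounts: Dict[Hashable, int], delta: int,
                rng: random.Random) -> Dict[Hashable, int]:
    """Decrement delta uniformly chosen target nodes (sheets or leaves).

    amounts maps each target node to its amount a; nodes reaching 0 are removed.
    """
    remaining = dict(amounts)
    for _ in range(delta):
        if not remaining:
            break  # nothing left to remove
        node = rng.choice(list(remaining))  # uniform over the current nodes
        remaining[node] -= 1
        if remaining[node] == 0:
            del remaining[node]
    return remaining


amounts = {"sheet1": 2, "sheet2": 1, "sheet3": 3}
print(decrements(0.33, amounts))  # 2
print(sum(ruin_random(amounts, 2, random.Random(0)).values()))  # 4
```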

Ruin by Maximum Waste Ratio
This ruin method is applied to sheet patterns. As defined in Section 5.4, we can associate with each sheet pattern $P$ a waste ratio $w(P)$, see (5.3). The waste ratio tells us how densely the sheet is filled with elements: the smaller the waste ratio, the more densely the sheet is filled. Ideally, we would like to remove patterns with high waste ratios since they are not filled densely, but we want to do so in a randomized way.
To do that we order all sheet patterns, i.e. level 1 nodes, by non-increasing waste ratio.
To select one of the sheet patterns for removal, we apply Algorithm 9. It uses a parameter $\text{ruin}_p$, which determines the probability that the next pattern is selected: the algorithm iterates over the sheet patterns in the order above and returns pattern $P$ with probability $1 - (1 - \text{ruin}_p)^{a_P}$; if no pattern has been returned after the loop, it returns the pattern with the highest waste ratio. After selecting a sheet pattern, we reduce its amount by one and remove it from the tree if the amount becomes zero. This is repeated until $\delta$ sheet patterns have been removed.
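A Python sketch of the randomized selection of Algorithm 9 and the surrounding removal loop. It works on a flattened map from hypothetical pattern ids to (waste ratio, amount) instead of the cutting tree. With $\text{ruin}_p = 1$ the selection becomes deterministic, which the usage example exploits.

```python
import random
from typing import Dict, Tuple

# pattern id -> (waste ratio w(P), amount a_P)
Patterns = Dict[str, Tuple[float, int]]


def select_max_waste(patterns: Patterns, ruin_p: float, rng: random.Random) -> str:
    """Randomized selection of a sheet pattern with a high waste ratio."""
    ordered = sorted(patterns, key=lambda pid: patterns[pid][0], reverse=True)
    for pid in ordered:  # non-increasing waste ratio
        amount = patterns[pid][1]
        # At least one of the a_P copies is picked with probability 1 - (1 - ruin_p)^a_P.
        if rng.random() < 1.0 - (1.0 - ruin_p) ** amount:
            return pid
    return ordered[0]  # fallback: pattern with the highest waste ratio


def ruin_max_waste(patterns: Patterns, delta: int, ruin_p: float,
                   rng: random.Random) -> Patterns:
    remaining = dict(patterns)
    for _ in range(delta):
        if not remaining:
            break
        pid = select_max_waste(remaining, ruin_p, rng)
        waste, amount = remaining[pid]
        if amount == 1:
            del remaining[pid]  # amount reaches zero: remove the sheet pattern
        else:
            remaining[pid] = (waste, amount - 1)
    return remaining


sheets = {"A": (0.60, 1), "B": (0.10, 3), "C": (0.35, 2)}
print(ruin_max_waste(sheets, 3, 1.0, random.Random(0)))  # {'B': (0.1, 3)}
```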

Ruin and Merge
The idea of ruin and merge is to increase the amounts of good patterns, i.e. patterns with low waste ratios. First, a sheet pattern with a low waste ratio is selected in a randomized way: we shuffle all sheet patterns, select the first one as the current pattern, and iterate through the others in the shuffled order. Whenever we find a sheet pattern with a smaller waste ratio than the current one, we make it the current pattern with a probability of 75%. The sheet pattern $P_0$ resulting from this procedure is then considered fixed.
In the next step, assuming that $S$ is the set of all sheet patterns of the current solution, we compute the total demand $d^{tot}_i$ covered by all other sheet patterns:

$$d^{tot}_i = \sum_{P \in S \setminus \{P_0\}} a_P \, e^P_i \quad \forall i \in E.$$

Then the maximum value $x$ by which the amount of $P_0$ can be increased is calculated by

$$x = \min_{i \in E:\, e^{P_0}_i > 0} \left\lfloor \frac{d^{tot}_i}{e^{P_0}_i} \right\rfloor.$$

We increase $a_{P_0}$ by $x$, i.e. $a_{P_0} \leftarrow a_{P_0} + x$, and remove enough other sheet patterns from $S$ so that no elements are overproduced.
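The two steps of ruin and merge can be sketched in Python as follows. The sketch assumes that $d^{tot}_i$ counts how many copies of element $i$ the other sheet patterns produce and that $x$ is the largest increase of $a_{P_0}$ covered by this production for every element contained in $P_0$; pattern ids and the toy data are hypothetical.

```python
import random
from typing import Dict

Elements = Dict[int, int]  # element type i -> e^P_i


def pick_low_waste(waste: Dict[str, float], rng: random.Random,
                   p_switch: float = 0.75) -> str:
    """Randomized choice of a sheet pattern with a low waste ratio (P_0)."""
    order = list(waste)
    rng.shuffle(order)
    current = order[0]
    for pid in order[1:]:
        if waste[pid] < waste[current] and rng.random() < p_switch:
            current = pid
    return current


def max_increase(p0: str, amounts: Dict[str, int],
                 elements: Dict[str, Elements]) -> int:
    """Largest x such that x extra copies of P_0 are covered by the other patterns."""
    d_tot: Dict[int, int] = {}
    for pid, a in amounts.items():
        if pid != p0:
            for i, cnt in elements[pid].items():
                d_tot[i] = d_tot.get(i, 0) + a * cnt
    return min((d_tot.get(i, 0) // cnt for i, cnt in elements[p0].items() if cnt > 0),
               default=0)


elements = {"P0": {0: 2}, "P1": {0: 1, 1: 1}, "P2": {0: 3}}
amounts = {"P0": 1, "P1": 2, "P2": 1}
print(max_increase("P0", amounts, elements))  # 2
```

Here the other patterns produce five copies of element 0, so $P_0$, which contains two of them, can be repeated twice more.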

Construction Methods
In this section we give an overview of the construction methods by Dusberger and Raidl. A detailed description is beyond the scope of this work; the interested reader is referred to [DR14,DR15,DR17]. Construction methods are used both for constructing a starting solution and for repairing partial solutions. We present different fill methods, each of which takes a partial solution $S$ and tries to add elements to its patterns, i.e. tries to fill the partial solution. Fill methods never add new sheets; they only fill the patterns already existing in $S$.
We then use such a fill method within a beam search framework, where one new empty sheet is added on each level of the search tree. After adding the sheet, the fill method is applied to fill all patterns. We do this for every possible sheet type $t$, so the sheet type to use next is not chosen greedily but by the beam search. See Algorithm 5 for a generic overview of beam search. Denoting the used fill method by fill, the whole construction algorithm is given by Algorithm 10.
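The beam search construction can be sketched generically in Python. The sketch is not Algorithm 10 itself: it assumes that partial solutions are ranked by their current cost when the beam is truncated, that costs are non-negative (so a partial solution whose cost already reaches the best complete one can be pruned), and it adds a depth limit as a safeguard. The one-dimensional toy instance with a first-fit fill method is hypothetical.

```python
import heapq
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # partial solution


def beam_construct(empty: S, sheet_types: Sequence[int],
                   add_sheet: Callable[[S, int], Optional[S]],
                   fill: Callable[[S], S],
                   is_complete: Callable[[S], bool],
                   cost: Callable[[S], float],
                   beam_width: int, max_depth: int) -> Optional[S]:
    beam: List[S] = [empty]
    best: Optional[S] = None
    for _ in range(max_depth):  # one new sheet per level
        children: List[S] = []
        for s in beam:
            for t in sheet_types:
                child = add_sheet(s, t)  # None if the quantity q_t is exhausted
                if child is None:
                    continue
                child = fill(child)
                if is_complete(child):
                    if best is None or cost(child) < cost(best):
                        best = child
                else:
                    children.append(child)
        if best is not None:  # prune partial solutions that cannot beat the best
            children = [c for c in children if cost(c) < cost(best)]
        if not children:
            break
        beam = heapq.nsmallest(beam_width, children, key=cost)
    return best


# Toy 1D instance: sheet type -> (capacity, cost); items in non-increasing order.
CAPS = {0: (10, 3.0), 1: (6, 2.0)}
Partial = Tuple[Tuple[Tuple[int, int], ...], Tuple[int, ...]]  # (sheets, remaining)


def add_sheet(s: Partial, t: int) -> Partial:
    return (s[0] + ((t, 0),), s[1])  # unlimited quantities in this toy


def first_fit(s: Partial) -> Partial:
    """Fill only the existing sheets; never opens a new one."""
    sheets, left = list(s[0]), []
    for item in s[1]:
        for k, (t, load) in enumerate(sheets):
            if load + item <= CAPS[t][0]:
                sheets[k] = (t, load + item)
                break
        else:
            left.append(item)
    return (tuple(sheets), tuple(left))


def toy_cost(s: Partial) -> float:
    return sum(CAPS[t][1] for t, _ in s[0])


best = beam_construct(((), (6, 5, 4, 3)), [0, 1], add_sheet, first_fit,
                      lambda s: not s[1], toy_cost, beam_width=2, max_depth=10)
print(toy_cost(best), best[0])  # 6.0 ((0, 10), (0, 8))
```

With a beam width of 1 the same code corresponds to the greedy choice of the next sheet type used inside the neighborhoods.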
In the following we will present the different fill methods.

Fill Methods
All fill methods except for the dynamic programming approach have the same structure.
In each iteration, they insert a grid of elements of one type into a waste rectangle. The element type used must be in the set

$$E^S_R = \{\, i \in E : d^{r,S}_i > 0 \,\},$$

where $d^{r,S}_i$ is the residual demand of element type $i$ in the partial solution $S$.
How they choose the element type in $E^S_R$, the waste rectangle, and the grid size depends on the method. They stop when no further element can be inserted.

In Algorithm 10, for every partial solution $S$ in the current beam $\mathcal{S}$ and every sheet type $t \in T$ with $\sum_{P \in \mathcal{P}_S :\, t_P = t} a^S_P < q_t$, the partial solution $S$ is cloned into $S'$ and an empty pattern of sheet type $t$ is added to $S'$. If any $S'$ is complete, the best objective $o_{best}$ and the corresponding solution are stored, and all partial solutions which cannot be better than $o_{best}$ are removed from $\mathcal{S}$. Finally, the best complete solution found is returned.

Critical Fit Insertion Heuristic
In each iteration, the critical element type is calculated. The critical element type $i^S_c \in E^S_R$ is defined by the following properties:
• the number of sheets in $S$ which contain a waste rectangle into which element type $i^S_c$ fits is minimal;
• if multiple element types satisfy the above condition, the one with the smallest index is used.
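The definition of the critical element type translates directly into a Python sketch. It assumes that the waste rectangles of each sheet are available as a list of (width, height) pairs and ignores rotation of elements; the data layout is hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float]  # (width, height)


def fits(size: Rect, rect: Rect) -> bool:
    return size[0] <= rect[0] and size[1] <= rect[1]  # rotation is not considered here


def critical_element(residual: Dict[int, int], sizes: Dict[int, Rect],
                     waste_rects: List[List[Rect]]) -> int:
    """waste_rects[s] lists the waste rectangles of sheet s in the partial solution."""
    def sheets_with_room(i: int) -> int:
        return sum(1 for rects in waste_rects if any(fits(sizes[i], r) for r in rects))

    candidates = [i for i, d in residual.items() if d > 0]  # E_R^S
    # Fewest sheets with a fitting waste rectangle; ties broken by smallest index.
    return min(candidates, key=lambda i: (sheets_with_room(i), i))


sizes = {0: (4, 4), 1: (2, 2), 2: (1, 7)}
waste_rects = [[(5, 5)], [(2, 8)]]
print(critical_element({0: 1, 1: 3, 2: 2}, sizes, waste_rects))  # 0
print(critical_element({0: 0, 1: 3, 2: 2}, sizes, waste_rects))  # 2
```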

We now fix the critical element type $i := i^S_c$ and search for a waste rectangle $WR$ of a compound $C$ and a grid size $a^{vert}_i \times a^{hor}_i$ with $a^{vert}_i \cdot a^{hor}_i \le d^{r,S}_i$ such that inserting the grid into $WR$ yields maximal fitness. The fitness of an insertion with the given grid size into the compound $C$, resulting in a compound $C'$, is defined in terms of the slack height $\bar{h}_{C'}$ and the slack width $\bar{w}_{C'}$ of the new compound $C'$ after inserting the grid, together with a term $\eta(C, a^{vert}_i, a^{hor}_i)$. The value of $\eta$ tells us how well the grid fits to the other subpatterns of $C$: for a horizontal compound we want all subpatterns to have almost the same height, and for a vertical compound almost the same width. Once we have found an optimal waste rectangle and grid size according to the fitness criterion, we insert the grid into this waste rectangle and update all data structures (such as $E^S_R$ and $i^S_c$).

Fill Heuristic Based on Average-Area Sufficiency
We define the average area $\bar{a}(X)$ of a multiset of elements $X$ as the average area of all elements in $X$, counted with their multiplicities. Furthermore, a pattern, a grid of elements, and $E^S_R$ can be interpreted as multisets of elements, where the element counts of $E^S_R$ are given by the residual demands.
If we want to insert a grid $G$ into a waste rectangle which is part of a sheet pattern $P$, the average-area sufficiency criterion for this insertion is an inequality on the average areas $\bar{a}(\cdot)$ of unions of these multisets, where the unions are taken by interpreting everything as multisets of elements.
To compare two grid insertions, we distinguish the following cases:
• If both insertions satisfy the criterion, the one yielding less waste is better.
• If only one of them satisfies the criterion, that one is better.
• If neither satisfies the criterion, the one which violates it the least is better.
In each iteration, we then use the waste rectangle, element type, and grid size which is best according to this comparison.
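The three comparison rules can be encoded as a sort key, as in the following Python sketch. It assumes that each candidate insertion has already been evaluated into a non-negative violation amount (zero if the criterion holds) and a waste value; how the violation is measured is not specified here, so the `Insertion` record is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insertion:
    label: str
    violation: float  # 0 if the criterion holds, otherwise how strongly it is violated
    waste: float


def rank(ins: Insertion) -> Tuple[int, float]:
    # Satisfying insertions (group 0) beat violating ones (group 1);
    # within a group, less waste or less violation is better.
    return (0, ins.waste) if ins.violation <= 0 else (1, ins.violation)


def best_insertion(candidates: List[Insertion]) -> Insertion:
    return min(candidates, key=rank)


print(best_insertion([Insertion("a", 0, 5.0), Insertion("b", 0, 3.0),
                      Insertion("c", 2.0, 0.0)]).label)  # b
print(best_insertion([Insertion("c", 2.0, 0.0), Insertion("d", 1.0, 9.0)]).label)  # d
```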

Naive Fill Heuristic
This fast heuristic implements a simple first-fit approach. It chooses the element type in $E^S_R$ with the largest area and the first waste rectangle, in post-order, into which this element fits, and computes the maximum grid size for this element type and waste rectangle.
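A Python sketch of one step of the naive fill heuristic. It assumes the waste rectangles are already listed in post-order, that the grid may not contain more copies than the residual demand (as for the critical fit heuristic), and that the next largest element type is tried if the largest one fits nowhere. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float]  # (width, height)


def max_grid(size: Rect, rect: Rect, residual: int) -> Tuple[int, int]:
    """Largest grid (a_vert, a_hor) of one element type inside rect.

    Assumption: a_vert * a_hor may not exceed the residual demand.
    """
    max_hor, max_vert = int(rect[0] // size[0]), int(rect[1] // size[1])
    best = (0, 0)
    for vert in range(1, max_vert + 1):
        hor = min(max_hor, residual // vert)
        if hor >= 1 and vert * hor > best[0] * best[1]:
            best = (vert, hor)
    return best


def naive_fill_step(residual: Dict[int, int], sizes: Dict[int, Rect],
                    rects_postorder: List[Rect]
                    ) -> Optional[Tuple[int, int, Tuple[int, int]]]:
    """Return (element type, waste rectangle index, grid) or None if nothing fits."""
    candidates = [i for i, d in residual.items() if d > 0]
    # Assumption: if the largest element fits nowhere, the next largest is tried.
    for i in sorted(candidates, key=lambda e: sizes[e][0] * sizes[e][1], reverse=True):
        for idx, rect in enumerate(rects_postorder):  # first fit in post-order
            grid = max_grid(sizes[i], rect, residual[i])
            if grid != (0, 0):
                return i, idx, grid
    return None


print(max_grid((3, 2), (10, 5), 100))  # (2, 3)
print(max_grid((3, 2), (10, 5), 4))  # (2, 2)
print(naive_fill_step({0: 1, 1: 5}, {0: (6, 6), 1: (2, 2)}, [(4, 4), (8, 8)]))  # (0, 1, (1, 1))
```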

Dynamic Programming
Dynamic programming is a well-known technique for evaluating a recursion in which certain subproblems occur multiple times in different branches of the recursion tree. The idea is to store the results of common subproblems and to look them up in a table instead of recomputing them. We use dynamic programming to develop an exact algorithm which fills a waste rectangle with elements such that the filled area is maximal. We then apply this dynamic program to the largest waste rectangle of each sheet pattern. The most important case is when a sheet pattern is empty and therefore has only one waste rectangle, of the size of the sheet type.
To be able to formulate a recursion for our problem, we need to relax the demand restriction, i.e. we allow every element to be placed on the sheet an unlimited number of times. If the result overproduces some elements, we can remove those elements from the pattern afterwards and apply another fill method to fill the remaining space. Furthermore, we need to discretize the possible cutting positions; for details see [DR15]. The discretization results in a finite set $P$ of possible cutting points for heights and a finite set $Q$ of possible cutting points for widths.
We use variables $V^v_k(x, y)$ and $V^h_k(x, y)$ for $(x, y) \in Q \times P$ and $k \in \{0, \dots, K\}$. The variable $V^v_k(x, y)$ ($V^h_k(x, y)$) represents the area of an optimal cutting pattern with width at most $x$ and height at most $y$ which uses at most $k$ cutting levels and starts with a horizontal (vertical) cut, i.e. the root compound of the cutting tree is a vertical (horizontal) compound.
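The idea of the recursion is that a pattern with a horizontal compound as root node and cutting level $k$ has either one child or multiple children. If it has one child, this child is a vertical compound with cutting level $k - 1$. Otherwise, we can split off the last child and obtain on the left a horizontal compound with cutting level $k$ and on the right a vertical compound with cutting level $k - 1$. Vertical root compounds are handled analogously, and altogether we obtain for all $k > 0$ and $(x, y) \in Q \times P$ the recursion

$$V^h_k(x, y) = \max\Bigl\{ V^v_{k-1}(x, y),\ \max_{x' \in Q,\, 0 < x' < x} \bigl( V^h_k(x', y) + V^v_{k-1}(x - x', y) \bigr) \Bigr\},$$

$$V^v_k(x, y) = \max\Bigl\{ V^h_{k-1}(x, y),\ \max_{y' \in P,\, 0 < y' < y} \bigl( V^v_k(x, y') + V^h_{k-1}(x, y - y') \bigr) \Bigr\},$$

where $x - x'$ and $y - y'$ are rounded down to the nearest point of $Q$ and $P$, respectively. For $k = 0$ and $(x, y) \in Q \times P$, a pattern consists of a single element, so

$$V^v_0(x, y) = V^h_0(x, y) = \max\bigl(\{0\} \cup \{ w_i h_i : i \in E,\ w_i \le x,\ h_i \le y \}\bigr).$$

With this recursion we can apply a dynamic program, which is implemented as a bottom-up approach. The running time of this approach is exponential in general, but for many small sheet patterns it is fast enough in practice and returns good sheet patterns. If the number of discretization points is too large, we turn off all neighborhoods which use this fill method.

Table 5.1: Ruin-and-recreate neighborhoods used in the approach by Dusberger and Raidl.

| Neighborhood | Ruin method | Ruin parameter | Construction method |
|---|---|---|---|
| … | Random Sheet | π ∈ U(0.05, 0.33) | Critical Fit |
| N_4 | Random Element | π ∈ U(0.05, 0.33) | Critical Fit |
| N_5 | Random Sheet | π ∈ U(0.05, 0.33) | Naive Fill |
| N_6 | Random Sheet | π ∈ U(0.05, 0.33) | Dynamic Programming |
| N_7 | Ruin and Merge | − | Dynamic Programming |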
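The dynamic-programming recursion for $V^h_k$ and $V^v_k$ can be sketched bottom-up in Python. The sketch makes several assumptions: integer dimensions, all integers as default discretization points (the actual discretization of [DR15] is not reproduced), rounding of $x - x'$ and $y - y'$ down to the nearest discretization point, no rotation of elements, and demands ignored as described above.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[int, int]  # (width, height)


def guillotine_dp(W: int, H: int, elems: Sequence[Rect], K: int,
                  Q: Optional[Sequence[int]] = None,
                  P: Optional[Sequence[int]] = None) -> int:
    """Maximum filled area of a K-staged guillotine pattern in a W x H rectangle."""
    Qs = sorted(set(Q if Q is not None else range(W + 1)) | {0})
    Ps = sorted(set(P if P is not None else range(H + 1)) | {0})

    def down(points: List[int], v: int) -> int:  # largest point <= v
        return points[bisect_right(points, v) - 1]

    # Level 0: no cuts, the pattern is a single element (waste trimming is free).
    base: Dict[Tuple[int, int], int] = {
        (x, y): max([w * h for w, h in elems if w <= x and h <= y], default=0)
        for x in Qs for y in Ps}
    Vh, Vv = dict(base), dict(base)  # V^h_0 and V^v_0
    for _ in range(K):
        newH: Dict[Tuple[int, int], int] = {}
        newV: Dict[Tuple[int, int], int] = {}
        for x in Qs:  # increasing, so newH[(x', y)] with x' < x is known
            for y in Ps:  # increasing, so newV[(x, y')] with y' < y is known
                # Horizontal root: one vertical child, or split off the last child.
                best = Vv[(x, y)]
                for xp in Qs:
                    if 0 < xp < x:
                        best = max(best, newH[(xp, y)] + Vv[(down(Qs, x - xp), y)])
                newH[(x, y)] = best
                # Vertical root: analogous with heights.
                best = Vh[(x, y)]
                for yp in Ps:
                    if 0 < yp < y:
                        best = max(best, newV[(x, yp)] + Vh[(x, down(Ps, y - yp))])
                newV[(x, y)] = best
        Vh, Vv = newH, newV
    key = (down(Qs, W), down(Ps, H))
    return max(Vh[key], Vv[key])


elems = [(2, 3), (3, 2), (1, 1)]
print([guillotine_dp(5, 3, elems, k) for k in range(3)])  # [6, 13, 15]
```

The example shows the effect of the number of cutting levels: with $K = 2$ the $5 \times 3$ rectangle is filled completely, while fewer levels leave waste. The three nested loops make the running time grow quickly with the number of discretization points.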

Variable Neighborhood Search for Solving K2DCSPV
In the previous two sections we described ruin methods and construction methods. In this section we give an overview of how they are combined into the ruin-and-recreate neighborhoods of our variable neighborhood search. Additionally, we introduce some other neighborhoods which are not based on ruin and recreate but are also used by the algorithm. Finally, we give an overview of the whole algorithm, including the construction of initial solutions. Table 5.1 lists all ruin-and-recreate neighborhoods used in the approach by Dusberger and Raidl. For all neighborhoods which use the parameter $\pi$, the parameter is chosen uniformly at random from the interval $[0.05, 0.33]$. Note that $\pi$ is drawn independently for every ruin call and is never fixed. Furthermore, the construction methods of the neighborhoods always use a beam width of $\beta = 1$, which means that the sheet type to use next is decided greedily.
In addition to the seven ruin-and-recreate neighborhoods, the approach by Dusberger and Raidl uses three restructuring neighborhoods which do not change the objective value of the solution but may improve its representation. We only briefly list these three neighborhoods, as they are not the main component of the algorithm.

• $N_8$: Tries to simplify patterns using three rules: merging subcompounds, reducing compounds with only one child, and rotating a sheet.
• $N_9$: Brings patterns into a normal form, recognizes sibling patterns of the same structure, and unifies them by increasing the amount.
• $N_{10}$: Merges equivalent sheets, i.e. sheets which use the same sheet type and contain exactly the same elements but have different pattern structures.
Having described the ten neighborhoods used in the variable neighborhood search, it remains to explain how the initial solution is constructed. For this we also use the construction methods, but this time with beam widths larger than 1, depending on the fill method. We run four different construction methods, all starting from the empty partial solution, and choose the best result as the starting solution.
• Method $C_1$ uses the critical fit insertion heuristic with beam width $\beta = 10$.
• Method $C_2$ uses the fill heuristic based on average-area sufficiency with beam width $\beta = 10$.
• Method $C_3$ also uses the fill heuristic based on average-area sufficiency with beam width $\beta = 10$, but replaces areas by a so-called value-correction framework. This framework assigns each element a value instead of an area and then iteratively tries to adjust the values such that the solution quality increases. Initially, the values equal the areas of the elements.
• Method $C_4$ uses the dynamic programming fill method with beam width $\beta = 1$.
We now have everything needed to describe the whole algorithm, which is given in Algorithm 11.

Considering Pattern Setup Costs
We are especially interested in a variant of the K2DCSPV with pattern setup costs. The algorithm by Dusberger and Raidl presented in this chapter is mainly designed for the K2DCSPV and does not consider pattern setup costs. Although pattern setup costs can always be incorporated into the solution objective, see Section 5.4, the resulting solution quality is still not as good as without setup costs. This is also one of the reasons why we were interested in solving the CSSCP or CSSSCPE as a post-processing step in the first place: the approaches presented in Chapter 4 specifically consider pattern setup costs and can therefore often improve the solution quality, as we will see in the chapter on computational results.
In the following we restate the complete problem formulation, which was already presented at the end of Section 2.2. We are given a set of sheet types $T$ with widths $W_t \in \mathbb{R}^+$, heights $H_t \in \mathbb{R}^+$, available quantities $q_t \in \mathbb{N} \cup \{\infty\}$, costs $c_t \in \mathbb{R}^+$, and stacking costs $c^S_t$ for $t \in T$. Furthermore, let $s_{\max} \in \mathbb{N} \cup \{\infty\}$ be the maximum stacking size and $E$ a set of different element types. Each element type $i \in E$ has a width $w_i \in \mathbb{R}^+$, a height $h_i \in \mathbb{R}^+$, and a demand $d_i \in \mathbb{N} \setminus \{0\}$. A solution $S$ to the problem is a set of patterns $\mathcal{P}_S$ together with an amount $a^S_P$ for each pattern $P \in \mathcal{P}_S$. Each pattern $P \in \mathcal{P}_S$ is associated with a sheet type $t_P$ and describes how to cut output elements out of this sheet type using only guillotine cuts. With each pattern $P \in \mathcal{P}_S$ we associate an element vector $(e^P_i)_{i \in E} \in \mathbb{N}^{|E|}$, which describes how often the $i$-th element occurs in $P$. A solution is feasible if all element demands are satisfied, i.e.

$$\sum_{P \in \mathcal{P}_S} a^S_P \, e^P_i \ge d_i \quad \forall i \in E,$$

and no available sheet quantity is exceeded, i.e.

$$\sum_{P \in \mathcal{P}_S :\, t_P = t} a^S_P \le q_t \quad \forall t \in T.$$

The problem is to find a feasible solution which minimizes the costs. To solve this problem, we can simply use the approach by Dusberger and Raidl and adapt the objective function accordingly. To improve the approach in this setting, we propose to add a neighborhood $N_{11}$ which uses the hybrid method presented in Section 4.5 to find a neighbor. This special neighborhood does not try to improve the current solution directly, but tries to build a new solution using all patterns found so far as the pattern set $\mathcal{P}$. As construction method, any of the construction methods presented in Section 5.6 of this chapter can be used.

Computational Results
In this chapter we present computational results of applying our techniques to real-world instances. We use 192 instances of the K2DCSPVSC originating from real-world applications in many use cases. The algorithm by Dusberger and Raidl described in Chapter 5 is used to generate instances of CSSCP and CSSSCPE, on which we then compare our approaches. Furthermore, we compare the performance of the algorithm by Dusberger and Raidl with and without our hybrid neighborhood.
The implementations are done in C++, and we use Gurobi 8.1 to solve our ILP approach. All tests were performed on a single core of an Intel Xeon E5-2640 v4 processor with 2.4 GHz, using at most 8 GB of RAM.

Instances
In this section we present the instances used in our tests. As instances for the original problem, K2DCSPVSC, we use 192 real-world instances provided by LodeStar Technology. The instances describe heterogeneous situations arising in different applications. Most instances have only one sheet type, but some have up to 145 different sheet types. Furthermore, the number of different element types ranges from 1 to 176. The total demand, i.e. the sum of all element demands, ranges from 1 to 8511. The instances with a total demand of one are trivial to solve but are still part of our test portfolio.

Generating Instances for CSSCP and CSSSCPE
We used the algorithm by Dusberger and Raidl described in Chapter 5 to generate instances of CSSCP and CSSSCPE. For each of the 576 real-world instances (the 192 instances in each of the three variants "N", "M", and "H"), we generate one instance of CSSCP and one of CSSSCPE. To do so, we apply the algorithm by Dusberger and Raidl to each instance with a time limit of one hour and collect all patterns occurring during its execution. The resulting pattern sets $\mathcal{P}$ contain between 1 and 1,384,811 patterns.
As expected, the variants "M" and "H" generally produce fewer patterns than variant "N", since pattern setup costs favor using few patterns many times over using many different patterns. For each variant, we group the instances into ten groups based on the size of $\mathcal{P}$. Table 6.1 shows the size limits of $\mathcal{P}$ for each group and the number of instances in each group for each variant.
Note that we generated these instances with a single run of the algorithm. Since the algorithm is randomized, multiple runs could result in completely different sets $\mathcal{P}$.

Comparing Solution Approaches for CSSCP and CSSSCPE
In this section we compare the performance of the approaches presented in Chapter 3 on the instance groups described in the previous section.
All test runs have a time limit of one hour, after which the algorithm is stopped, possibly without having found a feasible solution. Since all our algorithms are deterministic, we run each test only once. Since some algorithms are quite time-consuming and the number […] where S is in this case the empty partial solution. In the following we analyze the performance of the algorithms when the pattern set is filtered to |P| ≤ X for different values of X, which means that only the X best patterns according to the rating r(P) remain in the pattern set P.
Table 6.2 shows the results for the CSSCP with relatively small pattern sets, |P| ≤ X := 500. It compares the ILP approach with the greedy and the hybrid approach. Column "G" gives the group index of the instance set, column "V" the variant abbreviation, and column "#" the number of instances in the set. For each of the three algorithms ILP, Greedy, and Hybrid, column "feas." gives the number of instances in the set for which the algorithm returned a feasible solution. Column "obj." gives the geometric mean of the relative objectives, where the relative objective of a solution is the objective of the solution computed by the set cover algorithm divided by the objective of the solution that the original algorithm of Dusberger and Raidl computed when the set cover instance was created. Hence, a value greater than 1.0 indicates a worse solution and a value smaller than 1.0 a better one. We use the geometric mean because all values are ratios, so that it represents the average relative gain or loss compared to the original solutions. Finally, column "t(s)" gives the median CPU running time over the instance set in seconds.
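The "obj." column can be computed as in the following minimal Python sketch. It assumes the per-instance objectives are available as two lists, with `None` marking an instance for which the set cover algorithm found no feasible solution; the function names are hypothetical. The example also shows why the geometric mean suits ratios: a halving (0.5) and a doubling (2.0) cancel out to 1.0, whereas their arithmetic mean would be 1.25.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed in log space to avoid overflow."""
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

def obj_column(set_cover_objectives, original_objectives):
    """Return (number of feasible instances, geometric mean of relative objectives).

    Each relative objective is set_cover / original, so values below 1.0 mean the
    set cover solution improved on the original one. Instances without a feasible
    set cover solution (None) are skipped.
    """
    ratios = [s / o for s, o in zip(set_cover_objectives, original_objectives)
              if s is not None]
    return len(ratios), geometric_mean(ratios)

feasible, obj = obj_column([5.0, None, 20.0], [10.0, 7.0, 10.0])
print(feasible, round(obj, 6))  # ratios 0.5 and 2.0 -> prints "2 1.0"
```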

Comparing ILP with the Greedy Approach and the Hybrid Approach
Note that, unlike the other two algorithms, the hybrid does not formally solve the CSSCP. The direct comparison is therefore not entirely fair: the hybrid may add any feasible pattern, whereas the other two algorithms only use patterns from P. Nevertheless, we include the hybrid results in the table, since they show how much the greedy can be improved by hybridizing it with a construction heuristic.
As we can see, the greedy algorithm yields worse results than the ILP, regardless of the instance size. The ILP improves some instances compared to the original result even in variant "N" without setup costs, although only a few. For variants "M" and "H" with setup costs, the ILP approach improves significantly more instances. We also see that for many instances neither the ILP nor the greedy finds a feasible solution. This is caused by the filtering, which removes too many patterns, especially for large instances, and may remove all patterns containing a particular element type.
The hybrid compensates for these feasibility problems by creating new patterns: it finds a feasible solution for all instances except one in group G9, variant "H". Note that the running times of the algorithms themselves are negligible in this case, since the bottleneck for the larger instances is the filtering. This is why all running times look similar, regardless of which algorithm is used.
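The feasibility problem caused by filtering can be detected before any solver runs. The following minimal Python sketch keeps the X best patterns according to a rating and then lists the element types with positive demand that no remaining pattern contains; if this set is non-empty, no algorithm restricted to the filtered P can find a feasible solution. Patterns are assumed to be dicts with hypothetical keys `items` and `rating`, where `rating` is a precomputed stand-in for r(P); the actual rating is not reproduced here.

```python
def filter_top_x(patterns, x):
    """Keep the x patterns with the highest rating (the |P| <= X filter)."""
    return sorted(patterns, key=lambda p: p["rating"], reverse=True)[:x]

def uncovered_element_types(patterns, demand):
    """Element types with positive demand that appear on none of the patterns."""
    covered = {e for p in patterns for e, count in p["items"].items() if count > 0}
    return {e for e, d in demand.items() if d > 0 and e not in covered}

patterns = [
    {"name": "A", "items": {"a": 3}, "rating": 3.0},
    {"name": "B", "items": {"a": 1, "b": 1}, "rating": 1.0},
    {"name": "C", "items": {"a": 2}, "rating": 2.0},
]
kept = filter_top_x(patterns, 2)
print([p["name"] for p in kept])                        # ['A', 'C']
print(uncovered_element_types(kept, {"a": 4, "b": 1}))  # {'b'}
```

Here the only pattern containing element type b has the lowest rating, so the filter with X = 2 makes the instance infeasible for any algorithm that only selects patterns from the filtered set.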

Computational Results
If we remove the filter, the ILP struggles to solve the larger instances to optimality within the time limit, whereas the running times of the greedy and the hybrid change little. Table 6.3 shows the results for the ILP, greedy, and hybrid without a filter.
Although the ILP often reaches the time limit on the large instances, it still finds feasible solutions for many more instances than with the filter |P| ≤ X = 500. Furthermore, the geometric means of the ILP solutions, as well as of the greedy and hybrid solutions, are clearly better than with the filter.
The situation is similar if we only allow exact demands. In this case the three algorithms solve different problems: the ILP solves the CSSCPE, the greedy solves the CSSSCPE, and the hybrid solves the original problem K2DCSPVSC. This gives the greedy approach more flexibility than the ILP approach. Nevertheless, the ILP outperforms the greedy in this setting as well. Table 6.4 shows the results of the ILP, greedy, and hybrid for the unfiltered case with exact demands. Note that for variant "N" without setup costs the results are similar to those without exact demands. This is because, without setup costs, every solution that does not meet the demands exactly can easily be transformed into a solution with exact demands by removing overproduced elements, without worsening the objective. With setup costs, in contrast, the modified patterns are new patterns and incur additional setup costs. Therefore, allowing overproduction gives no advantage in variant "N". For variants "M" and "H" the solution quality of all algorithms is worse than without exact demands, since the algorithms are now more restricted in which solutions are allowed. Nevertheless, the ILP finds improvements in many cases, and the hybrid again outperforms the ILP on the large instance sets.
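The transformation argument for variant "N" can be made concrete with a minimal Python sketch. It assumes that a solution is a list of (pattern items, sheet cost, multiplicity) triples and that deleting elements from a feasible pattern keeps it feasible; the function names are hypothetical. Overproduced elements are trimmed one sheet at a time, so the number of sheets, and hence the sheet cost, never increases. A pattern used several times may, however, be split into distinct patterns, which in variants "M" and "H" would incur additional setup costs.

```python
from collections import Counter

def to_exact_demands(solution, demand):
    """Remove overproduced elements from a feasible solution.

    solution: list of (items, sheet_cost, multiplicity); items maps element type -> count.
    Returns a solution producing exactly `demand`; sheets that end up empty are dropped.
    """
    produced = Counter()
    for items, _, mult in solution:
        for e, c in items.items():
            produced[e] += c * mult
    surplus = {e: produced[e] - demand.get(e, 0) for e in produced}
    result = []
    for items, cost, mult in solution:
        # Trim one sheet at a time while this pattern still holds surplus elements.
        while mult > 0 and any(surplus[e] > 0 for e in items):
            reduced = {}
            for e, c in items.items():
                k = min(c, max(surplus[e], 0))
                surplus[e] -= k
                if c - k > 0:
                    reduced[e] = c - k
            if reduced:
                result.append((reduced, cost, 1))  # a new, smaller pattern used once
            mult -= 1
        if mult > 0:
            result.append((items, cost, mult))  # remaining untouched copies
    return result

def sheet_cost(solution):
    return sum(cost * mult for _, cost, mult in solution)

print(to_exact_demands([({"a": 2}, 1.0, 2)], {"a": 3}))
# [({'a': 1}, 1.0, 1), ({'a': 2}, 1.0, 1)]
```

In the example the sheet cost stays 2.0, but the single pattern used twice becomes two distinct patterns, which is exactly what makes the transformation costly once setup costs are present.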

Tuning the β parameter for the PILOT and Beam Search Approaches
Before we compare the ILP, greedy, and hybrid with the PILOT and beam search approaches, we analyze the influence of the parameter β on their performance and determine the best parameter setting. As we already saw for the ILP, greedy, and hybrid, filtering the patterns in a preprocessing step significantly decreases the solution quality. Therefore, for the PILOT and the beam search approach we focus on unfiltered instances.
We start with the case of no exact demands. For each of the variants "N", "M", and "H", Figure 6.1 compares the geometric mean of the objectives, the number of feasible solutions, the number of improved solutions, and the running times for the three values β = 10, 30, 100. As we can see, the geometric means of the objectives are close together, regardless of the value of β. A large β works well on medium-sized instances and smaller values on larger instances. For the small instances the algorithm often finds […]
To verify this observation, we applied a Wilcoxon signed-rank test at a significance level of 5%. All instances without filtering, with exact and non-exact demands and in all variants "N", "M", and "H", are grouped by instance group. On each instance group we then compare the performance of each pair of β-values. First, β = 3 is significantly worse than β = 10 on instance groups G2 to G9, and on the other two groups there is no significant difference. Therefore, we do not consider β = 3 in the following and only compare β = 10, β = 30, and β = 100. On instance groups G1, G2, and G3, none of the three β-values is significantly better than the others. On group G4, β = 30 and β = 100 are significantly better than β = 10. On group G5, β = 100 is significantly better than β = 10, and on G6, β = 100 is significantly better than β = 30. On group G7, β = 10 is significantly better than β = 100, and on group G8 both β = 10 and β = 30 are significantly better than β = 100. Finally, on groups G9 and G10, β = 10 is significantly better than β = 30, which in turn is significantly better than β = 100.
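For reference, the test can be sketched as follows. This is a minimal, self-contained Python version of the two-sided Wilcoxon signed-rank test using the normal approximation without tie or continuity correction; it is not necessarily the implementation used for these experiments, and for small samples an exact test, as offered by SciPy's `scipy.stats.wilcoxon`, is preferable. The inputs are paired objective values of two settings on the same instances.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    x, y: paired observations, e.g. objectives of two beta-values on the same instances.
    Zero differences are discarded, tied |differences| receive their average rank,
    and no tie or continuity correction is applied. Returns (W+, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# A difference is significant at the 5% level if p < 0.05; since objectives are
# minimized, the setting with the smaller values on most instances is the better one.
```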

To always use a good parameter, we fix the parameter of the PILOT to β = 100 for instances with |P| ≤ 50,000, i.e., all groups up to G6, and to β = 10 for instances with |P| > 50,000.
To tune the β-parameter for the beam search, we proceed in the same way as for the PILOT approach. Figure 6.3 shows the results for the case of no exact demands and no filtering. As we can see, the results are similar to those of the PILOT approach: large beam widths work well on medium-sized instances and smaller beam widths on the larger instances. Note that already on instance group G7 the beam search with beam width β = 100 often runs into the time limit of one hour. For exact demands the differences between the β-values are similar, as Figure 6.4 shows. We again apply a Wilcoxon signed-rank test at a significance level of 5% to identify significant differences. Again, β = 3 is significantly worse than β = 10 on instance groups G4, G5, and G7 and is not significantly better on any instance group, although, as the charts show, it finds feasible solutions for more instances of the largest instance group G10. We therefore do not consider β = 3 in the following analysis. For G1, G2, and G3 there are no significant differences between the three β-values. For G4, β = 100 is significantly better than β = 30, which in turn is significantly better than β = 10. For G5, β = 100 and β = 30 are significantly better than β = 10, and for G6, β = 30 is significantly better than β = 10. For the larger instance groups G7, G8, G9, and G10, β = 10 is significantly better than β = 30, which in turn is significantly better than β = 100.
We therefore use the same parameter settings for the beam search as for the PILOT.
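The parameter rule and the role of β as beam width can be illustrated with a compact sketch. The following Python code is a simplified beam search for the set covering problem with setup costs, assuming each pattern is a (name, items, sheet cost, setup cost) tuple with positive costs and element areas, and that each step applies one pattern once. It ranks partial solutions by cost per covered area, a hypothetical evaluation chosen for illustration and not necessarily the rating used in this thesis; `choose_beta` encodes the setting fixed above.

```python
import heapq

def choose_beta(num_patterns):
    """Setting used in this chapter: beta = 100 if |P| <= 50,000, else beta = 10."""
    return 100 if num_patterns <= 50_000 else 10

def beam_search(patterns, demand, area, beta):
    """Return (cost, remaining, usage) of the best complete solution found, or None."""
    elems = sorted(demand)
    total_area = sum(area[e] * demand[e] for e in elems)
    beam = [(0.0, tuple(demand[e] for e in elems), ())]  # cost, remaining, usage
    best = None

    def evaluate(state):
        # Hypothetical evaluation: cost paid per unit of demanded area covered so far.
        cost, remaining, _ = state
        covered = total_area - sum(area[e] * r for e, r in zip(elems, remaining))
        return cost / covered

    while beam:
        children = {}
        for cost, remaining, usage in beam:
            used = dict(usage)
            for name, items, sheet_cost, setup_cost in patterns:
                rem = [r - min(items.get(e, 0), r) for e, r in zip(elems, remaining)]
                if rem == list(remaining):
                    continue  # pattern covers nothing that is still demanded
                new_used = dict(used)
                new_used[name] = new_used.get(name, 0) + 1
                key = tuple(sorted(new_used.items()))  # same usage means same state
                new_cost = cost + sheet_cost + (0.0 if name in used else setup_cost)
                children[key] = (new_cost, tuple(rem), key)
        for state in children.values():
            if not any(state[1]) and (best is None or state[0] < best[0]):
                best = state  # complete solution: all demands satisfied
        open_states = [s for s in children.values() if any(s[1])]
        beam = heapq.nsmallest(beta, open_states, key=evaluate)  # keep the beta best
    return best
```

Merging children with identical pattern usage keeps duplicates (the same multiset of patterns reached in a different order) from crowding the beam; with β = 1 the search degenerates to a single greedy-like path.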

Comparing All Approaches
Finally, we compare all five approaches: the ILP, the greedy, the hybrid, the PILOT, and the beam search approach. For the PILOT and beam search approaches we use β = 100 for instances with |P| ≤ 50,000 and β = 10 otherwise. Table 6.5 compares the results of all five approaches without exact demands. Column "G" gives the index of the instance group, column "V" the variant, and column "#" the number of instances in the group. For each of the five algorithms, column "f." lists the number of instances for which the algorithm found a feasible solution, column "obj." gives the geometric mean of the objective values over the feasible solutions only, column "i." gives the number of instances for which the algorithm found a solution better than the best solution found by the original algorithm when creating the instance, and column "t(s)" gives the median running time in seconds.
We can see that the beam search and the PILOT approach produce solutions nearly as good as those of the ILP, and on the larger instances even better ones. Furthermore, they are often better than the hybrid and the greedy approach, except on large instances. Note that, especially for variant "H", the algorithms improve on the best solution of the algorithm of Dusberger and Raidl in over half of the instances of every instance group except the trivial group G1. Therefore, our algorithms can indeed be used to improve solution quality, especially, but not only, when setup costs are present. Table 6.6 lists the results when exact demands are enforced. Note again that the ILP is more restricted than the other algorithms in this case. The results show similar properties, although fewer solutions could be improved than without exact demands.
To compare the algorithms statistically, we applied a Wilcoxon signed-rank test to each pair of algorithms, again grouping all instances of each instance group with both exact and non-exact demands together. The ILP is significantly better than the greedy, the PILOT, and the hybrid on instance groups G1 to G6, and significantly better than the beam search on instance groups G2 to G5. On the other hand, the greedy and the hybrid are significantly better than the ILP on instance groups G8 to G10, the PILOT is significantly better than the ILP on instance groups G7 to G9, and the beam search is significantly better than the ILP on instance group G7.
The greedy is significantly worse than the PILOT on all instance groups except G10 and significantly worse than the beam search on instance groups G1 to G7. Compared to the hybrid, it is significantly worse on instance groups G2 and G5 to G10. Comparing the PILOT approach with the beam search approach, the PILOT is significantly better on instance group G9 and significantly worse on instance groups G2 to G7. Furthermore, the PILOT and the beam search approach are significantly better than the hybrid approach on instance groups G1 to G7 and significantly worse on groups G9 and G10.

Table 6.7: Properties of the 20 selected instances
Instance  Sheet types  Element types  Total demand
1         1            30             153
2         1            10             59
3         1            20             132
4         1            15             56
5         1            7              36
6         1            64             278
7         3            21             8511
8         2            10             6660
9         3            60             86
10        1            176            522
11        1            50             2371
12        4            5              56
13        1            10             2000
14        1            41             130
15        1            97             1224
16        1            39             118
17        1            60             710
18        3            24             350
19        1            18             19
20        1            26             79

As we can see, each algorithm has advantages and disadvantages compared to the others, depending on the instance size. For the smaller instances the ILP performs best, and the PILOT and beam search perform better than the greedy and the hybrid. For the large instances, on the other hand, the fast greedy and hybrid approaches beat the other approaches.

Evaluating the Performance of the Hybrid Neighborhood
In this section we examine how adding the hybrid neighborhood described at the end of Section 5.8 changes the performance of the algorithm by Dusberger and Raidl. Since the set of 192 K2DCSPVSC instances is heterogeneous, we do not group them but select 20 representative instances on which we compare the two algorithm variants. Table 6.7 shows the properties of these instances.
Since the algorithm by Dusberger and Raidl only considers exact demands, we also allow only exact demands, which the hybrid neighborhood has to take into account. Since the base algorithm by Dusberger and Raidl is randomized, we run each test 30 times. Each run has a time limit of one hour, which is always fully used, since time is the only termination criterion. Since the neighborhood should be fast, we again use filters that keep at most the X best patterns during the execution. We tested the algorithm with the filters X = 500, X = 2000, X = 10000, and X = ∞. Table 6.8 shows the results for instances 1 to 10 and Table 6.9 for instances 11 to 20. Column "I" gives the instance number, "V" the variant, and "#" the number of runs. Each further column gives the average objective over all runs for one algorithm setting.
As we can see, the original algorithm works quite well for variant "N" without setup costs. For the other two variants, however, including the hybrid neighborhood often leads to better solutions. To verify this, we apply a Wilcoxon signed-rank test to all pairs of algorithm settings for each variant. The test confirms that for variant "N", i.e., without setup costs, the original algorithm without the hybrid neighborhood performs significantly better than all other algorithm variants. For variant "H", on the other hand, all other algorithm variants perform significantly better than the original, and for variant "M" there is no significant difference. Furthermore, there are no significant differences between the different values of X.
Table 6.8: Performance of the neighborhood structures of the hybrid approach with different filters compared to the original algorithm on instances 1 to 10

Conclusion
At the beginning of this thesis we explored the different types of cutting stock problems using the typology by Dyckhoff [Dyc90] and its extension by Wäscher et al. [WHS07]. We presented some basic problems as well as variants of them, especially ones considering pattern setup costs. Despite the diversity of cutting stock problems, we formulated a general problem GCSP which covers most of the previously presented problem variants.
Based on this GCSP we introduced a set covering problem, the CSSCP, which receives a set of patterns as input and seeks a best solution for the GCSP that uses only patterns from this set. In this way we omit the pattern construction part of the GCSP, which may be done by any problem-specific solver for the underlying cutting stock problem, and focus only on the pattern selection part. In each case, we also consider the problem variant with exact demands in addition to the variant allowing overproduction.
To solve the set covering problem we proposed five different approaches. The first one is an integer linear program which can solve the problem to optimality, although its running time can grow exponentially with the instance size. Furthermore, we developed a greedy heuristic which rates each pattern by the sum of the area ratings of all elements on the pattern whose demand is not yet satisfied, divided by the pattern costs. We also developed a sophisticated algorithm for computing the optimal amount for a pattern as fast as possible. We then extended the greedy approach to a PILOT method, which executes the greedy algorithm to rate each extension. Moreover, we used the greedy rating of patterns in a beam search which keeps not only the best, but the β best partial solutions in each iteration. Last but not least, we developed a hybrid method which combines the greedy approach with a problem-specific construction method that helps the greedy when no suitable patterns are left in the given pattern set. Note that the hybrid no longer formally solves the CSSCP, since it can construct new patterns that were not in the pattern set. Nevertheless, it is a valuable extension of the other algorithms, especially in practice, since it is fast and often finds good solutions.
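The interplay between the greedy rating and the PILOT method can be illustrated with a simplified sketch. The Python code below assumes patterns with a sheet cost and a setup cost paid on first use, rates a pattern by the area of still-demanded elements it covers divided by the cost of one more use, and applies the chosen pattern only once per step instead of computing its optimal amount. The PILOT evaluates the β best-rated extensions by completing each with the greedy and commits to the one with the lowest completed cost; interpreting β as the number of evaluated extensions is an assumption of this sketch, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    name: str
    items: dict         # element type -> copies cut from one sheet
    sheet_cost: float   # cost of one sheet cut with this pattern (assumed > 0)
    setup_cost: float   # paid once, when the pattern is used for the first time

def use_cost(p, used):
    return p.sheet_cost + (0.0 if p.name in used else p.setup_cost)

def rating(p, remaining, used, area):
    """Area of still-demanded elements covered by one use of p, per unit of cost."""
    covered = sum(area[e] * min(c, remaining.get(e, 0)) for e, c in p.items.items())
    return covered / use_cost(p, used)

def use_once(p, remaining, used):
    """Use p once, updating remaining and used in place; return the added cost."""
    added = use_cost(p, used)
    used[p.name] = used.get(p.name, 0) + 1
    for e, c in p.items.items():
        if e in remaining:
            remaining[e] = max(0, remaining[e] - c)
    return added

def greedy(patterns, remaining, used, area, cost=0.0):
    """Complete a partial solution greedily; return (cost, usage) or None."""
    remaining, used = dict(remaining), dict(used)
    while any(v > 0 for v in remaining.values()):
        best = max(patterns, key=lambda p: rating(p, remaining, used, area))
        if rating(best, remaining, used, area) <= 0:
            return None  # no pattern covers the remaining demand
        cost += use_once(best, remaining, used)
    return cost, used

def pilot(patterns, demand, area, beta):
    """PILOT: rate each of the beta best extensions by its greedy completion."""
    remaining, used, cost = dict(demand), {}, 0.0
    while any(v > 0 for v in remaining.values()):
        ranked = sorted(patterns, key=lambda p: rating(p, remaining, used, area), reverse=True)
        best_p, best_total = None, float("inf")
        for p in ranked[:beta]:
            if rating(p, remaining, used, area) <= 0:
                continue
            r, u = dict(remaining), dict(used)
            partial = cost + use_once(p, r, u)
            completed = greedy(patterns, r, u, area, partial)
            if completed is not None and completed[0] < best_total:
                best_p, best_total = p, completed[0]
        if best_p is None:
            return None
        cost += use_once(best_p, remaining, used)
    return cost, used

X = Pattern("X", {"a": 1}, 1.0, 0.0)
Y = Pattern("Y", {"a": 1, "b": 1}, 2.5, 0.0)
Z = Pattern("Z", {"b": 1}, 10.0, 0.0)
P, d, area = [X, Y, Z], {"a": 1, "b": 1}, {"a": 1.0, "b": 1.0}
print(greedy(P, d, {}, area))     # (3.5, {'X': 1, 'Y': 1})
print(pilot(P, d, area, beta=3))  # (2.5, {'Y': 1})
```

In this toy instance the greedy first takes X because of its high rating and then still has to pay for Y, whereas the PILOT detects this by completing each candidate and chooses Y directly; with β = 1 the PILOT only evaluates the top-rated extension and reproduces the greedy result.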
To test and apply our algorithms we presented the K-staged two-dimensional cutting stock problem with variable sheet size and pattern setup costs (K2DCSPVSC), which arises in real-world applications. We presented a sophisticated algorithm by Dusberger and Raidl [DR14, DR15, DR17] based on variable neighborhood search combined with ruin-and-recreate based very large neighborhoods. They developed different ruin methods and many construction methods based on different greedy criteria, as well as one based on dynamic programming. We then used this approach to generate pattern sets for different real-world instances of the K2DCSPVSC, which yielded a set of test instances for the CSSCP and its variant with exact demands.
Finally, we tested our algorithms on these generated instances. We applied all five algorithms to all instances and tuned their parameters, in particular the β parameters of the PILOT and the beam search, by testing different values and applying a Wilcoxon signed-rank test to identify significant differences. We then compared all five algorithms on the instances, grouped by their pattern set size, again using a Wilcoxon signed-rank test to identify significant differences. The algorithms perform differently depending on the instance size. For small instances the ILP is significantly better than the others, and the PILOT and beam search are significantly better than the greedy and the hybrid. For larger instances, on the other hand, the greedy and the hybrid are significantly better than the other approaches. Especially when the pattern setup costs are high compared to the other costs, the algorithms were able to improve over 50% of the interesting instances, i.e., the instances consisting of at least 10 patterns.
Last but not least, we also tested the effect of incorporating the hybrid method as an additional neighborhood into the variable neighborhood search by Dusberger and Raidl. Without pattern setup costs, including the new neighborhood leads to significantly worse solutions, since the neighborhood is time-consuming and therefore reduces the number of iterations that can be performed within the time limit. If the setup costs are high compared to the other costs, on the other hand, the hybrid neighborhood leads to a significant overall improvement of the solutions.
For future work it would be interesting to apply our algorithms to other cutting stock problems and to find out whether they can also improve algorithms for other problems. Furthermore, it would be interesting to develop improvement-based or population-based metaheuristics for the set covering problem and compare them with our approaches. Another idea would be to find good methods for producing a diverse set of patterns. In this thesis we used an algorithm by Dusberger and Raidl which was optimized for finding a single solution that is as good as possible; finding a diverse set of good patterns may lead to completely different algorithms.