Projects are subdivided into “ML” and “no-ML” ones. All projects require the theoretical description, implementation, and testing of one or more algorithms for solving numerical analysis and/or optimization problems. Each project is described with:
A list (P1), (P2), … of optimization or numerical analysis problems that can be solved using the techniques we have seen in the course. For ML projects, these are the fitting (learning) problems corresponding to one or more Machine Learning models (M1), (M2), … Some details of the problems/models are clearly specified and must be closely adhered to, while other details are explicitly left free for you to choose (possibly with restrictions).
A list (A1), (A2), … of optimization and/or numerical analysis algorithms suitable for solving the problems (P1), (P2), … (the fitting problems corresponding to (M1), (M2), …) that have been seen in the course, or are at least based on the same conceptual tools. Some details of the algorithms are clearly specified and must be closely adhered to, while other details are explicitly left free for you to choose (possibly with restrictions).
All the software developed during the project, as well as the accompanying report, should be strictly of your own making. A few snippets taken from existing software (say, the one provided by the lecturers, or from SourceForge) can be acceptable, but they should really be a very minor part. Similarly, copying large parts of existing documents (even with some cosmetic changes aimed at making them superficially different from the original ones) is to be avoided at all costs: direct and explicit citations of existing material (e.g., the statements of the relevant theorems) can be made, but the reference to the original source must be clear and explicit. Thus, usage of any existing optimization solvers and/or existing sophisticated linear algebra routines within the project is not allowed unless explicitly permitted by the project statement; in case of doubt, contact the lecturer. Please do not make your code/report public, either during the development of the project or after the conclusion of the exam.
Below, a list of “standard” projects, subdivided among ML and no-ML ones, is provided. These are intended for two-person groups, although one-person groups may be allowed, for justified reasons, after obtaining explicit permission from the lecturers.
Furthermore, proposals of “wildcard” projects that are different from standard projects are very welcome. Groups willing to perform wildcard projects must send a reasonably detailed proposal (possibly following the scheme used here) to both instructors, who may require any number of changes before approving it, or may not approve it at all. In particular, duplicates of standard projects (exact or with minor changes) will not be approved.
Since wildcard projects may be more difficult than standard projects (because they include a larger number of problems/models and/or algorithms, or because they consider particularly “nasty” ones), groups proposing them may consist of more than two students. Of course, the number of people in the group must be appropriate to the actual workload of the project, which the lecturers retain the right to ultimately decide upon.
A possible source of wildcard projects that is seen particularly favourably (by one lecturer in particular) is to implement the optimization and/or numerical analysis algorithms within the C++-17 framework SMS++. This typically requires:
implementing at least one “Block” describing the optimization problem(s), or using/adapting existing ones, and at least one “Solver” implementing the algorithm(s), or adapting/improving existing ones;
following the tight software development practices envisioned by the project, i.e., a separate git repository, extensive Doxygen-ready documentation of the interface, catering for changes in the model both in the Block and in the Solver, and extended, automated and repeatable correctness testing.
Due to the native coarse-grained parallel processing capabilities of SMS++, implementations of parallel algorithms may also be considered.
(M1), (M2), … are any Machine Learning model for which learning algorithms can be implemented using the techniques we have seen in the course.
(A1), (A2), … are any algorithm (optimization or numerical analysis) suitable for the models (M1), (M2), … that have been seen in the course, or are at least based on the same conceptual tools.
(M) is a neural network with topology and activation function of your choice, provided the latter is differentiable; only smooth regularization (say, \(L_2\)) is allowed.
(A1) is a standard momentum descent (heavy ball) approach.
(A2) is an algorithm of the class of Conjugate Gradient methods, cf. also here.
No off-the-shelf solvers allowed.
(M) is a neural network with topology and activation function of your choice, provided the latter is differentiable; only smooth regularization (say, \(L_2\)) is allowed.
(A1) is a standard momentum descent (heavy ball) approach.
(A2) is an algorithm of the class of limited-memory quasi-Newton methods, cf. also here and here.
No off-the-shelf solvers allowed.
(M1) is a neural network with topology and activation function of your choice (differentiable or not), but mandatory \(L_1\) regularization.
(M2) is a standard \(L_2\) linear regression (least squares).
(A1) is a standard momentum descent (heavy ball) approach applied to (M1).
(A2) is an algorithm of the class of accelerated gradient methods, cf. also here and here, applied to (M1).
(A3) is a basic version of the direct linear least squares solver of your choice (normal equations, QR, or SVD) applied to (M2).
No off-the-shelf solvers allowed.
(M1) is a neural network with topology and activation function of your choice (differentiable or not), but mandatory \(L_1\) regularization.
(M2) is a standard \(L_2\) linear regression (least squares).
(A1) is a standard momentum descent (heavy ball) approach applied to (M1).
(A2) is an algorithm of the class of deflected subgradient methods, cf. also here, applied to (M1).
(A3) is a basic version of the direct linear least squares solver of your choice (normal equations, QR, or SVD) applied to (M2).
No off-the-shelf solvers allowed.
(M1) is a neural network with topology and activation function of your choice (differentiable or not), but mandatory \(L_1\) regularization.
(A1) is a standard momentum descent (heavy ball) approach.
(A2) is an algorithm of the class of proximal bundle methods.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M1) is a neural network with topology of your choice, but mandatory piecewise-linear activation function (of your choice); any regularization is allowed.
(M2) is a standard \(L_2\) linear regression (least squares).
(A1) is a standard momentum descent (heavy ball) approach applied to (M1).
(A2) is an algorithm of the class of deflected subgradient methods, cf. also here, applied to (M1).
(A3) is a basic version of the direct linear least squares solver of your choice (normal equations, QR, or SVD) applied to (M2).
No off-the-shelf solvers allowed.
(M1) is a neural network with topology of your choice, but mandatory piecewise-linear activation function (of your choice); any regularization is allowed.
(A1) is a standard momentum descent (heavy ball) approach.
(A2) is an algorithm of the class of level bundle methods.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M) is a SVR (support vector regression)-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of smoothed gradient methods, cf. also here, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of proximal bundle methods, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of level bundle methods, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of active-set methods, cf. also here, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of interior-point methods, cf. also here, applied to either the primal or the dual (or both) formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is an algorithm of the class of gradient projection methods, cf. also here, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is a Frank-Wolfe type (conditional gradient) algorithm, applied to either the primal or the dual formulation of the SVR.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver may be allowed for the LP in the Frank-Wolfe approach only, and only after discussing possible ad-hoc approaches and convincingly arguing that they would be too complicated.
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is a dual approach, cf. constrained minimization slides, applied to either the primal or the dual formulation of the SVR, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of deflected subgradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is a dual approach, cf. constrained minimization slides, applied to either the primal or the dual formulation of the SVR, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is a dual approach, cf. constrained minimization slides, applied to either the primal or the dual formulation of the SVR, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of proximal bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M) is a SVR-type approach of your choice (in particular, with one or more kernels of your choice).
(A1) is a dual approach, cf. constrained minimization slides, applied to either the primal or the dual formulation of the SVR, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of level bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed.
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = w \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight vector \(w\) is chosen by solving an \(L_2\) minimization problem \(\arg\min_w f(w)\), with \(f(w) = \| X w - y \|_2\).
(A1) is your own implementation of the QR factorization technique, whose cost must scale linearly with the largest dimension.
(A2) is incremental QR, that is, a strategy in which you update the QR factorization after the addition of a new random feature column \(x_{n+1}\) to the feature matrix \(X\). More formally, you are required to write a function that takes as an input the factors of an already-computed factorization \(X = QR\) (where \(Q\) is stored either directly or via its Householder vectors, at your choice) of \(X\in\mathbb{R}^{m\times n}\), and uses them to compute a factorization \(\hat{X} = \hat{Q}\hat{R}\) of the matrix \(\hat{X} = [X \,|\, x_{n+1}] \in \mathbb{R}^{m\times (n+1)}\), reusing the work already done. This update should have cost lower than \(O(mn^2)\); a minimal sketch of one possible update is given after this project description.
No off-the-shelf solvers allowed.
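For illustration only, here is a minimal Python/NumPy sketch of one possible way to carry out the column update of (A2), under the assumption that \(Q\) is stored explicitly; the function name and the extra re-orthogonalization pass are choices of this sketch, not requirements of the project.

```python
import numpy as np

def qr_append_column(Q, R, x_new):
    """Given a thin QR factorization X = Q R (Q: m x n with orthonormal columns,
    R: n x n upper triangular), return the thin QR factorization of [X | x_new]
    in O(m n) operations, reusing the work already done.
    Assumes the new column is not (numerically) in the range of X."""
    r = Q.T @ x_new                 # coefficients of x_new on the current basis
    q = x_new - Q @ r               # component orthogonal to range(Q)
    dr = Q.T @ q                    # one re-orthogonalization pass for stability
    q -= Q @ dr
    r += dr
    rho = np.linalg.norm(q)
    Q_hat = np.column_stack([Q, q / rho])
    R_hat = np.block([[R, r[:, None]],
                      [np.zeros((1, R.shape[1])), rho]])
    return Q_hat, R_hat
```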
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = W_2 \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight matrix \(W_2\) is chosen by solving a linear least-squares problem with \(L_2\) regularization.
(A1) is an algorithm of the class of accelerated gradient methods, cf. also here and here, applied to (M).
(A2) is a closed-form solution with the normal equations and your own implementation of Cholesky (or LDL) factorization.
No off-the-shelf solvers allowed.
(M1) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = W_2 \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight matrix \(W_2\) is chosen by solving a linear least-squares problem, with \(L_1\) regularization.
(M2) as (M1) but with \(L_2\) regularization.
(A1) is an algorithm of the class of deflected subgradient methods, cf. also here, applied to (M1).
(A2) is a direct linear least-squares solver of your choice (Cholesky, QR, or SVD), applied to (M2).
No off-the-shelf solvers allowed.
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = W_2 \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight matrix \(W_2\) is chosen by solving a linear least-squares problem (with \(L_2\) regularization).
(A1) is the solution of this linear least-squares problem via thin QR factorization. Your implementation must scale linearly with the largest dimension of the problem.
(A2) is an algorithm of the class of Conjugate Gradient methods.
In addition, for (A2), you are expected to vary the dimension of the hidden layer and study how accuracy and computational time vary based on this choice.
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = w \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight vector \(w\) is chosen by solving an \(L_1\) minimization problem \(\arg\min_w f(w)\), with \(f(w) = \|X w - y \|_1\).
(A1) is iteratively reweighted least squares, i.e., an iteration where you solve at each step the \(L_2\) least-squares problem \[ w_{k+1} = \arg \min_w f_k(w), \quad f_k(w) = \frac12 \|W_k(X w - y) \|_2^2, \] where \(W_k\) is the diagonal matrix with entries \((W_k)_{ii} = |(Xw_k - y)_i|^{-1/2}\). (You can check that in the limit when \(w_k = w_{k+1} = w_*\), \(f_k\) and \(f\) have the same gradient.) Use a threshold so that the values in \(W_k\) do not get too large if \((Xw_k - y)_i \approx 0\). A minimal sketch of this iteration is given after this project description.
(A2) is an algorithm of the class of deflected subgradient methods, cf. also here.
No off-the-shelf solvers allowed.
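A minimal Python/NumPy sketch of the iteration described in (A1) follows; the \(\tfrac12\) factor is dropped since it does not change the minimizer, function names and tolerances are placeholders, and the inner weighted least-squares subproblem is solved with NumPy's lstsq purely for brevity (in the project it must be replaced by a solver of your own, since off-the-shelf solvers are not allowed).

```python
import numpy as np

def irls_l1(X, y, w0, eps=1e-6, max_iter=100, tol=1e-8):
    """Iteratively reweighted least squares for min_w ||X w - y||_1.
    eps thresholds the weights so they do not blow up when a residual is ~ 0."""
    w = w0.copy()
    for _ in range(max_iter):
        res = X @ w - y
        d = 1.0 / np.sqrt(np.maximum(np.abs(res), eps))     # diagonal of W_k
        # weighted subproblem: min_w || W_k (X w - y) ||_2
        # (off-the-shelf call, used here only to keep the sketch short)
        w_new = np.linalg.lstsq(d[:, None] * X, d * y, rcond=None)[0]
        if np.linalg.norm(w_new - w) <= tol * max(1.0, np.linalg.norm(w)):
            break
        w = w_new
    return w_new
```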
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = w \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight vector \(w\) is chosen by solving an \(L_1\) minimization problem \(\arg\min_w f(w)\), with \(f(w) = \|X w - y \|_1\).
(A1) is iteratively reweighted least squares, i.e., an iteration where you solve at each step the \(L_2\) least-squares problem \[ w_{k+1} = \arg \min_w f_k(w), \quad f_k(w) = \frac12 \|W_k(X w - y) \|_2^2, \] where \(W_k\) is the diagonal matrix with entries \((W_k)_{ii} = |(Xw_k - y)_i|^{-1/2}\). (You can check that in the limit when \(w_k = w_{k+1} = w_*\), \(f_k\) and \(f\) have the same gradient.) Use a threshold so that the values in \(W_k\) do not get too large if \((Xw_k - y)_i \approx 0\).
(A2) is an algorithm of the class of smoothed gradient methods, cf. also here.
No off-the-shelf solvers allowed.
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = w \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight vector \(w\) is chosen by solving a least-squares problem with \(L_1\) regularization \(\arg\min_w f(w)\), with \(f(w) = \|X w - y \|_2^2 + \lambda \|w\|_1\).
(A1) is iteratively reweighted least squares, i.e., an iteration where you solve at each step the least-squares problem \[ w_{k+1} = \arg \min_w f_k(w), \quad f_k(w) = \frac12 \|X w - y \|_2^2 + \lambda \|W_k w\|_2^2 \] where \(W_k\) is the diagonal matrix with entries \((W_k)_{ii} = |(w_k)_i|^{-1/2}\). (You can check that in the limit when \(w_k = w_{k+1} = w_*\), \(f_k\) and \(f\) have the same gradient.) Use a threshold so that the values in \(W_k\) do not get too large if \((w_k)_i \approx 0\). A minimal sketch of this iteration is given after this project description.
(A2) is an algorithm of the class of deflected subgradient methods, cf. also here.
No off-the-shelf solvers allowed.
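A minimal Python/NumPy sketch of the reweighted iteration of (A1) follows. Each step minimizes \(f_k\) by solving its optimality condition \((X^T X + 2\lambda W_k^2) w = X^T y\); the dense solve and the function name are choices of this sketch, and in the project the inner linear systems must be solved by your own routine, since off-the-shelf solvers are not allowed.

```python
import numpy as np

def irls_l1_reg(X, y, lam, w0, eps=1e-6, max_iter=100, tol=1e-8):
    """Iteratively reweighted least squares for the L1-regularized problem above.
    eps thresholds the weights so they do not blow up when (w_k)_i is ~ 0."""
    XtX, Xty = X.T @ X, X.T @ y
    w = w0.copy()
    for _ in range(max_iter):
        d = 1.0 / np.maximum(np.abs(w), eps)          # diagonal of W_k^2
        # optimality condition of f_k: (X^T X + 2 lam W_k^2) w = X^T y
        w_new = np.linalg.solve(XtX + 2.0 * lam * np.diag(d), Xty)
        if np.linalg.norm(w_new - w) <= tol * max(1.0, np.linalg.norm(w)):
            break
        w = w_new
    return w_new
```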
(M) is so-called extreme learning, i.e., a neural network with one hidden layer, \(y = w \sigma(W_1 x)\), where the weight matrix for the hidden layer \(W_1\) is a fixed random matrix, \(\sigma(\cdot)\) is an elementwise activation function of your choice, and the output weight vector \(w\) is chosen by solving a least-squares problem with \(L_1\) regularization \(\arg\min_w f(w)\), with \(f(w) = \|X w - y \|_2^2 + \lambda \|w\|_1\).
(A1) is iteratively reweighted least squares, i.e., an iteration where you solve at each step the least-squares problem \[ w_{k+1} = \arg \min_w f_k(w), \quad f_k(w) = \frac12 \|X w - y \|_2^2 + \lambda \|W_k w\|_2^2 \] where \(W_k\) is the diagonal matrix with entries \((W_k)_{ii} = |(w_k)_i|^{-1/2}\). (You can check that in the limit when \(w_k = w_{k+1} = w_*\), \(f_k\) and \(f\) have the same gradient.) Use a threshold so that the values in \(W_k\) do not get too large if \((w_k)_i \approx 0\).
(A2) is an algorithm of the class of smoothed gradient methods, cf. also here.
No off-the-shelf solvers allowed.
(P1), (P2), … are any optimization or numerical analysis problems for which solution algorithms can be implemented using the techniques we have seen in the course (if more than one problem is selected, they should be related somehow).
(A1), (A2), … are any algorithms (optimization or numerical analysis) suitable for solving the problems (P1), (P2), … that have been seen in the course, or are at least based on the same conceptual tools.
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a solution algorithm based on gradient projection, be it Rosen’s or Wolfe’s version.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a solution algorithm of the class of active-set methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solver is allowed, save of course for (A2).
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is an algorithm of the class of interior-point methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solver is allowed, save of course for (A2).
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is an algorithm of the class of conditional gradient (Frank-Wolfe type) methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solver is allowed, save of course for (A2): the LP within (A1) must be solved by an ad-hoc approach exploiting the structure of your problem.
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of deflected subgradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, not even for the solution of the Lagrangian relaxation, save of course for (A2).
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, not even for the solution of the Lagrangian relaxation, save of course for (A2).
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of proximal bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed, and of course for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the convex quadratic program on a set \(K\) of disjoint simplices \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_{i \in I^k} x_i = 1 \;\; k \in K \,,\, x \geq 0 \, \Big\} \] where \(x \in \mathbb{R}^n\), the index sets \(I^k\) form a partition of the set \(\{1, \dots , n\}\) (i.e., \(\cup_{k \in K} I^k = \{1, \dots , n\}\), and \(I^h \cap I^k = \emptyset\) for all \(h\) and \(k\)), and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of level bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver for the Master Problem of the bundle method is allowed, and of course for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q \in \mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem; a minimal sketch of one possible generator is also given after this project description.
(A1) is a solution algorithm of the class of active-set methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solver is allowed, save of course for (A2).
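Since meaningful test instances are needed anyway, here is a hedged Python/NumPy sketch of one possible generator; it is not the one referenced above, just a stand-in, and all function names and parameter ranges are arbitrary choices of this sketch. A chain of arcs guarantees connectivity, and the deficits \(b\) are built from a feasible flow so that the instance is feasible by construction.

```python
import numpy as np

def make_quadratic_mcf(n_nodes, n_arcs, seed=0):
    """Random instance of the convex quadratic separable MCF problem:
    incidence matrix E (n_nodes x n_arcs), diagonal PSD Q, costs q,
    capacities u > 0 and a feasible deficit vector b."""
    assert n_arcs >= n_nodes - 1
    rng = np.random.default_rng(seed)
    # first a path 0 -> 1 -> ... -> n-1 to ensure connectivity, then random arcs
    tails = list(range(n_nodes - 1))
    heads = list(range(1, n_nodes))
    while len(tails) < n_arcs:
        i, j = rng.integers(0, n_nodes, size=2)
        if i != j:
            tails.append(int(i))
            heads.append(int(j))
    E = np.zeros((n_nodes, n_arcs))
    for a, (i, j) in enumerate(zip(tails, heads)):
        E[i, a], E[j, a] = 1.0, -1.0              # arc a leaves node i, enters node j
    Q = np.diag(rng.uniform(0.0, 10.0, n_arcs))   # diagonal, nonnegative, hence PSD
    q = rng.standard_normal(n_arcs)
    u = rng.uniform(1.0, 10.0, n_arcs)
    x0 = rng.uniform(0.0, 1.0, n_arcs) * u        # a feasible flow in [0, u]
    b = E @ x0                                    # deficits induced by x0
    return E, Q, q, b, u
```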
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q\in\mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A) is an algorithm of the class of interior-point methods, cf. also here.
No off-the-shelf solvers allowed.
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E \in \mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q \in \mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) must be a solution algorithm of the class of conditional gradient (Frank-Wolfe type) methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is permitted only for the linear Min-Cost Flow Problem solved at each iteration (and of course for (A2)), but it has to be a specialized solver (not, say, a generic Linear Programming one).
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E \in \mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q \in \mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) must be a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of deflected subgradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), except possibly for the solution of the Lagrangian relaxation, and only after an analysis that convincingly shows it’s a good idea.
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q\in\mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), except possibly for the solution of the Lagrangian relaxation, and only after an analysis that convincingly shows it’s a good idea.
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q\in\mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of proximal bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, of course for (A2), and possibly also for the solution of the Lagrangian relaxation but only after an analysis that convincingly shows it’s a good idea.
(P) is the convex quadratic separable Min-Cost Flow Problem \[ \textstyle \min \big\{ \, x^T Q x + q x \,:\, Ex = b \,,\, 0 \leq x \leq u \, \big\} \] where \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a directed connected graph with \(n\) nodes and \(m\) arcs, and \(Q\in\mathbb{R}^{m\times m}\) is diagonal and Positive Semidefinite (i.e., \(Q = \operatorname{diag}( Q ) \geq 0\)). You can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of level bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, of course for (A2), and possibly also for the solution of the Lagrangian relaxation but only after an analysis that convincingly shows it’s a good idea.
(P) is the problem of estimating the matrix norm \(\| A \|_2\) for a (possibly rectangular) matrix \(A \in \mathbb{R}^{m \times n}\), using its definition as an (unconstrained) maximum problem; one possible such formulation is sketched after this project description.
(A1) is a standard gradient descent (steepest descent) approach.
(A2) is an algorithm of the class of Conjugate Gradient methods, cf. also here.
No off-the-shelf solvers allowed.
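The statement leaves the exact unconstrained formulation to you; as one natural possibility (an assumption of this sketch, not a prescription), the following short Python/NumPy function evaluates the Rayleigh-quotient formulation \(f(x) = \|Ax\|_2^2 / \|x\|_2^2\), whose maximum is \(\|A\|_2^2\), together with its gradient — the two ingredients that (A1) and (A2) need.

```python
import numpy as np

def norm_estimation_objective(A, x):
    """f(x) = ||A x||^2 / ||x||^2 (to be maximized) and its gradient.
    One possible unconstrained formulation of estimating ||A||_2; the
    norm estimate is sqrt(f) at a (local) maximizer."""
    Ax = A @ x
    xx = x @ x
    f = (Ax @ Ax) / xx
    grad = 2.0 * (A.T @ Ax - f * x) / xx
    return f, grad
```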
(P) is the problem of estimating the matrix norm \(\| A \|_2\) for a (possibly rectangular) matrix \(A \in \mathbb{R}^{m \times n}\), using its definition as an (unconstrained) maximum problem.
(A1) is a standard gradient descent (steepest descent) approach.
(A2) is a quasi-Newton method such as BFGS (one which does not require any computation of the Hessian of the function).
No off-the-shelf solvers allowed.
(P) is the linear least squares problem \[ \min_w \| \hat{X} w - y \| \] where \(\hat{X}\) is the matrix obtained by concatenating the (tall thin) matrix \(X\) from the ML-cup dataset by prof. Micheli with a few additional columns containing functions of your choice of the features of the dataset, and \(y\) is one of the corresponding output vectors. For instance, if \(X\) contains columns \([ x1 , x2 ]\), you may add functions such as log(x1), x1.^2, x1.*x2, …
(A1) is an algorithm of the class of Conjugate Gradient methods, cf. also here.
(A2) is thin QR factorization with Householder reflectors (lecture 10 here), in the variant where one does not form the matrix \(Q\), but stores the Householder vectors \(u_k\) and uses them to perform (implicitly) products with \(Q\) and \(Q^T\); a minimal sketch is given after this project description.
No off-the-shelf solvers allowed. In particular, you must implement the thin QR factorization yourself, and the computational cost of your implementation should scale linearly with the largest dimension of the matrix \(X\).
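A minimal Python/NumPy sketch of the Householder variant described in (A2) is given below: \(Q\) is never formed, the vectors \(u_k\) are stored and used to apply \(Q^T\) implicitly, and the overall cost is \(O(mn^2)\), i.e., linear in the largest dimension. The function names are choices of this sketch, the final triangular solve stands in for your own back substitution, and rank deficiency is not handled.

```python
import numpy as np

def householder_qr(X):
    """Thin QR of a tall matrix X (m x n, m >= n) via Householder reflectors.
    Returns the Householder vectors (columns of U) and the n x n factor R."""
    R = np.array(X, dtype=float)
    m, n = R.shape
    U = np.zeros((m, n))
    for k in range(n):
        x = R[k:, k]
        u = x.copy()
        u[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        u /= np.linalg.norm(u)
        U[k:, k] = u
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])   # apply I - 2 u u^T
    return U, np.triu(R[:n, :])

def apply_Qt(U, b):
    """Compute Q^T b using only the stored Householder vectors."""
    y = np.array(b, dtype=float)
    for k in range(U.shape[1]):
        u = U[k:, k]
        y[k:] -= 2.0 * u * (u @ y[k:])
    return y

def solve_least_squares(U, R, y):
    """Solve min_w ||X w - y|| given the implicit thin QR of X."""
    c = apply_Qt(U, y)[: R.shape[1]]
    return np.linalg.solve(R, c)        # stand-in for your own back substitution
```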
(P) is the linear least squares problem \[ \min_w \| \hat{X} w - \hat{y} \| \] where \[ \hat{X} = \begin{bmatrix} X^T \\ \lambda I_m \end{bmatrix}, \quad \hat{y} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \] with \(X\in\mathbb{R}^{m\times n}\) the (tall thin) matrix from the ML-cup dataset by prof. Micheli, \(\lambda>0\), and \(y\) is a random vector.
(A1) is an algorithm of the class of limited-memory quasi-Newton methods, cf. also here and here.
(A2) is thin QR factorization with Householder reflectors (lecture 10 here) in the variant where one does not form the matrix \(Q\), but stores the Householder vectors \(u_k\) and uses them to perform (implicitly) products with \(Q\) and \(Q^T\).
No off-the-shelf solvers allowed. In particular, you must implement the thin QR factorization yourself, and the computational cost of your implementation should be at most quadratic in \(m\).
(P) is the linear least squares problem \[ \min_w \| X w - y \| \] where \(X\) is the (tall thin) matrix from the ML-cup dataset by prof. Micheli, and \(y\) is a random vector.
(A1) is thin QR factorization with Householder reflectors (lecture 10 here).
(A2) is LDL factorization with pivoting applied to the square symmetric linear system \[ \begin{bmatrix} I & X \\ X^T & 0 \end{bmatrix} \begin{bmatrix} r \\ w \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix} \] (augmented system).
No off-the-shelf solvers allowed.
(P) is a sparse linear system of the form \[ \begin{bmatrix} D & E^T\\ E & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b\\c \end{bmatrix} \] where \(D\in\mathbb{R}^{m\times m}\) is a diagonal positive definite matrix and \(E\in\mathbb{R}^{n\times m}\) is the node-arc incidence matrix of a given directed graph. These problems arise as the KKT system of the convex quadratic separable Min-Cost Flow Problem, hence you can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is Conjugate Gradient for linear systems (lecture 38 here), applied to the “reduced” linear system \[ ( E D^{-1} E^T ) y = ED^{-1}b - c. \] (Indeed, one can see that, by eliminating \(x\), the two linear systems are equivalent.) Your implementation should not require forming the matrix \(E D^{-1} E^T\) explicitly, but rely on a callback function which computes the product \(ED^{-1}E^T v\) for a given vector \(v\); a minimal sketch of this scheme is given after this project description.
(A2) is GMRES (or MINRES) (lectures 35–36 here), applied to the original system.
No off-the-shelf solvers allowed.
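A minimal Python/NumPy sketch of the callback-based scheme of (A1) follows: the reduced matrix is never formed, and a plain hand-written CG loop (no preconditioning) consumes only the matvec callback. Function names and tolerances are choices of this sketch, not part of the project statement.

```python
import numpy as np

def reduced_matvec(E, d, v):
    """Callback computing (E D^{-1} E^T) v without forming the matrix;
    d holds the diagonal of D."""
    return E @ ((E.T @ v) / d)

def conjugate_gradient(matvec, rhs, tol=1e-8, max_iter=None):
    """Plain CG for a symmetric positive (semi)definite operator
    available only through a matvec callback."""
    x = np.zeros_like(rhs, dtype=float)
    r = rhs - matvec(x)
    p = r.copy()
    rs = r @ r
    max_iter = max_iter if max_iter is not None else len(rhs)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(rhs):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# usage on the reduced system (E D^{-1} E^T) y = E D^{-1} b - c:
# y = conjugate_gradient(lambda v: reduced_matvec(E, d, v), E @ (b / d) - c)
```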
(P) is the linear least squares problem \[ \min_w \| \hat{X} w - \hat{y} \| \] where \[ \hat{X} = \begin{bmatrix} X^T \\ \lambda I \end{bmatrix}, \quad \hat{y} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \] with \(X\in\mathbb{R}^{m\times n}\) the (tall thin) matrix from the ML-cup dataset by prof. Micheli, \(\lambda >0\), and \(y\) is a random vector.
(A1) is thin QR factorization with Householder reflectors (lecture 10 here).
(A2) is a variant of thin QR (to be determined) in which you make use of the structure of the matrix \(\hat{X}\) (and in particular the zeros in the scaled identity block) to reduce the computational cost.
No off-the-shelf solvers allowed. In particular, you must implement the thin QR factorization yourself, and the computational cost of your implementation should be at most quadratic in \(m\).
(P) is low-rank approximation of a matrix \(A \in \mathbb{R}^{m\times n}\), i.e., \[ \min_{U \in \mathbb{R}^{m\times k}, V\in\mathbb{R}^{n\times k}} \|A - UV^T\|_F \]
(A) is alternating optimization: first take an initial \(V = V_0\), and compute \(U_1 = \arg \min_{U} \|A - UV_0^T\|_F\), then use it to compute \(V_1 = \arg \min_{V} \|A - U_1V^T\|_F\), then \(U_2 = \arg \min_{U} \|A - UV_1^T\|_F\), and so on, until (hopefully) convergence. Use QR factorization to solve these sub-problems; a minimal sketch of the alternation is given after this project description.
No off-the-shelf solvers allowed. In particular, you must implement the QR factorization yourself.
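A minimal Python/NumPy sketch of the alternating scheme of (A) follows; the function names are placeholders, the stopping test is a fixed iteration count for simplicity, and the sub-problems are solved with NumPy's library QR only to keep the sketch short (in the project it must be your own factorization).

```python
import numpy as np

def ls_via_qr(M, B):
    """Solve min_W ||M W - B||_F via a thin QR of M (library call used here
    only for brevity; use your own QR in the project)."""
    Q, R = np.linalg.qr(M)
    return np.linalg.solve(R, Q.T @ B)

def alternating_low_rank(A, k, iters=50, seed=0):
    """Alternating optimization for min_{U,V} ||A - U V^T||_F."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((A.shape[1], k))
    for _ in range(iters):
        U = ls_via_qr(V, A.T).T      # argmin_U ||A - U V^T||_F for fixed V
        V = ls_via_qr(U, A).T        # argmin_V ||A - U V^T||_F for fixed U
    return U, V
```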
(P) is low-rank approximation of a matrix \(A \in \mathbb{R}^{m\times n}\), i.e., \[ \min_{U \in \mathbb{R}^{m\times k}, V\in\mathbb{R}^{n\times k}} \|A - UV^T\|_F \]
(A) is alternating optimization: first take an initial \(V = V_0\), and compute \(U_1 = \arg \min_{U} \|A - UV_0^T\|_F\), then use it to compute \(V_1 = \arg \min_{V} \|A - U_1V^T\|_F\), then \(U_2 = \arg \min_{U} \|A - UV_1^T\|_F\), and so on, until (hopefully) convergence. Use the normal equations with Cholesky factorization to solve these sub-problems; a minimal sketch of this variant is given after this project description.
No off-the-shelf solvers allowed. In particular, you must implement the Cholesky factorization yourself.
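For the normal-equations variant of (A), the following hedged sketch shows a plain Cholesky factorization of the kind the project asks you to implement, together with its use on one sub-problem; the forward/backward substitutions are delegated to generic NumPy solves only for brevity, and the alternation loop itself is the same as in the previous sketch.

```python
import numpy as np

def cholesky(A):
    """Unpivoted Cholesky factorization A = L L^T of a symmetric
    positive definite matrix (minimal sketch, no error checking)."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ L[j, :j]) / L[j, j]
    return L

def ls_via_normal_equations(M, B):
    """Solve min_W ||M W - B||_F through the normal equations (M^T M) W = M^T B."""
    L = cholesky(M.T @ M)
    Y = np.linalg.solve(L, M.T @ B)     # stand-in for forward substitution
    return np.linalg.solve(L.T, Y)      # stand-in for back substitution
```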
(P) is rank-1 approximation of an incomplete matrix: given a subset of its entries \((i_k,j_k,x_k)\), \(k=1,2,\dots,p\), find the rank-1 matrix \(A=uv^T\) that minimizes \[ \textstyle \min \sum_{k=1}^{p} (A_{i_k,j_k} - x_k)^2. \] This method can be used to solve problems of collaborative filtering, the most famous of which is the so-called Netflix prize. (Do not use this dataset though; it would probably be too large.)
(A1) is any optimization method of your choice (on the vectors \(u,v\)).
No off-the-shelf solvers allowed.
(P) is finding the minimum-\(L_1\) norm solution of an underdetermined linear system, i.e., \[ \min_{\text{$x$ s.t. $Ax=y$}} \|x\|_1, \] where \(A \in \mathbb{R}^{m\times n}\) is short and fat, i.e., \(n > m\).
It is known that when the problem has a sparse solution (i.e., \(y = Az\), where \(z\) is a vector with \(k \ll n\) non-zero elements), in many cases the method recovers the solution \(x=z\) (this is known as compressed sensing). Experiment with several artificial examples of varying sizes, and see how often this property holds (depending on \(k\)).
(A) is any optimization method of your choice.
No off-the-shelf solvers allowed.
(P) is a sparse linear system of the form \[ \begin{bmatrix} D & E^T\\ E & 0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} = \begin{bmatrix} b\\c \end{bmatrix} \] where \(D\in\mathbb{R}^{m\times m}\) is a diagonal positive definite matrix (i.e., \(D = \operatorname{diag}( D ) > 0\)) and \(E\in\mathbb{R}^{(n-1)\times m}\) is obtained by removing the last row from the node-arc incidence matrix of a given connected directed graph. These problems arise as the KKT system of the convex quadratic separable Min-Cost Flow Problem, hence you can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is GMRES, and you must solve the internal problems \(\min \bigl\| H_n y - \|b\| e_1 \bigr\|\) by updating the QR factorization of \(H_n\) at each step: given the QR factorization of \(H_{n-1}\) computed at the previous step, apply one more orthogonal transformation to compute that of \(H_n\). A minimal sketch of this update is given after this project description.
(A2) is the same GMRES, but using the so-called Schur complement preconditioner \[ P = \begin{bmatrix} D & 0\\ E & S \end{bmatrix}, \] where \(S\) is either \(S=ED^{-1}E^T\) or a sparse approximation of it (to obtain it, for instance, replace the smallest off-diagonal entries of \(S\) with zeros).
No off-the-shelf solvers allowed.
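A minimal Python/NumPy sketch of the incremental QR update of (A1) follows: the Givens rotations that triangularized \(H_{n-1}\) are stored as pairs of cosines and sines, applied to the new column of \(H_n\), and one new rotation is computed to annihilate its subdiagonal entry. The function name is a placeholder, and the same new rotation must also be applied to the right-hand side \(\|b\| e_1\) of the small least-squares problem; that bookkeeping is left out here.

```python
import numpy as np

def extend_hessenberg_qr(cs, sn, h):
    """Given the rotations (cs, sn) that triangularize H_{n-1} and the new
    column h of H_n (length len(cs) + 2), apply the old rotations to h and
    compute the new rotation annihilating its last (subdiagonal) entry.
    Returns the transformed column and the new (cosine, sine) pair."""
    h = np.array(h, dtype=float)
    for i, (c, s) in enumerate(zip(cs, sn)):           # apply previous rotations
        h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s * h[i] + c * h[i + 1]
    k = len(cs)                                        # row of the entry to zero out
    r = np.hypot(h[k], h[k + 1])
    c_new, s_new = h[k] / r, h[k + 1] / r
    h[k], h[k + 1] = r, 0.0
    return h, c_new, s_new
```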
(P) is a sparse linear system of the form \[ \begin{bmatrix} D & E^T\\ E & 0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} = \begin{bmatrix} b\\c \end{bmatrix} \] where \(D\in\mathbb{R}^{m\times m}\) is a diagonal positive definite matrix and \(E\in\mathbb{R}^{(n-1)\times m}\) is obtained by removing the last row from the node-arc incidence matrix of a given connected directed graph. These problems arise as the KKT system of the convex quadratic separable Min-Cost Flow Problem, hence you can look, e.g., here for ways to generate meaningful instances of the problem.
(A1) is MINRES, and you must solve the internal problems \(\min \bigl\| H_n y - \|b\| e_1 \bigr\|\) by updating the QR factorization of the tridiagonal matrix \(H_n\) at each step: given the QR factorization of \(H_{n-1}\) computed at the previous step, apply one more orthogonal transformation to compute that of \(H_n\).
(A2) is the same MINRES, but using the so-called Schur complement preconditioner \[ P = \begin{bmatrix} D &0\\ 0 & S \end{bmatrix}, \] where \(S\) is either \(S=ED^{-1}E^T\) or a sparse approximation of it (to obtain it, for instance, replace the smallest off-diagonal entries of \(S\) with zeros).
No off-the-shelf solvers allowed.
(P) is the convex quadratic optimization problem defined on the intersection of a box and a Euclidean ball, i.e., \[ \min \big\{ \, x^T Q x + q x \,:\, l \leq x \leq u \,,\, || x - c || \leq r \, \big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(l\), \(u\) and \(c\) are fixed vectors of \(\mathbb{R}^n\), \(r\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the convex quadratic optimization problem defined on the intersection of a box and a Euclidean ball, i.e., \[ \min \big\{ \, x^T Q x + q x \,:\, l \leq x \leq u \,,\, || x - c || \leq r \, \big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(l\), \(u\) and \(c\) are fixed vectors of \(\mathbb{R}^n\), \(r\) is a fixed positive real number, and \(Q\) is an \(n \times n\) positive semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the convex quadratic optimization problem defined on the intersection of a box and a Euclidean ball, i.e., \[ \min \big\{ \, x^T Q x + q x \,:\, l \leq x \leq u \,,\, || x - c || \leq r \, \big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(l\), \(u\) and \(c\) are fixed vectors of \(\mathbb{R}^n\), \(r\) is a fixed positive real number, and \(Q\) is an \(n \times n\) positive semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method and for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a solution algorithm based on gradient projection, be it Rosen’s or Wolfe’s version.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a solution algorithm of the class of active-set methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is an algorithm of the class of interior-point methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a solution algorithm of the class of conditional gradient (Frank-Wolfe type) methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2): the LP within (A1) must be solved by an ad-hoc approach exploiting the structure of your problem.
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of deflected subgradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of smoothed gradient methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of proximal bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, of course for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the convex quadratic nonseparable knapsack problem \[ \textstyle \min \Big\{ \, x^T Q x + q x \,:\, \sum_i a_i x_i \geq b \,,\, l \leq x \leq u \, \Big\} \] where \(x \in \mathbb{R}^n\), \(q\), \(0 \leq l < u\) and \(a > 0\) are fixed vectors of \(\mathbb{R}^n\), \(b\) is a fixed positive real number, and \(Q\) is an \(n \times n\) Positive Semidefinite matrix.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved by an algorithm of the class of level bundle methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, of course for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a solution algorithm based on gradient projection, be it Rosen’s or Wolfe’s version.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a solution algorithm of the class of active-set methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is an algorithm of the class of interior-point methods, cf. also here.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2).
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is an algorithm of the class of conditional gradient (Frank-Wolfe type) methods.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2): the LP within the algorithm must be solved by an ad-hoc approach exploiting the structure of your problem.
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved either by an algorithm of the class of deflected subgradient methods, cf. also here, or by an ad-hoc method of your choice if reasonable and possible.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved either by an algorithm of the class of smoothed gradient methods, cf. also here, or by an ad-hoc method of your choice if reasonable and possible.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
No off-the-shelf solvers allowed, save of course for (A2), not even for the solution of the Lagrangian relaxation.
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved either by an algorithm of the class of proximal bundle methods, or by an ad-hoc method of your choice if reasonable and possible.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, and of course for (A2), but not for the solution of the Lagrangian relaxation.
(P) is the Lasso Least Squares problem \[ \textstyle \min \Big\{ \, || A x - y ||_2^2 \,:\, || x ||_1 \leq t \, \Big\} \] where \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{m \times n}\) with \(m > n\) and full column rank, \(y \in \mathbb{R}^m\), and \(t \in \mathbb{R}_{++}\) is a parameter.
(A1) is a dual approach, cf. constrained minimization slides, with appropriate choices of the constraints to be dualized, where the Lagrangian Dual is solved either by an algorithm of the class of level bundle methods, or by an ad-hoc method of your choice if reasonable and possible.
(A2) is a general-purpose solver applied to an appropriate formulation of the problem.
Use of an off-the-shelf solver is allowed for the Master Problem of the bundle method, and of course for (A2), but not for the solution of the Lagrangian relaxation.