# Solving mixed integer programs arising in statistical data editing

Wednesday, July 27, 2005 - 2:45pm - 3:30pm

EE/CS 3-180

Juan-Jose Salazar-Gonzalez (University of La Laguna)

National Statistical Agencies need automatic tools for detecting errors in records that do not satisfy a set of consistency rules. This topic is known as Statistical Data Editing, and a particularly interesting optimization problem arising in it is the so-called Error Localization Problem: find the minimum number of fields in an invalid record such that, by modifying their values, the resulting record satisfies all the rules.
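As a toy illustration (not the method presented in the talk), the problem can be solved by brute force on tiny instances: try changing ever-larger subsets of fields until some assignment of new values satisfies every edit. The record, edits, and candidate values below are hypothetical.

```python
from itertools import combinations

def satisfies_all_edits(r):
    """Hypothetical consistency rules (edits) on a 3-field record
    (turnover, costs, profit)."""
    turnover, costs, profit = r
    return (profit == turnover - costs  # balance edit
            and turnover >= 0 and costs >= 0)  # non-negativity edits

def error_localization(record, candidate_values):
    """Return a smallest set of field indices whose values can be
    changed (to some candidate value) so that every edit holds."""
    n = len(record)
    for k in range(n + 1):  # try changing 0, 1, ... fields
        for fields in combinations(range(n), k):
            # recursively enumerate replacement values for chosen fields
            def fix(idx, r):
                if idx == len(fields):
                    return satisfies_all_edits(r)
                i = fields[idx]
                return any(fix(idx + 1, r[:i] + (v,) + r[i + 1:])
                           for v in candidate_values)
            if fix(0, tuple(record)):
                return set(fields)
    return None

# Invalid record: profit (70) != turnover - costs (60).
# Changing the single field 'turnover' to 110 repairs it.
print(error_localization((100, 40, 70), range(0, 201, 10)))  # → {0}
```

This enumeration is exponential in the number of fields, which is consistent with the NP-hardness discussed below; the talk's Benders' decomposition approach is what makes realistic instance sizes tractable.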

The problem is NP-hard, since the Minimum Weight Solution to Linear Equations problem is a particular case. The Error Localization Problem is also very important in Signal Processing (where it is called the Basis Selection Problem), in which information must be encoded and sent, or received and decoded, through a telecommunication network.

We present an exact method, based on a Benders' decomposition approach, that solves instances with up to 100 fields and 40 edits. We also present a variant of this approach that solves larger instances heuristically. Some procedures of this descending search make use of Farkas' Lemma from Linear Programming. Computational experience on randomly generated instances shows that the approach can deal with instances of up to 1000 fields and 400 edits.
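For reference, one standard form of Farkas' Lemma (a natural tool for certifying that a candidate set of unchanged fields leaves the edits infeasible): for $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, exactly one of the two systems

$$Ax = b,\ x \ge 0 \qquad \text{or} \qquad A^{\top} y \ge 0,\ b^{\top} y < 0$$

has a solution; the vector $y$ in the second system is an explicit certificate of infeasibility of the first.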

The generality of the consistency rules makes the Error Localization Problem very close to a general Mixed Integer Program; still, there is some structure. We will therefore open a discussion with the audience on how new results in Mixed Integer Programming could help solve Error Localization Problems, and vice versa!

This talk is part of joint research with Jorge Riera-Ledesma (University of La Laguna, Tenerife, Spain).
