Fair PCA

Wednesday, June 19, 2019 - 2:00pm - 2:50pm
Lind 305
Samira Samadi (Georgia Institute of Technology)
We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for different populations. We show that on several real-world data sets, PCA has higher average reconstruction error on one population than on the other (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from each population. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for different populations in the data. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low-dimensional representation of the data. This is joint work with Uthaipon Tantipongpipat, Jamie Morgenstern, Mohit Singh, and Santosh Vempala.
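The disparity described in the abstract can be illustrated with a minimal sketch, assuming NumPy and purely synthetic data. It fits standard PCA on the pooled data and reports the average reconstruction error separately for each group. This is not the Fair PCA algorithm from the talk; the function names, the group labels "A" and "B", and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_projection(X, d):
    """Return the mean of X and its top-d principal directions (as columns)."""
    mu = X.mean(axis=0)
    # Rows of Vt are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d].T

def avg_reconstruction_error(X, mu, V):
    """Mean squared distance between each row of X and its projection."""
    Xc = X - mu
    residual = Xc - Xc @ V @ V.T
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def per_group_errors(X, groups, d):
    """Fit PCA on all of X, then measure the error on each group separately."""
    mu, V = pca_projection(X, d)
    return {
        str(g): avg_reconstruction_error(X[groups == g], mu, V)
        for g in np.unique(groups)
    }

# Synthetic data (an assumption, not from the talk): two equally sized
# groups whose variance lies along different coordinate directions.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 5)) * np.array([3.0, 0.1, 0.1, 0.1, 0.1])
B = rng.normal(size=(n, 5)) * np.array([0.1, 1.0, 1.0, 0.1, 0.1])
X = np.vstack([A, B])
groups = np.array(["A"] * n + ["B"] * n)

errors = per_group_errors(X, groups, d=1)
print(errors)
```

In this toy setup both groups have the same number of samples, yet the single retained direction is dominated by group A's high-variance axis, so group B is reconstructed with noticeably larger error.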