Image Corpus of Malaysian PEPs
Building an image corpus of politically exposed persons (PEPs) for facial recognition in investigations
I’ve written about using Google Photos for facial recognition before. For it to be more useful, it needs an extensive corpus of images, so that faces in a given photograph are tagged automatically.
I’ve now started a personal project to slowly build a corpus of images, complete with metadata on description, source and licensing. Once it is more substantial and useful, it will be made freely available for download online.
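As a rough illustration of what one entry in such a corpus might look like, here is a minimal sketch in Python. The field names, the `ImageRecord` class, the `to_json_lines` helper and the example values are all hypothetical, chosen only to mirror the description, source and licensing metadata mentioned above. They are not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ImageRecord:
    """One photo in the corpus (hypothetical schema, for illustration only)."""
    filename: str      # path of the cropped photo inside the corpus
    person: str        # name of the person shown
    description: str   # short caption or context for the photo
    source: str        # where the photo came from, e.g. a report title
    license: str       # licensing terms, or "unknown" if not yet checked


def to_json_lines(records):
    """Serialise records as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(asdict(r), ensure_ascii=False, sort_keys=True)
        for r in records
    )


# Placeholder values only; not a real entry.
example = ImageRecord(
    filename="person-a/001.jpg",
    person="Person A",
    description="Portrait from an agency annual report",
    source="Example Agency Annual Report (hypothetical)",
    license="unknown",
)

print(to_json_lines([example]))
```

Keeping the source and licence next to every image means the corpus can later be filtered to only the photos that are safe to redistribute.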
Government Reports as Sources of Photos
While public figures are easy to find online, image searches for senior public officials who avoid the limelight return few results, if any. Information and photos of many senior public officials, including technocrats and special officers, are harder to find, especially after a regime change or when word of a possible corruption scandal starts to leak out.
Colourful government annual reports are a good source not only of information for investigations, but also of photos for our facial recognition needs. We only need to extract about five or more photos of our PEP of interest.
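To keep track of who already has enough photos, a small check like the one below could help. It is only a sketch: the `people_ready` helper and the `MIN_PHOTOS` constant are hypothetical names, and the threshold of five is taken from the rough figure above, not from any documented requirement of a facial recognition tool.

```python
from collections import Counter

# Assumption: "about five" photos per person, as suggested in the text.
MIN_PHOTOS = 5


def people_ready(labels, minimum=MIN_PHOTOS):
    """Return, in sorted order, the people with at least `minimum` photos.

    `labels` holds one person name per extracted photo.
    """
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Placeholder names: five photos of one person, two of another.
labels = ["Person A"] * 5 + ["Person B"] * 2
print(people_ready(labels))
```

Anyone missing from the returned list still needs more photos, for example from another year's annual report.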
In the following example, there are enough photos from one government annual report to extract a small corpus of photos of a senior public official.
If you have copies of Malaysian government agency annual reports, especially those from 2012–2015 that are no longer available, please share them with Sinar Project’s Malaysian Government Document Archive project, which aims to keep a searchable public digital archive of as many government reports of interest as possible.