Abstract
We live in the era of big data, with dataset sizes growing steadily over the past decades. In addition, obtaining expert labels for all the instances is time-consuming and in many cases may not even be possible. This necessitates the development of advanced semi-supervised models that can learn from both labeled and unlabeled data points and that scale at worst linearly with the number of examples. In the context of kernel-based semi-supervised models, constructing the training kernel matrix for a large training dataset is expensive and memory-inefficient. This paper investigates the scalability of the recently proposed multi-class semi-supervised kernel spectral clustering model (MSSKSC) by means of random Fourier features. The proposed model maps the input data into an explicit low-dimensional feature space. Thanks to the explicit feature maps, one can then solve the MSSKSC optimization formulation in the primal, making the complexity of the method linear in the number of training data points. The performance of the proposed model is compared with that of recently introduced reduced kernel techniques and Nyström-based MSSKSC approaches. Experimental results demonstrate the scalability, efficiency and faster training times of the proposed model over conventional large-scale semi-supervised models on large-scale real-life datasets.
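The explicit feature map mentioned in the abstract can be illustrated with a minimal sketch of random Fourier features for a Gaussian (RBF) kernel. This is not the authors' MSSKSC implementation: the function name `random_fourier_features`, the bandwidth `sigma`, the data sizes and the number of features are all assumptions chosen for illustration. The sketch only shows that inner products of the explicit features approximate the kernel, so the full kernel matrix never has to be formed in a primal solver.

```python
import numpy as np

def random_fourier_features(X, n_features, sigma, rng):
    """Map X (n x d) to an explicit n x n_features space whose inner
    products approximate the RBF kernel exp(-||x - y||^2 / (2 sigma^2)).
    Hypothetical helper for illustration only."""
    d = X.shape[1]
    # Frequencies drawn from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    # Random phase offsets, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy data, assumed for the example
sigma = 2.0                     # assumed kernel bandwidth
Z = random_fourier_features(X, n_features=2000, sigma=sigma, rng=rng)

# Exact kernel matrix, computed here only to check the approximation.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
K_approx = Z @ Z.T
max_abs_error = np.max(np.abs(K_exact - K_approx))
```

In a primal formulation only `Z` (of size n by `n_features`) is stored, which grows linearly with the number of training points n, whereas the kernel matrix grows quadratically.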
Original language | English |
---|---|
Title of host publication | 2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016 |
Publisher | IEEE |
ISBN (Electronic) | 9781509042401 |
DOIs | |
Publication status | Published - 9 Feb 2017 |
Event | 2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016 - Athens, Greece. Duration: 6 Dec 2016 → 9 Dec 2016 |
Publication series
Name | 2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016 |
---|
Conference
Conference | 2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016 |
---|---|
Country/Territory | Greece |
City | Athens |
Period | 6/12/16 → 9/12/16 |
Bibliographical note
Funding Information: The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC AdG A-DATADRIVE-B (290923). This paper reflects only the authors' views; the Union is not liable for any use that may be made of the contained information. Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc grants; Flemish Government: FWO: PhD/Postdoc grants, projects: G.0377.12 (Structured systems), G.088114N (Tensor based data similarity); IWT: PhD/Postdoc grants, projects: SBO POM (100031); iMinds Medical Information Technologies SBO 2014; Belgian Federal Science Policy Office: IUAP P7/19 (DYSCO, Dynamical systems, control and optimization, 2012-2017). Siamak Mehrkanoon is a postdoctoral researcher at KU Leuven, Belgium. Johan Suykens is a full professor at KU Leuven, Belgium.
Publisher Copyright:
© 2016 IEEE.