MULTIMEDIA TOOLS AND APPLICATIONS, vol.79, pp.26587-26604, 2020 (SCI-Expanded)
Expression recognition (ER), which is frequently used in human-computer interaction, relies on visual data such as video and static images, or on sensor-based data. Facial expression recognition (FER) is the form of ER that works on visual data. Because a video is a sequence of images, emotion can be easier to recognize in video than in a static image, which offers only a single frame. FER on static images is therefore a relatively difficult task. Deep learning methods have recently improved results on classification problems and are accordingly also used for FER in the literature. Data preparation and hyperparameter optimization can both increase the success of deep learning methods. Data preparation makes the features more pronounced, and increasing the number of training samples generally has a direct effect on the success rate. Tuning the hyperparameters of a deep learning model is another factor that improves its performance. In this study, a classification method is proposed that combines data preparation, hyperparameter optimization, and a transfer learning aided convolutional neural network. As part of the study, a new dataset of static images, named ERUFER, was created. The newly introduced ERUFER dataset and the popular public JAFFE dataset were both classified by the proposed method. To the best of our knowledge, the proposed method achieves the best result in the literature for the JAFFE dataset under 10-fold cross-validation. For the ERUFER dataset, it achieves a success rate of 92.56%.
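The 10-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names (k_fold_indices, cross_validate, majority_baseline) and the toy labels are hypothetical, and a majority-class baseline stands in for the transfer learning aided convolutional neural network, whose architecture and hyperparameters are not given here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled, disjoint, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, train_and_score, k=10, seed=0):
    """Return one accuracy per fold; each fold is the test set exactly once."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on the other k-1 folds.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return scores


def majority_baseline(labels):
    """Hypothetical stand-in for the CNN: predict the most common training label."""
    def train_and_score(train_idx, test_idx):
        majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
        hits = sum(labels[j] == majority for j in test_idx)
        return hits / len(test_idx)
    return train_and_score


# Toy labels: 7 expression classes with 30 samples each.
# These values are invented for illustration; they are not ERUFER or JAFFE data.
toy_labels = [c for c in range(7) for _ in range(30)]

fold_scores = cross_validate(toy_labels, majority_baseline(toy_labels), k=10)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean accuracy over {len(fold_scores)} folds: {mean_accuracy:.4f}")
```

In a real experiment, train_and_score would fit the network on the training indices and evaluate it on the held-out fold. The reported figure is then the mean of the per-fold accuracies, so every image contributes to the test score exactly once.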