
Front Psychol. 2021; 12: 759485.

Facial Expression Emotion Recognition Model Integrating Philosophy and Machine Learning Theory

Received 2021 Aug 16; Accepted 2021 Sep 6.

Abstract

Facial expression emotion recognition is an intuitive reflection of a person's mental state, which contains rich emotional information, and is one of the most important forms of interpersonal communication. It can be used in diverse fields, including psychology. As a celebrity in ancient China, Zeng Guofan's wisdom involves facial emotion recognition techniques. His book Bing Jian summarizes eight methods on how to identify people, especially how to choose the right one, which means "look at the eyes and nose for evil and righteousness, the lips for truth and falsehood; the temperament for success and fame, the spirit for wealth and fortune; the fingers and claws for ideas, the hamstrings for setback; if you want to know his consecution, you can focus on what he has said." It is said that a person's personality, mind, goodness, and badness can be shown by his face. However, due to the complexity and variability of human facial expression emotion features, traditional facial expression emotion recognition technology has the disadvantages of insufficient feature extraction and susceptibility to external environmental influences. Therefore, this article proposes a novel feature fusion dual-channel expression recognition algorithm based on machine learning theory and philosophical thinking. Specifically, features extracted using a convolutional neural network (CNN) alone ignore subtle changes in facial expressions, so the first path of the proposed algorithm takes the Gabor features of the ROI area as input. In order to make full use of the detailed features of the active facial expression emotion area, the active area is first segmented from the original face image, and the Gabor transform is used to extract the emotion features of that area, focusing on the detailed description of the local region. The second path proposes an efficient channel attention network based on depthwise separable convolution to improve the linear bottleneck structure, reduce network complexity, and prevent overfitting by designing an efficient attention module that combines the depth of the feature map with spatial information. It focuses more on extracting important features, improves emotion recognition accuracy, and outperforms the competition on the FER2013 dataset.

Keywords: facial expression, emotion recognition, philosophy, machine learning, neural networks

Introduction

As an ancient Chinese celebrity, Zeng Guofan's wisdom involves the skill of facial emotion recognition. His book Bing Jian summarizes eight methods on how to identify people, especially how to choose the right one, which means "look at the eyes and nose for evil and righteousness, the lips for truth and falsehood; the temperament for success and fame, the spirit for wealth and fortune; the fingers and claws for ideas, the hamstrings for setback; if you want to know his consecution, you can focus on what he has said." It is said that a person's personality, mind, goodness, and badness can be shown by his face. That is to say, complete bones are not as neat as skin, and clean skin is not as dignified as facial features. Eyes can reflect a person's good and evil: in contrast to people who behave indecently, the one with a pure and unbiased mind has vivid eyes. Therefore, eyes are an important part of facial emotion recognition. Because facial representation (Dubuisson et al., 2002; Sariyanidi et al., 2014; Sun et al., 2018) is an intuitive reflection of human mental state, it contains rich emotional information (D'Aniello et al., 2018; Maydych et al., 2018; Suslow et al., 2020) and can intuitively reflect a person's true thoughts. In daily human communication, we not only convey information through language and text, but we also use movements and facial expressions to complete the communication between people, and research shows that expressions and movements are often more effective than words at delivering key messages. Facial expression emotion is a common form of non-verbal communication that can effectively communicate personal emotions and intentions. We can observe other people's facial expressions with our eyes, and the brain will analyze the information to determine their mental state, completing the expression and communication of emotions between people. Facial expressions can give language emotion, and in the course of people's communication facial expressions can clearly show a person's true emotions, which is more authentic than language.

In social situations, humans naturally express their personal emotions. An accurate understanding of each other's emotions helps build mutual understanding and trust. The expression and understanding of emotions is an essential skill for humans. We mainly convey personal emotions in three ways, namely language, voice, and facial expressions. Scholars have found that facial expressions are the most important way of expressing human emotional information (Cai and Wei, 2020; Zhang et al., 2021). Facial expression information accounts for about 55% of the information transmitted by the experimenters, voice information for 38%, and language information for only 7% of the total information. It is clear that, compared to language and sound, facial expression information is more important for emotional comprehension. Naturally, researchers concentrate on facial expressions in order to gain a better understanding of human inner emotional activities.

In recent years, as computers have gained increasingly powerful computing capability and huge data sets continue to emerge, machine learning algorithms (Domínguez-Jiménez et al., 2020; Zhang et al., 2020; Cai et al., 2021a) have developed vigorously. Compared with traditional methods, a machine learning algorithm integrates the two processes of feature extraction (Koduru et al., 2020) and classification (Oberländer and Klinger, 2018), simplifies the processing pipeline, can automatically extract the internal features of the sample data, has powerful feature extraction capabilities, and performs very well in various computer vision (CV) competitions (Schmøkel and Bossetta, 2021; Cai et al., 2021b). Among them, the Convolutional Neural Network (CNN) (Santamaria-Granados et al., 2018; Ghosal et al., 2019; Gao et al., 2021) is one of the most common machine learning algorithms; its classification results on images are excellent, and the recognition accuracy record on the ImageNet database has been refreshed again and again. Therefore, many researchers have begun to apply neural networks (Chu et al., 2021; Tong et al., 2021) to solve the recognition problem of facial expressions. However, facial expression images collected in real life are highly unconstrained, and this uncontrollability increases the difficulty of facial expression recognition. How to design the structure of a CNN to efficiently and accurately recognize facial expressions still needs continuous exploration.

Based on the above observations, how a CNN can efficiently and accurately recognize facial expressions is the focus of this article. Therefore, I propose a dual-channel emotion recognition algorithm. The first path of the proposed algorithm uses the Gabor features of the ROI area as input. In order to make full use of the detailed features of the active facial expression area, the active area is first segmented from the original face image, the Gabor transform is used to extract the features of this area, and more attention is paid to the detailed description of the local region. The second path proposes an efficient channel attention network based on depthwise separable convolution to improve the linear bottleneck structure, reduce network complexity, prevent overfitting, and improve the accuracy of emotion recognition.

The main contributions of this paper are as follows:

  • (1)

    This paper proposes a novel feature fusion dual-channel expression recognition algorithm based on machine learning theory and philosophical thinking. It achieves competitive performance on the FER2013 data set and has positive significance for promoting the recognition and employment of people.

  • (2)

    Aiming at the problem that features extracted using CNNs ignore the subtle changes in the active areas of facial expressions, the first path of the proposed algorithm takes the Gabor features of the ROI area as input. In order to make full use of the detailed features of the active areas of facial expressions, the original face image is first segmented into the active expression areas, the Gabor transform is used to extract the features of these areas, and the description focuses on the details of the local areas.

  • (3)

    An efficient attention module is designed to combine the depth of the feature map with spatial information and focus on the extraction of important features. A joint loss function is employed to give the network better feature discrimination, reduce the intra-class differences among the same facial expressions, expand the feature spacing between different facial expressions, and ensure classification accuracy.

The rest of this paper is arranged as follows. In section "Related Work," we introduce relevant work; in section "Methodology," we describe the algorithm of this paper; and in section "Experiments and Results," we give the experiments and experimental results. Section "Conclusion" presents the research conclusions of this paper.

Related Work

Emotion Recognition Based on Facial Expressions

The process of human communication is inextricably linked to the fluctuation of various emotions. When people experience basic emotions, their faces display a variety of expression patterns, each with its own set of characteristics and distribution scale. Facial expression recognition is a crucial part of human-computer interaction that allows computers to understand facial expressions based on human thinking. The facial expression recognition process can be divided into three important modules: face detection, feature extraction, and classification. Face detection, as the key technology of face recognition (Adjabi et al., 2020; Zhang et al., 2021), has basically matured with its rapid development, so effectively extracting excellent features from the original face image and correctly classifying those features become the key factors affecting the recognition result. For example, Gao and Ma (2020) obtained facial expression attributes from facial images so as to predict emotional states according to facial expression changes.

Speech-Based Emotion Recognition

Language is another way for human beings to express emotions. The speech signals produced by human beings in different emotional states have different characteristics and rules, such as speed, pitch, and duration. Speech-based emotion recognition identifies and estimates the emotional information of the speaker by studying and analyzing the physical characteristics of the speaker's speech in different emotional states. Ton-That and Cao (2019) applied speech signals to emotion recognition and achieved good results on a voice emotion database. However, on the one hand, individual differences lead to great differences in speech signals, which requires the establishment of a large phonetic database and brings some difficulties to recognition. On the other hand, a noisy environment affects the sound quality of speech and thus the emotion recognition, so the acquisition of speech signals places high requirements on the surrounding environment.

Emotion Recognition Based on Physiological Signals

The basis of emotion recognition based on physiological signals is that humans produce different responses under different stimuli. For example, physiological signals such as brain electrical activity (EEG), electrocardiogram, pulse, and skin electrical response can all reflect emotions. Momennezhad (2018) used EEG signals for emotion recognition, extracting features from the time domain and frequency domain of EEG signals. Although the changes of physiological signals are not controlled by humans, they can reflect human emotional states quite faithfully.

Emotion Recognition Based on Gestures

People involuntarily undergo posture changes in different environmental states and moods, and gesture-based emotion recognition estimates human emotions from physical information such as the timing and frequency of these posture changes. Ajili et al. (2019) used human movement analysis to identify motion and then evaluated the emotions expressed by human movement posture. However, the use of human gestures alone for emotion recognition has certain limitations, because many gestures do not have emotional significance, or the same gestures have different emotional meanings in different background environments, so human gestures are usually combined with other modalities (such as expressions, speech, etc.) for emotion recognition.

Among the several ways to express human emotional information, such as facial expressions, voice, physiological signals, and gestures, expressions are the most intuitive way to convey emotions, and expression information is relatively easy to obtain in most environments. Therefore, this work studies human emotional states using facial expressions as the object.

Methodology

The Philosophical Definition of Emotion

Psychology defines emotion as "a special form of reflection of human beings on objective reality; it is the experience of human attitudes toward whether objective things meet human needs." It can be understood from this definition that emotion is a subjective experience, subjective attitude, or subjective reflection, which belongs to the category of subjective consciousness, not the category of objective existence. Dialectical materialism holds that any subjective consciousness is a reflection of an objective existence. Emotion is a special subjective consciousness and must correspond to a special objective existence. The key to the problem lies in whether such a special objective existence can be found. It is not difficult to find that "whether objective things meet people's needs" is actually a typical value judgment problem: "meeting people's needs" is the value characteristic of things, an objective existence, while "attitude" and "experience" are both ways in which people recognize or reflect the value characteristics of things. In this way, the psychological definition of emotion can be expressed as "emotion is the subjective reflection of people on the value characteristics of things." The objective existence corresponding to emotion should be the value characteristic of things. From this I obtain the philosophical definition of emotion: emotion is the subjective reflection of human beings on the value relationship of objective things.

Two-Channel Emotion Recognition Model

This section elaborates on the proposed dual-channel emotion recognition model in terms of the ROI area segmentation and Gabor features of the first path, and the efficient channel attention network of the second path.

Division of ROI Area

Let P_el and P_er denote the positions of the left and right eyes, P_n denote the central point of the tip of the nose, P_ml and P_mr denote the central points of the left and right corners of the mouth, X_1 and X_2 denote the horizontal coordinates of the left and right borders of the face, and Y_1 and Y_2 denote the vertical coordinates of the upper and lower edges, respectively. Taking the calculation of the position of the eyebrow area as an example, only the height and width of the area need to be calculated. The calculation equation for the height of the eyebrow area is as follows:

H_{eye} = \begin{cases} \dfrac{|P_{el:y} - P_{n:y}|}{2} + |Y_2 - P_{er:y}|, & P_{el:y} \ge P_{er:y} \\ |Y_2 - P_{el:y}| + \dfrac{|P_{er:y} - P_{n:y}|}{2}, & P_{el:y} < P_{er:y} \end{cases}

(1)

I use W_eye to represent the width of the eyebrow region. In order to make the extracted eye region include not only the eye itself but also part of the information around the corners of the eye, the calculation of W_eye directly takes the distance between the left and right borders. The calculation equation is as follows:

W_{eye} = |X_2 - X_1|

(2)

Similarly, the calculation of the position of the mouth in the face image is as follows, using H_mouth and W_mouth to represent the height and width of the region, respectively.

H_{mouth} = \begin{cases} \dfrac{|P_{n:y} - P_{ml:y}|}{2} + |P_{mr:y} - Y_1|, & P_{ml:y} \ge P_{mr:y} \\ |P_{ml:y} - Y_1| + \dfrac{|P_{n:y} - P_{mr:y}|}{2}, & P_{ml:y} < P_{mr:y} \end{cases}

(3)

Finally, the rectangular clipping areas of the eyebrows, eyes, and mouth in the face image are determined using the above calculation method. By cutting out these regions with large facial expression changes, the interference of non-key parts of the face can theoretically be reduced, and the calculation cost can also be reduced.
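To make the region calculations above concrete, the following is a minimal sketch in Python. The landmark container, the coordinate convention (y grows downward), and the function names are illustrative assumptions; the paper does not specify its landmark detector or implementation.

```python
# Minimal sketch of the ROI size calculations in Eqs. (1)-(3).
# The dataclass and the image coordinate convention are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class FaceLandmarks:
    p_el: tuple   # left eye center (x, y)
    p_er: tuple   # right eye center
    p_n: tuple    # nose tip
    p_ml: tuple   # left mouth corner
    p_mr: tuple   # right mouth corner
    x1: float     # left face border
    x2: float     # right face border
    y1: float     # lower edge
    y2: float     # upper edge

def eyebrow_region_size(lm: FaceLandmarks):
    """Height (Eq. 1) and width (Eq. 2) of the eye/eyebrow region."""
    if lm.p_el[1] >= lm.p_er[1]:
        h_eye = abs(lm.p_el[1] - lm.p_n[1]) / 2 + abs(lm.y2 - lm.p_er[1])
    else:
        h_eye = abs(lm.y2 - lm.p_el[1]) + abs(lm.p_er[1] - lm.p_n[1]) / 2
    w_eye = abs(lm.x2 - lm.x1)   # distance between the left and right borders
    return h_eye, w_eye

def mouth_region_height(lm: FaceLandmarks):
    """Height of the mouth region (Eq. 3)."""
    if lm.p_ml[1] >= lm.p_mr[1]:
        return abs(lm.p_n[1] - lm.p_ml[1]) / 2 + abs(lm.p_mr[1] - lm.y1)
    return abs(lm.p_ml[1] - lm.y1) + abs(lm.p_n[1] - lm.p_mr[1]) / 2
```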

Gabor Filter

Compared with other wavelets, the Gabor transform has a unique biological background. The Gabor filter is similar to the human visual system in terms of frequency and direction representation, and can extract local information of different frequencies, spatial positions, and directions of the image. A special advantage of Gabor filters is their invariance to scale, rotation, and translation. The reason the Gabor wavelet can be used for facial expression recognition is that when expression changes occur, the key parts of the face such as the eyes, mouth, and eyebrows undergo great changes due to muscle movement. These parts are reflected in the image as severe gray-scale changes, and the real and imaginary parts of the wavelet fluctuate accordingly, so the amplitude response of the Gabor filter in these parts is very obvious, which makes it very suitable for extracting local features of expressions. In the field of image processing, two-dimensional Gabor filtering is generally used to process images. The kernel function of the two-dimensional Gabor wavelet can be written as:

\psi_{uv}(z) = \frac{\|k_{uv}\|^2}{\sigma^2} \, e^{-\frac{\|k_{uv}\|^2 \|z\|^2}{2\sigma^2}} \left( e^{i k_{uv} z} - e^{-\frac{\sigma^2}{2}} \right)

(5)

where u and v represent the direction and frequency of the Gabor wavelet kernel, z = (x, y) represents the position of a certain pixel in the image, σ represents the filter bandwidth, and the factor ‖k_uv‖²/σ² is used to compensate for the attenuation of the energy spectrum determined by the frequency.

The Gabor feature of the facial expression image can be obtained by convolving the facial expression image with the Gabor wavelet kernel. Assuming that the gray value of the facial expression image at the point (x, y) is I(x, y), the calculation equation for the Gabor feature is as follows:

G_{uv}(x, y) = I(x, y) * \psi_{uv}(x, y)

(6)

where G_uv(x, y) represents the extracted Gabor feature of the image, ψ_uv(x, y) represents the kernel function of the two-dimensional Gabor wavelet, and * represents the convolution operation.
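As one way to realize Eq. (6) in practice, the sketch below builds a small Gabor filter bank and convolves it with a cropped ROI. It uses OpenCV's real-valued getGaborKernel, whose parameterization only approximates the complex wavelet of Eq. (5), and the bank size (5 scales x 8 orientations) and parameter values are assumptions made for illustration.

```python
# Sketch: extract Gabor features from a grayscale ROI crop with a small filter bank.
# The 5 scales x 8 orientations bank and all parameter values are assumed choices.
import cv2
import numpy as np

def gabor_features(roi_gray: np.ndarray, scales: int = 5, orientations: int = 8) -> np.ndarray:
    """Convolve the ROI with a Gabor filter bank (Eq. 6) and stack the responses."""
    responses = []
    for s in range(scales):
        wavelength = 4.0 * (2 ** (s / 2.0))            # assumed wavelength schedule
        for o in range(orientations):
            theta = o * np.pi / orientations            # filter orientation
            kernel = cv2.getGaborKernel(ksize=(17, 17), sigma=2.0 * (s + 1),
                                         theta=theta, lambd=wavelength,
                                         gamma=0.5, psi=0.0)
            responses.append(cv2.filter2D(roi_gray.astype(np.float32), cv2.CV_32F, kernel))
    return np.stack(responses, axis=0)                  # shape: (scales * orientations, H, W)

# Usage: feats = gabor_features(cv2.imread("eye_roi.png", cv2.IMREAD_GRAYSCALE))
```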

Feature Fusion

To make full use of the features of the key areas of facial expressions and make up for the lack of global representation in the local features extracted by Gabor, the features extracted by the CNN and the local features of the ROI region extracted by Gabor are fused. Simply put, feature fusion combines multiple different features extracted by different algorithms into a new feature with stronger characterization capability through a certain fusion method. The process of feature fusion is shown in Figure 1.

FIGURE 1

Schematic diagram of feature fusion.

There are three ways of feature fusion, namely summation, product, and splicing (concatenation). Among the three feature fusion methods, splicing is simple to compute and less computationally expensive. Therefore, this article uses splicing to fuse the Gabor features with the CNN features: the local features of the ROI region extracted by Gabor and the features extracted by the CNN are fused in the fully connected layer of the CNN. Suppose two feature vectors with the same dimension are defined as X = (x_1, x_2, ⋯, x_n) and Y = (y_1, y_2, ⋯, y_n); then the calculation equation for feature splicing is as follows:

Z = (x_1, x_2, ⋯, x_n, y_1, y_2, ⋯, y_n)

(7)
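A minimal PyTorch sketch of the splicing fusion in Eq. (7) follows, assuming both branches have already been flattened to vectors; the feature dimensions, layer names, and the seven expression classes are illustrative assumptions.

```python
# Sketch: splice (concatenate) Gabor-branch and CNN-branch features before classification.
# The 256-dimensional features and 7 expression classes are assumptions.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, gabor_dim: int = 256, cnn_dim: int = 256, num_classes: int = 7):
        super().__init__()
        self.fc = nn.Linear(gabor_dim + cnn_dim, num_classes)

    def forward(self, gabor_feat: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        z = torch.cat([gabor_feat, cnn_feat], dim=1)   # Eq. (7): Z = (x_1..x_n, y_1..y_n)
        return self.fc(z)                              # logits for the Softmax classifier

# Usage: logits = FusionHead()(torch.randn(8, 256), torch.randn(8, 256))
```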

Channel Attention Model

To extract more core expression features from the facial expression feature map, I introduce the ultra-lightweight attention module ECA-Net (Wang et al., 2020) to apply attention weighting to the improved linear bottleneck structure and give greater weight to the core features, which makes the network pay more attention to the core features of expressions. This structure contains only one parameter k, yet the performance improvement it brings is very obvious. The main function of this module is to generate weights for each channel and learn the correlation between features, just as humans selectively ignore non-critical information and instead focus on information that is useful to us. The purpose of this module is to let the network model ignore some non-core features and increase the emphasis on core features, while adding only a very small number of additional parameters.

As shown in Figure 2, the distribution of features in the spatial dimension is first compressed: each two-dimensional feature matrix is squeezed to a single value, and this value captures the feature information of that spatial extent. A fully connected layer then completes channel dimension reduction, and a second fully connected layer restores the channel dimensionality, so as to obtain the dependencies between the various channels and generate a weight for each feature channel. Based on the correlation between channels, the important channel features of facial expressions receive larger weights and, conversely, the unimportant ones receive smaller weights; that is, the attention mechanism is introduced. ECA-Net instead generates channel weights through a local one-dimensional convolution in the high-dimensional space and obtains the correlation dependencies between channels. The side effect of channel dimension reduction on the direct correspondence between channels and weights is thereby avoided, and obtaining appropriate cross-channel correlation dependencies is more efficient and accurate for establishing the channel attention mechanism. Both attention modules finally multiply the generated channel weights onto the original input feature map and merge the attention-weighted features with the original features, completing the feature attention weighting in channel space.

FIGURE 2

Schematic diagram of the channel attention model.
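The following is a minimal PyTorch sketch of an ECA-style channel attention block as described above. The kernel size k = 3 and the layer layout follow the general ECA-Net design; treating this as the exact module used in the paper is an assumption.

```python
# Sketch of an ECA-style channel attention block: global average pooling, a local
# 1-D convolution across channels (no dimensionality reduction), a sigmoid gate,
# and channel-wise reweighting of the original feature map.
import torch
import torch.nn as nn

class ECABlock(nn.Module):
    def __init__(self, k_size: int = 3):   # k: the module's single kernel-size parameter
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.size()
        y = self.avg_pool(x)                     # squeeze spatial dims: (B, C, 1, 1)
        y = y.view(b, 1, c)                      # treat channels as a 1-D sequence
        y = self.conv(y)                         # local cross-channel interaction
        w = self.sigmoid(y).view(b, c, 1, 1)     # per-channel attention weights
        return x * w                             # reweight the original feature map

# Usage: out = ECABlock()(torch.randn(2, 64, 24, 24))
```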

Overall Structure

Figure 3 shows the overall framework of the algorithm. First, the feature extraction module is composed of two different CNN branches. The first CNN branch takes the Gabor features as input: in order to make full use of the features of the regions with obvious facial expression changes and rich facial information, the original face image is preprocessed, the emotion-related ROI regions are cropped out, and the Gabor wavelet transform is used to extract the ROI features. Since the extracted features still have a high dimension, feature mapping of the Gabor features is required before fusion; this branch is composed of two convolution layers, which reduce the size of the Gabor features and facilitate subsequent feature fusion. In the second path, an efficient channel attention network based on depthwise separable convolution is proposed to improve the linear bottleneck structure, reduce network complexity, and prevent overfitting. By designing an efficient attention module, the depth of the feature map is combined with spatial information, focusing more on the extraction of important features and improving the accuracy of emotion recognition. Finally, the feature classification module classifies the fused features through a Softmax layer.

FIGURE 3

Schematic diagram of the overall framework of the proposed algorithm.
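Putting the pieces together, the skeleton below shows one way the two branches and the splicing fusion could be wired up in PyTorch. It reuses the ECABlock sketch from the channel attention section; every layer width, the depth of each branch, and the backbone choice are assumptions made for illustration rather than the paper's exact architecture.

```python
# Skeleton of the dual-channel model: a small Gabor-feature branch, an
# attention-enhanced CNN branch, feature splicing, and Softmax classification.
# All layer widths are illustrative; ECABlock is the sketch defined earlier.
import torch
import torch.nn as nn

class GaborBranch(nn.Module):
    """Two conv layers that map stacked Gabor responses to a compact feature vector."""
    def __init__(self, in_channels: int = 40, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, out_dim))

    def forward(self, x):
        return self.net(x)

class AttentionCNNBranch(nn.Module):
    """Convolutional stem followed by ECA-style channel attention."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.attn = ECABlock()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, out_dim))

    def forward(self, x):
        return self.head(self.attn(self.stem(x)))

class DualChannelNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.gabor_branch = GaborBranch()
        self.cnn_branch = AttentionCNNBranch()
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, gabor_maps, face_img):
        z = torch.cat([self.gabor_branch(gabor_maps), self.cnn_branch(face_img)], dim=1)
        return self.classifier(z)   # logits; Softmax / CrossEntropyLoss applied outside
```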

Experiments and Results

Experimental Setup

The research in this paper is carried out on a PC platform, and the experiments are run on the Ubuntu 18.04 operating system. The experiments are based on the PyTorch deep learning framework, and the programming language is Python 3.6.5. The hardware platform is an Intel Core i7-9700K CPU, 16 GB memory, and a GTX 1080 Ti GPU with 11 GB of video memory. In order to ensure fairness between the improved network and the comparison networks, the training parameters used in the experiments are exactly the same. All models are trained with a learning rate decay strategy: the initial learning rate is 0.01, the decay coefficient is 0.0001, and the batch size is 128. One full pass over all images in the training data set is called an epoch, and 150 epochs are set in the experiment. In order to optimize the network faster, I use the Adam optimization algorithm.
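A minimal sketch of the stated training setup follows. The paper specifies Adam, an initial learning rate of 0.01, a decay coefficient of 0.0001, batch size 128, and 150 epochs; interpreting the decay coefficient as the optimizer's weight decay and choosing a step schedule are assumptions, and DualChannelNet refers to the skeleton sketched above.

```python
# Sketch of the stated training setup: Adam, lr 0.01, decay 1e-4, 150 epochs.
# Reading the "decay coefficient" as weight_decay and using StepLR are assumptions.
import torch
from torch import nn, optim

model = DualChannelNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # assumed schedule

def train(loader, epochs: int = 150):
    """loader yields (gabor_maps, faces, labels) batches of size 128."""
    model.train()
    for _ in range(epochs):
        for gabor_maps, faces, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(gabor_maps, faces), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```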

Experimental Data Set

The data set used in this article is FER-2013. Because the amount of data in the original facial expression data set is small, it is far from enough for data-driven deep learning, so data augmentation is a very important operation. In the network training phase, in order to prevent the network from overfitting, I first apply a series of random transformations, including flipping, rotating, and cropping: the image is resized to 104 × 104, randomly cropped to 96 × 96, randomly rotated between 0 and 15°, and horizontally mirrored before being sent to the network for training. In the network test stage, we crop the four corners and the center of the image to obtain five 96 × 96 images and then apply horizontal mirroring to each, which is equivalent to amplifying the data by 10 times. The amplified images are input into the network for recognition, the results are averaged, and the output class with the highest score is taken as the corresponding expression. This method can expand the size of the data set, make the trained network model more generalized and robust, and further improve the recognition accuracy.
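One way to express this augmentation pipeline with torchvision is sketched below; the exact transform ordering and the use of TenCrop for the test-time ten-view averaging are assumptions consistent with the description above.

```python
# Sketch of the described augmentation: resize to 104x104, random 96x96 crop, random
# rotation up to 15 degrees, horizontal flip (training); 10-crop averaging at test time.
import torch
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize((104, 104)),
    transforms.RandomCrop(96),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

test_tf = transforms.Compose([
    transforms.Resize((104, 104)),
    transforms.TenCrop(96),   # 4 corners + center, plus their horizontal mirrors
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

def ten_crop_predict(classifier, img):
    """Average logits over the 10 views; classifier is any model taking a batch of images."""
    views = test_tf(img)                  # tensor of shape (10, C, 96, 96)
    with torch.no_grad():
        logits = classifier(views).mean(dim=0)
    return int(logits.argmax())
```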

Evaluation Method

The overall accuracy rate is used as the evaluation index of this study, and its calculation formula is as follows:

Acc = \frac{TP + TN}{TP + TN + FP + FN}

where TP represents the positive samples predicted by the model as positive, TN represents the negative samples predicted by the model as negative, FP represents the negative samples predicted by the model as positive, and FN represents the positive samples predicted by the model as negative.
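For completeness, here is a small sketch of the accuracy computation from predicted and true labels; in the multi-class case this reduces to the fraction of correctly classified samples, which is how the binary TP/TN formula above generalizes.

```python
# Sketch: overall accuracy, i.e. (TP + TN) / (TP + TN + FP + FN) in the binary case,
# or simply the fraction of correctly classified samples for multi-class labels.
import numpy as np

def overall_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(y_true == y_pred))

# Example: overall_accuracy(np.array([0, 1, 2, 2]), np.array([0, 1, 1, 2]))  # -> 0.75
```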

Experimental Results

In order to verify the reliability of the overall algorithm in this paper, comparative experiments are carried out on the FER-2013 data set against current advanced expression recognition networks to evaluate the performance of the algorithm. The experimental results are shown in Table 1.

TABLE 1

Comparison of experimental results with different methods.

Methods Acc
InceptionV4 (Szegedy et al., 2017) 0.7080
DNNRL (Kim et al., 2016) 0.7082
ICL (Liu and Zhou, 2020) 0.7215
ABP (Liu et al., 2019) 0.7316
MobileNetV3 0.7189
Ours 0.7400

Table 1 is a comparison of the recognition rates of different methods. This article uses a lightweight network structure: compared with MobileNetV3 and Inception, which are also lightweight networks, the accuracy on the FER-2013 data set is improved by 3.3 and 4.7%, respectively, with fewer model parameters. Compared with the current mainstream methods on the FER-2013 data set, namely DNNRL (Kim et al., 2016), which combines multiple CNNs with a weighted joint decision, ICL (Liu and Zhou, 2020), which uses clustering to obtain the center distance of the expression classes and continuously selects difficult samples for training, and ABP (Liu et al., 2019), which combines bilinear pooling with an attention mechanism, the method in this paper achieves superior performance with a higher recognition rate of 74.00%.

In order to avoid misjudging model performance when only the overall recognition rate is used as the evaluation index, we examine the recognition results for each type of expression through the confusion matrix. The confusion matrix is also called the error matrix. Each row represents the predicted expression label, and each column represents the actual expression label. The confusion matrix clearly shows how each type of data is recognized, and from the recognition accuracy of each type of expression we can analyze the performance of the network model in more detail.
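A small sketch of building such a per-class recognition table with scikit-learn follows; note that scikit-learn's convention puts the actual label on the rows, so the result may need to be transposed to match the row/column layout described above.

```python
# Sketch: per-class recognition rates via a row-normalized confusion matrix.
# sklearn places true labels on rows; transpose if predictions should index the rows.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_rates(y_true, y_pred, num_classes: int = 7) -> np.ndarray:
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)), normalize="true")
    return cm   # cm[i, j]: fraction of class-i samples predicted as class j

# The diagonal of per_class_rates(...) gives each expression's recognition accuracy.
```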

Table 2 is the confusion matrix of the recognition results of the high-efficiency channel attention model proposed in this paper on the FER-2013 test set. The bold data on the diagonal of the table represent the recognition accuracy of each correctly classified type of expression, the remaining data are the proportions of misclassifications, and the last row is the average recognition accuracy over all correctly classified expressions. For instance, the neutral expression recognition accuracy in the lower right corner of the diagonal of the confusion matrix is 0.77, which indicates that 77% of the neutral expression samples in the data set are correctly predicted. It can be seen that the recognition accuracy of happy and surprised expressions is high, with accuracy rates of 0.91 and 0.83, respectively, while the recognition accuracy of disgusted and angry expressions is low, with accuracy rates of only 0.64 and 0.66. By observing the samples, it is found that there are indeed many similarities between the facial morphology of fear and surprise and of anger and disgust, and the numbers of these samples are far smaller than the numbers of happy and surprised samples, which leads to insufficient learning of their features by the model and thus a lower recognition rate. Finally, the average recognition rate of the model on the FER-2013 test set reached 0.74.

TABLE 2

Confusion matrix of recognition rate on FER-2013 dataset.

Anger Fear Disgust Happy Sad Surprised Normal
Anger 0.66 0.01 0.11 0.04 0.11 0.01 0.06
Fear 0.20 0.69 0.04 0.05 0.00 0.00 0.02
Disgust 0.09 0.00 0.64 0.02 0.12 0.05 0.09
Happy 0.01 0.00 0.02 0.91 0.02 0.01 0.03
Sad 0.05 0.00 0.11 0.05 0.68 0.00 0.13
Surprised 0.02 0.00 0.06 0.05 0.02 0.83 0.02
Normal 0.03 0.00 0.03 0.05 0.12 0.01 0.77
Acc 0.74

Ablation Experiment for Feature Fusion

To verify the influence of the feature fusion strategy on the performance of the proposed algorithm, an ablation experiment is set up in this section, where Add represents the feature addition strategy, Mul represents the feature multiplication strategy, and C represents the feature concatenation strategy. The results of the ablation experiment are shown in Table 3.

TABLE 3

Results of ablation experiments with feature fusion.

Methods Acc
Add 0.7285
Mul 0.7195
C 0.7400

It can be seen from Table 3 that the feature concatenation strategy achieves the best results. In addition, the feature addition strategy is better than the feature multiplication strategy. This confirms that adopting the concatenation strategy in the proposed algorithm is effective.
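For reference, the three fusion variants compared above can be sketched as follows, assuming both feature vectors share the same dimension, as the element-wise variants require.

```python
# Sketch of the three fusion strategies from the ablation: element-wise addition,
# element-wise multiplication, and concatenation (the strategy used in this paper).
import torch

def fuse(x: torch.Tensor, y: torch.Tensor, strategy: str = "concat") -> torch.Tensor:
    if strategy == "add":
        return x + y                    # same output dimension as the inputs
    if strategy == "mul":
        return x * y                    # element-wise (Hadamard) product
    return torch.cat([x, y], dim=1)     # doubled dimension, as in Eq. (7)
```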

Ablation Experiment for Attention Model

To verify the influence of the channel attention mechanism on the performance of the proposed algorithm, an ablation experiment is set up in this section. CA stands for the channel attention mechanism and SA stands for the spatial attention mechanism. The results of the ablation experiment are shown in Table 4.

TABLE 4

Results of ablation experiments with attention.

Methods Acc
SA 0.7395
CA 0.7400

It can be seen from Table 4 that the channel attention mechanism achieves better performance, which proves the superiority of CA in facial emotion recognition.

Conclusion

In this paper, I propose a novel feature fusion dual-channel expression recognition algorithm based on machine learning theory and emotional philosophy. Because features extracted using CNNs ignore subtle changes in the active regions of facial expressions, the proposed algorithm's first path takes the Gabor features of the ROI region as input. The active facial expression region is first segmented from the original face image, and the features of this region are extracted using the Gabor transform, focusing more on the detailed description of the local region, in order to make full use of the detailed features of the active facial expression region. To improve the linear bottleneck structure, reduce network complexity, and avoid overfitting, a channel attention network based on depthwise separable convolution is proposed in the second path. The depth of the feature map is combined with spatial information by designing an efficient attention module, focusing more on the extraction of important features and improving the accuracy of emotion recognition. On the FER2013 data set, competitive performance was achieved. Furthermore, this research can serve as a guide for promoting people selection, and it also confirms that Zeng Guofan's philosophy of employing people is constructive.

In future work, we will investigate the feasibility of real-time face recognition and will use Internet of Things technology to collect faces in real time for emotion recognition.

Author Contributions

ZS was responsible for designing the framework of the entire manuscript, from topic selection to solution to experimental verification.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • Adjabi I., Ouahabi A., Benzaoui A., Taleb-Ahmed A. (2020). Past, present, and future of face recognition: a review. Electronics 9:1188. 10.3390/electronics9081188 [CrossRef] [Google Scholar]
  • Ajili I., Mallem M., Didier J. Y. (2019). Human motions and emotions recognition inspired by LMA qualities. Vis. Comput. 35 1411–1426. 10.1007/s00371-018-01619-w [CrossRef] [Google Scholar]
  • Cai W., Song Y., Wei Z. (2021a). Multimodal data guided spatial feature fusion and grouping strategy for E-commerce commodity demand forecasting. Mob. Inf. Syst. 2021:5568208. 10.1155/2021/5568208 [CrossRef] [Google Scholar]
  • Cai W., Wei Z. (2020). PiiGAN: generative adversarial networks for pluralistic image inpainting. IEEE Access 8 48451–48463. 10.1109/ACCESS.2020.2979348 [CrossRef] [Google Scholar]
  • Cai W., Wei Z., Song Y., Li M., Yang X. (2021b). Residual-capsule networks with threshold convolution for segmentation of wheat plantation rows in UAV images. Multimed. Tools Appl. 1–17. 10.1007/s11042-021-11203-5 [CrossRef] [Google Scholar]
  • Chu Z., Hu K., Chen X. (2021). Robotic grasp detection using a novel two-stage approach. ASP Trans. Internet Things 1 19–29. 10.52810/TIOT.2021.100031 [CrossRef] [Google Scholar]
  • D'Aniello B., Semin G. R., Alterisio A., Aria M., Scandurra A. (2018). Interspecies transmission of emotional information via chemosignals: from humans to dogs (Canis lupus familiaris). Anim. Cogn. 21 67–78. 10.1007/s10071-017-1139-x [PubMed] [CrossRef] [Google Scholar]
  • Domínguez-Jiménez J. A., Campo-Landines K. C., Martínez-Santos J. C., Delahoz E. J., Contreras-Ortiz S. H. (2020). A machine learning model for emotion recognition from physiological signals. Biomed. Signal Process. Control 55:101646. 10.1016/j.bspc.2019.101646 [CrossRef] [Google Scholar]
  • Dubuisson S., Davoine F., Masson M. (2002). A solution for facial expression representation and recognition. Signal Process. Image Commun. 17 657–673. 10.1016/S0923-5965(02)00076-0 [CrossRef] [Google Scholar]
  • Gao H., Ma B. (2020). A robust improved network for facial expression recognition. Front. Signal Process. 4:4. 10.22606/fsp.2020.44001 [CrossRef] [Google Scholar]
  • Gao M., Cai W., Liu R. (2021). AGTH-net: attention-based graph convolution-guided third-order hourglass network for sports video classification. J. Healthc. Eng. 2021:8517161. 10.1155/2021/8517161 [PMC free article] [PubMed] [CrossRef] [Google Scholar]
  • Ghosal D., Majumder N., Poria S., Chhaya N., Gelbukh A. (2019). DialogueGCN: a graph convolutional neural network for emotion recognition in conversation. arXiv [Preprint]. arXiv:1908.11540 10.18653/v1/D19-1015 [CrossRef] [Google Scholar]
  • Kim B. K., Roh J., Dong S. Y., Lee S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J. Multimodal User Interfaces 10 173–189. 10.1007/s12193-015-0209-0 [CrossRef] [Google Scholar]
  • Koduru A., Valiveti H. B., Budati A. K. (2020). Feature extraction algorithms to improve the speech emotion recognition rate. Int. J. Speech Technol. 23 45–55. 10.1007/s10772-020-09672-4 [CrossRef] [Google Scholar]
  • Liu L., Zhang L., Jia S. (2019). "Attention bilinear pooling for fine-grained facial expression recognition," in Proceedings of the International Symposium on Cyberspace Safety and Security, (Cham: Springer), 535–542. 10.1007/978-3-030-37352-8_47 [CrossRef] [Google Scholar]
  • Liu X., Zhou F. (2020). Improved curriculum learning using SSM for facial expression recognition. Vis. Comput. 36 1635–1649. 10.1007/s00371-019-01759-7 [CrossRef] [Google Scholar]
  • Maydych V., Claus M., Watzl C., Kleinsorge T. (2018). Attention to emotional information is associated with cytokine responses to psychological stress. Front. Neurosci. 12:687. [PMC free article] [PubMed] [Google Scholar]
  • Momennezhad A. (2018). EEG-based emotion recognition utilizing wavelet coefficients. Multimed. Tools Appl. 77 27089–27106. 10.1007/s11042-018-5906-8 [CrossRef] [Google Scholar]
  • Oberländer L. A. M., Klinger R. (2018). "An analysis of annotated corpora for emotion classification in text," in Proceedings of the 27th International Conference on Computational Linguistics, (Santa Fe, NM: Association for Computational Linguistics), 2104–2119. [Google Scholar]
  • Santamaria-Granados L., Munoz-Organero M., Ramirez-Gonzalez G., Abdulhay E., Arunkumar N. (2018). Using deep convolutional neural network for emotion detection on a physiological signals dataset (AMIGOS). IEEE Access 7 57–67. 10.1109/Access.2018.2883213 [CrossRef] [Google Scholar]
  • Sariyanidi E., Gunes H., Cavallaro A. (2014). Automatic analysis of facial affect: a survey of registration, representation, and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37 1113–1133. 10.1109/TPAMI.2014.2366127 [PubMed] [CrossRef] [Google Scholar]
  • Schmøkel R., Bossetta M. (2021). FBAdLibrarian and Pykognition: open science tools for the collection and emotion detection of images in Facebook political ads with computer vision. J. Inf. Technol. Polit. 1–11. 10.1080/19331681.2021.1928579 [CrossRef] [Google Scholar]
  • Sun W., Zhao H., Jin Z. (2018). A complementary facial representation extracting method based on deep learning. Neurocomputing 306 246–259. 10.1016/j.neucom.2018.04.063 [CrossRef] [Google Scholar]
  • Suslow T., Hußlack A., Kersting A., Bodenschatz C. M. (2020). Attentional biases to emotional information in clinical depression: a systematic and meta-analytic review of eye tracking findings. J. Affect. Disord. 274 632–642. 10.1016/j.jad.2020.05.140 [PubMed] [CrossRef] [Google Scholar]
  • Szegedy C., Ioffe S., Vanhoucke V., Alemi A. A. (2017). "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence, New York: Association for Computing Machinery (ACM). [Google Scholar]
  • Tong Y., Yu L., Li S., Liu J., Qin H., Li W. (2021). Polynomial fitting algorithm based on neural network. ASP Trans. Pattern Recognit. Intell. Syst. 1 32–39. 10.52810/TPRIS.2021.100019 [CrossRef] [Google Scholar]
  • Ton-That A. H., Cao N. T. (2019). Speech emotion recognition using a fuzzy approach. J. Intell. Fuzzy Syst. 36 1587–1597. 10.3233/JIFS-18594 [CrossRef] [Google Scholar]
  • Wang Q., Wu B., Zhu P., Li P., Zuo W., Hu Q. (2020). "ECA-Net: efficient channel attention for deep convolutional neural networks," in Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (Piscataway, NJ: IEEE). 10.1109/CVPR42600.2020.01155 [CrossRef] [Google Scholar]
  • Zhang J., Yin Z., Chen P., Nichele S. (2020). Emotion recognition using multi-modal data and machine learning techniques: a tutorial and review. Inf. Fusion 59 103–126. 10.1016/j.inffus.2020.01.011 [CrossRef] [Google Scholar]
  • Zhang L., Sun L., Yu L., Dong X., Chen J., Cai W., et al. (2021). "ARFace: attention-aware and regularization for face recognition with reinforcement learning," in IEEE Transactions on Biometrics, Behavior, and Identity Science, (Piscataway, NJ: IEEE). 10.1109/TBIOM.2021.3104014 [CrossRef] [Google Scholar]


