In order to provide users with better recommendations, it is particularly important to analyze the behavior of users tagging different resources. In this paper, an attention mechanism based on deep learning is designed to effectively capture the features related to the user’s long-term interests and current interests in the session simultaneously, and alleviate the impact of the user’s interest drift that is difficult to deal with by the current session recommendation algorithm on the recommendation accuracy. The main community discovery algorithms are applied to the clustering analysis of the label system, and their performances on different data sets are compared. Besides, we design a personalized recommendation algorithm for the label system. The experimental results show that the proposed algorithm can find the interests of different users and improve the quality of the recommendation system.
With the rapid development of the Internet, the gradual maturity and popularization of the mobile Internet and related ancillary technologies, the Internet has become an important platform for people to study, work and live. The rise of online shopping and other lifestyles has also driven a large number of e-commerce, news, The development of short video social networking sites. As people’s daily lives become more and more dependent on the convenience of the Internet, the data traffic on the network has exploded. Facing the information overload caused by the ever-increasing and complex commodity, news, and video data in the network, users will spend more time and energy searching and filtering information in order to find effective information that meets their own needs. To ensure and improve the user experience, and to show the information that users are interested in or need accurately and quickly to users, so as to improve user stickiness and bring business benefits, has become the primary challenge of relevant enterprises and a major research hotspot at present. Recommender system is one of the most successful applications of data mining and machine learning technology in practice. As an effective technical means, it can help users find items related to their interests in a large set of objects in a personalized way [1]. Nowadays, with the continuous growth and enrichment of information in the network, such systems have been used in various application fields, including e-commerce or streaming media websites, and receiving different forms of automatic recommendation results has become a part of human daily online experience.
Tag system is a three-dimensional system composed of users, tags and resources. Tags are the link connecting users and resources. Aggregation analysis of the whole tag system can make tags with similar semantics form tag clusters, thus improving the personalized recommendation process As shown in Figure 1, user profile and tag clustering extracted from the tag system are first compared to generate personalized recommendations. In Figure 1, R represents a resource, T represents a tag, RN and TN appear in pairs, representing a user’s annotation of a resource.
The tag clustering in the tag system is very similar to the community structure in the social network, as shown in Figure 2. The nodes in Figure 2 can be regarded as members of a typical social network or labels in a label system; Edges in Figure 2 can indicate that two members of a social network are closely related to each other, or that two tags of a tag system are used to describe the same resource Tag clustering is a cluster formed by tags with close semantic links, which is very similar to the concept of community in social networks The community discovery algorithm in social networks has been mature and successfully applied in the research of epidemiology, metabolic system and ecosystem.
Recommender systems learn patterns in data by analyzing the historical behavior of individual users or entire groups of users. Most of the current online networks can record various types of user behaviors, including users viewing products or making purchases, etc., and there may be internal connections between multiple behaviors of a single user. The recommender system uses these recorded behaviors and learned patterns to calculate recommendation results that match the preferences of individual users. Compared with search engines, recommender systems can play a better role in fields that interact closely with users and when users have no specific needs and keywords that cannot accurately express the required information. The session recommendation studied in this paper is to recommend the items or content that the user is most likely to be interested in next according to the click behavior sequence of the anonymous user in the current session. Many e-commerce platforms (especially those small retailers) and most news and media sites typically do not track and record the historical behavior of users who visit their sites for long periods of time. While cookies and browser fingerprinting can provide some degree of user identifiability, these technologies are often unreliable and raise privacy concerns.
Therefore, the problem commonly faced by today’s real-life recommender systems is that the recommendation must be based only on the short-term session data of the current user rather than the long-term history. In this case, classic and widely used recommendation algorithms such as matrix factorization, collaborative filtering, etc., may not work effectively in conversational recommendation scenarios due to the lack of user explicit feedback (user rating data).
There are three main contributions of this paper:
We propose a deep learning-based short-term memory-first session recommendation algorithm.
We embed spatial attention and channel attention into the network to enhance the feature representation.
Experiments show that our method far exceeds the baseline and can generalize to a variety of scenarios.
By sorting out and summarizing the research ideas of related work modeling and learning user interests, this section divides traditional machine learning methods in the field of conversational recommendation systems into two categories according to the level of user interest concerned:
A global model that focuses on identifying users’ long-term interests [2];
Local models with emphasis on short-term interests that change over time [3].
Based on the idea of collaborative filtering, the global model captures users’ long-term interests by using historical data of user interactions (purchase/click) in all sessions. The idea of collaborative filtering is one of the earliest modeling ideas in the field of recommender systems, and typical representatives include latent factor models and K nearest neighbors [4, 5, 6]. The K-nearest neighbor model in the session recommendation task tries to find the K sessions that are most similar to the current session from the overall session data, and then uses the session similarity as the score of each candidate item to indicate its interest in the current session. The correlation of, that is, the prediction result of the user’s next click is generated according to the items that appear in similar sessions. The calculation formula is shown in Eq. [1]: \begin{equation} \label{e1}\tag{1} \text{score}(\hat{v})=\sum _{s_{nb} \in N(c)}\text{sim} \left(c,s_{nb} \right)\cdot 1_{s_{nb} } (\hat{v}). \end{equation}
Among them, \(c\) represents the current user session, \(s_{nb}\) represents a similar session, \(1_{s_{nb} } (\hat{v})\) represents whether the candidate item appears in a similar session, 1 if it occurs, and 0 if it does not appear. Local models focus on short-term interests in the recent interaction behavior of users in modeling sessions, and Markov Chain (MC), as a direct way to simulate continuous sequence data, was widely used in the early days of session recommendation research [7]. However, since the recommendation system is often faced with a huge number of candidate products and historical data, the state transition probability calculated by the Markov chain model is difficult to work effectively, and the Markov chain model only predicts the next record based on the last record in the sequence [8]. An ideal session recommendation system should take into account both the short-term interest in the session that changes with the user’s behavior and the long-term interest in the overall information of the session when generating recommendations, because the user’s next click behavior may be affected by these factors. Therefore, some researchers try to exploit both long-term interest and short-term interest information by fusing the features of global and local models. Zhang et al. [9] proposed a Markov Chain Mixed Model FPMC named Personalized Decomposition, which combines the advantages of both Markov Chain and Matrix Factorization methods to model short-term short-term behavior by tensor representation of users’ sequential behaviors. behavior characteristics, and decompose the user’s long-term interest representation, thus achieving better recommendation effect than a single global model or a local model. Mu [10] proposed a hybrid model based on representation learning, which adopts a hierarchical structure to separately model the sequential behavior of users in the current session and long-term interests in the full history.
In recent years, the ability of deep neural networks to effectively extract and utilize abstract features in data has been demonstrated in related research results, and has attracted the attention of many researchers [11]. At the same time, with the proposal and promotion of back-propagation algorithm, cyclic neural network, which has advantages in modeling continuous dependencies, has become the preferred method for processing serialized data. Cyclic neural network is widely used in machine translation, intelligent question answering and other fields to model sequences [12]. The latest research progress in deep learning in the field of natural language processing [13] has also inspired relevant researchers in the field of conversation recommendation, and a number of conversation recommendation algorithms based on deep neural networks have emerged, some of which represent more advanced technologies and cutting-edge development directions in the field of conversation recommendation research. The mainstream way for foreign researchers to model conversational contextual information. Based on the effectiveness of the GRU4Rec algorithm, many researchers continue their approach to modeling sequence context information and propose a series of variants [14, 15]. Zhou et al. [16] proposed a data augmentation technique and training set partitioning method to speed up the training process and alleviate model overfitting to improve the overall performance of deep neural network based conversational recommendation algorithms. Based on the idea of GRU4Rec, researcher in [17, 18] proposed a multi-layer recurrent neural network model [19], which separately considered the user’s dependencies and interest changes between different sessions to provide more reliable recommendation results [20, 21].
We propose the first baseline algorithm, called Short-Term Memory Only Model (STMO), whose main purposes include: First, by comparing this model with related work in experiments, In the conversational recommendation task, different from the general use of sequence overall information, only considering the user’s recent behavior information for recommendation performance, which reflects the rationality of one of the modeling ideas in this chapter; second, the STMO model is proposed in this chapter. The short-term attention and memory-first conversational recommendation model, the simplest and most intuitive version of the short-term memory-first idea, can be used to compare and evaluate the performance improvement of the final model proposed in this paper; A baseline model STMP is compared to demonstrate the rationality of using both long-term memory and short-term memory proposed in the second model and the effectiveness of the proposed short-term memory-first information fusion approach. The model architecture of STMO is shown in Figure 3.
It can be seen from the model architecture diagram of STMO that the model takes the last click \(x_{t-1}\) of the given session subsequence \(x_{t}\) as the model input, and directly calculates with the candidate item after simple feature abstraction and extraction to predict the probability that the candidate item is the user’s real next click item. It is mainly composed of a multi-layer perceptron and a softmax classifier. First, with the distributed vector expression \(x_{t-1}\) corresponding to the item \(x_{t}\) clicked by the user for the last time in a given session as input, a simple muti layer perceptron (MLP) without hidden layer is used to characterize the short-term memory represented by the user’s last click \(x_{t}\), and the output vector \(h_{t}\) is obtained, as shown in \ref{e2}: \begin{equation} \label{e2}\tag{2} \ h_{t} =f\left(W_{t} x_{t} +b_{t} \right). \end{equation}
For each candidate Article \(x_{i}\) in the article dictionary, the inner product method and \(h_{t}\) are used to calculate the recommendation score, and the probability distribution of all candidate articles is obtained through the softmax classifier. The specific calculation process is shown in \ref{e3}: \[\label{e3}\tag{3} y_{i} =\text{softmax}\left(h_{t}^{T} x_{i} \right).\]
The structure of the STMO model is very simple. In the task of conversational recommendation in real scenarios, the model has the following obvious defects. First of all, in terms of modeling mechanism, the STMO model is actually a next click prediction algorithm based on a single click. This design affects the model’s generalization ability on longer-length sessions, because in shorter sessions In a longer session, the user’s interest is less likely to change, and the last click can represent the user’s current interest, while in a longer session, the user’s behavior is more complex, so it is necessary to consider both the overall interest and Only the current interests in recent behaviors can produce more comprehensive and accurate recommendation results that satisfy users. It alleviates the information loss problem that the STMO model does not consider the session history. The structure of the STMP model is shown in Figure 4. As can be seen from the figure, compared with the STMO model, the biggest difference in the STMP model in terms of modeling ideas is that when calculating the candidate item score, the overall information of the session and the last click feature are taken into account, in order to alleviate the STMO model and traditional The Markov chain-based model only calculates the features of the last clicked item when making recommendations, ignoring that although the clicked item in the session context may contain features unrelated to the user’s current interests, it still contains some features that can be more useful. Accurately model important information of user interests.
For example, it is known that the last click behavior in a user’s session is about a certain model of mobile phone, and the user’s preferences in terms of mobile phone brand or price can be clarified through the characteristics of items clicked before in the session. In this way, the interest range of the users under its jurisdiction produces more accurate recommendation results. Therefore, the STMP model uses two feature vectors (\(m_{s}\) and \(m_{t}\) ) as the input of the trilinear feature fusion layer, where \(m_{s}\) represents the feature representation of the current session, which is considered in this chapter to contain the user’s overall interest information, calculated by the external memory of the current session. The average income, the specific definition is shown in Eq. \ref{e4}: \begin{equation} \label{e4}\tag{4} m_{s} =\frac{1}{t} \sum _{i=1}^{t}x_{i} \end{equation}
The so-called “external memory” refers to the vector sequence composed of the item vector representations in the current session subsequence \(x_{i}\). In this section, the average value of all historical click item vectors in the session is used as the session representation, so as to preserve the sequence information of the session itself. The effectiveness of this computationally simple way of representing sequence information has been proven in natural language processing and other related fields. At the same time, as can be seen from Figure 3, since \(x_{t}\) is derived from the external memory of the session, it is called the short-term memory (current interest) representing the user’s current preference in this paper.
In the Short-Term Attention Memory Priority (STAMP) model, the next click of the user in the session is predicted, where the weighted user interest is represented by a bilinear combination of long-term memory (the vector mean of the historically clicked items in the session) and short-term memory (the vector of the last clicked item). The importance of the contribution is the same, resulting in that, first, item information that was clicked many times in the session will occupy a larger proportion in the overall interest representation in the STMP model, and second, the overall interest representation in the STMP model will not It changes because the order in which the items are clicked changes. This paper believes that there are some problems in this situation. First, for the first point, although the items that appear many times in the conversation occupy a large proportion in the overall interest, which can reflect some of the user’s preference information, this method only considers the Click frequency, but ignore the relevance of each historical click and the user’s next click in the session context, which is more important to the session recommendation task, is different. Maybe only a part of the click behavior is highly related to the user’s next click, so relatively in other words, the overall user interest representation based on the sum-average method is relatively rough and inflexible.
In addition, for the second point, since the interests of users in a session may be diverse, and the same item is clicked multiple times in a session, the contribution of this click behavior to modeling user interests will also occur depending on the location of the occurrence. Changes, especially in long sessions, are more likely to occur, because intuitively, when the session lasts for a long time, the user’s interest is likely to have changed from the beginning, and the items that the user has recently clicked in the session are related to Information may be more reflective of the user’s current interests, see Figure 5.
This section uses two datasets from real websites, Yoochoose1 and Diginetica2, to conduct experiments on the proposed model. The former is a public dataset released in the RecSys Challenge 2015 competition. The data source is the e-commerce website Yoochoose.com. Months of historical click sequences (commodity browsing records), this section uses the part of the dataset that only contains session records as training and testing data. The latter Diginetica dataset is derived from another competition CIKM Cup 2016, and only transaction session data in this dataset is used in this study.
The stamp model proposed in this chapter mainly includes the following hyperparameters: vector expression dimension \(d\), learning rate \(\eta\) and learning rate attenuation \(\lambda\). Randomly divide 20% of the data in the training set as the verification set, and all the hyperparameters are mainly optimized by extensive mesh optimization on the verification set, and according to the Rcall@20 To obtain the optimal model and determine the superparameter setting. The hyperparameter range of grid optimization is: the value range of vector dimension \(d\) is \(\mathrm{\{}\)50100, 200300\(\mathrm{\}}\), the value range of learning rate \(\eta\) is \(\mathrm{\{}\)0.001,0.005,0.01,0.1,1\(\mathrm{\}}\), and the value range of learning rate attenuation \(\lambda\) is \(\mathrm{\{}\)0.75,0.8,0.85,0.9,0.95,1.0\(\mathrm{\}}\). According to the comprehensive performance, in this research, the following super parameter combinations are used in three data sets for experiments: \(\mathrm{\{}\)100,0.005,1.0\(\mathrm{\}}\) \(d,\eta ,\lambda\). During training, the sample size of each batch is 512, and the Adam optimizer is used for 50 rounds of iterative training. In addition to the hyperparameters, the parameters of the neural network weight matrix of the model obey the normal distribution \(\eta ^{2}\) (0,0.05) for initialization, and all bias vectors are initialized with zero vectors. The expression vectors of all items are randomly initialized by the normal distribution \(\eta ^{2}\) (0,0.002), and are updated with the model training iteration together with other parameters of the model.
Explicitly prioritizing the item information of the user’s last click in the session when predicting the user’s next click can strengthen the model to prioritize the user’s current interests when making recommendations. This subsection designs a set of comparative experiments based on STAMP and STMP models on three datasets to further analyze the proposed role of the model in prioritizing the last click in a session as short-term memory. The following is a brief introduction to the model information involved in the comparison: STMP (without_lastclick): that is, the STMP model that does not use the last click item information for calculation in the trilinear combination calculation layer. STMP: That is, the STMP model that adds the final click item information for trilinear calculation. STAMP(without_lastclick): A STAMP model that does not use the last click item information to calculate in the trilinear combination calculation layer. STAMP: The STAMP model that adds the last clicked item information for trilinear calculation. Furthermore, the proposed method in this chapter captures both long-term and short-term interests, considering that users’ interests may change due to longer browsing during a session, and that the user’s next click is more likely to be related to the last click that represents the current interest. And a model that augments last-click information is thought to be potentially very beneficial in handling long sessions. In order to further verify the effect of prioritizing the last clicked item in the session, this section analyzes the change of the Recall@20 indicator of the proposed model when faced with sessions of different lengths in the Yoochoose 1/64 dataset. The result is the line graph in Figure 6.
The graph in Figure 6 shows the variation in prediction accuracy with increasing session length for three different models. It is observed from Figure 6(a) that the performance of the STMP and STAMP models proposed in this chapter weakens with the increase of the session length, but is still significantly higher than that of the NARM model. Interests are more important than just considering the main interests of users in a session. At the same time, from the performance changes of STMP and STAMP models and variant models (STMP and STAMP- in the figure) with the session length shown in Figure 6(b), it can be found that the model with short-term memory priority idea has better performance overall. The reason for the analysis may be that in a long session, the user’s current interest is more likely to focus on the last click and its vicinity, so STMAP and STMP can more easily provide user-satisfied recommendation results. At the same time, the longer the session, the larger the accuracy gap between STMP (without_lastclick) and STMP and between STAMP (without_lastclick) and STAMP. This further proves that prioritizing short-term memory is more in line with the general behavioral characteristics of users.Furthermore, STAMP(without_lastclick) also outperforms STMP(without_lastclick), which is due to the fact that the attention mechanism in STAMP(without_lastclick) captures mixed overall interest and current interest, while STMP(without_lastclick) only averages the information in a way Consider the overall information of the session; lack of enhanced utilization of the relatively important part of the click information in the session.
In order to highlight the information related to the user’s overall (long-term) interest preference and current interest preference when modeling the expression of the session context, so as to alleviate the interest shift phenomenon caused by the change of user interests in the session, which is critical to the accuracy of the session recommendation algorithm. It can be observed that the performances of General_Attention and Last_Attention have their own advantages and certain complementarities on different datasets, but they are generally comparable. This is due to the difference in the focus of the two when focusing on information, resulting in the lack of user interest captured from the context of the conversation, although effective. General_Attention pays attention to the overall information of the session (user’s long-term behavior), and can extract items that the user is more concerned about under normal circumstances; Last_Attention pays attention to the short-term memory (user’s recent behavior) in the session to accurately discover the user’s current preferences. The former introduces noise because the user’s next behavior in the session may only be related to some clicks in the session, while the latter is easily affected by the user’s wrong click behavior and interest shift, see Figure 7.
Mixed_Attention considers the user’s long-term interest and short-term interest information in the session as comprehensively as possible when modeling the contextual representation of the session, so it achieves the best performance.
This paper analyzes the basic challenges faced by the recommendation task and the problems of the current mainstream session recommendation algorithm based on recurrent neural networks in modeling user interests. We put forward the core idea of the algorithm design: the long-term and current interests of users in the session are important for predicting the next click of the session, and the importance of different click information for session recommendation is not the same. Combined with the characteristics of human behavior, a short-term attention and memory first session recommendation algorithm stamp model is proposed. Experiments show that our algorithm based on recommendation only on the last click still can achieve the same or even better performance as the cyclic neural network model using a complex computing structure.
No funding is available for this research.
The authors declare no conflict of interests.
1970-2025 CP (Manitoba, Canada) unless otherwise stated.