To address the complexity of assessing the mental health of college and university students, this paper proposes a framework that integrates social sentiment analysis with multi-branch neural networks. A multilevel mental health assessment system is built by combining cross-modal feature interaction (CNN+BiGRU) with heterogeneous graph structure modeling. In the model design, image features are extracted by a five-branch CNN structure with a pre-trained ViT, text features are obtained by fusing dynamic word embeddings with multi-scale convolution, and a heterogeneous graph neural network (H-GNN) driven by virtual nodes and metapaths is introduced to strengthen global relationship modeling. Experiments show that the model achieves accuracies of 89.7% and 91.2% on the Twitter-15 and Twitter-17 datasets, respectively, and improves F1 by 3.24% and 2.32% over the strongest baseline, BICCM. In actual college mental health monitoring, the model captured the time-series fluctuations of the depression index and anxiety level, and found that the rational-perceptual dimension was highly correlated with the examination cycle, with values of 0.69 during the midterm examination and 0.68 during the final examination. In ten-fold cross-validation comparison experiments, the model significantly outperforms recent models such as MIMNBERT and EF-NET on the weighted-average metrics, with an average accuracy of 99.02% and an F1 value of 98.08%. The study shows that the framework offers a highly accurate and interpretable technical solution for early warning of mental health risk, and is particularly suited to dynamic monitoring scenarios in universities.
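
The abstract reports results on a weighted average index without defining it. One common reading is the support-weighted mean of per-class F1, in which each class contributes in proportion to its number of true samples. The sketch below illustrates that reading only; the function name `weighted_f1` and the label values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (assumed definition, not the paper's)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)

        # Guard against division by zero when a class is never predicted.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        # Weight each class by its share of the true labels.
        score += (n / total) * f1

    return score


# Hypothetical three-class sentiment labels, for illustration only.
y_true = ["neg", "neg", "pos", "pos", "pos", "neu"]
y_pred = ["neg", "pos", "pos", "pos", "neu", "neu"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support keeps a rare class from dominating the score, which matters when sentiment or risk categories are imbalanced.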