This research presents a machine learning framework for predicting library space utilization that integrates multi-modal deep learning with ensemble methods. The system combines Long Short-Term Memory (LSTM) networks, attention mechanisms, and feature engineering to improve prediction accuracy while remaining computationally efficient. The methodology makes three primary contributions: (1) a feature extraction pipeline that incorporates spatial, temporal, and environmental data streams; (2) a novel LSTM-Attention hybrid architecture with adaptive learning rate optimization; and (3) ensemble learning techniques for robust prediction performance. Experimental validation uses a dataset of 2.1 million samples collected over 33 months from multiple library facilities. The framework achieves 96.8% prediction accuracy across diverse operational scenarios, with a Mean Absolute Error (MAE) of 0.142 and a Root Mean Square Error (RMSE) of 0.186, a 39.8% reduction in prediction error compared to baseline approaches. Average processing time is 45.3 ms per prediction, with a memory footprint of 512 MB. The framework’s ability to capture complex spatial-temporal patterns at this computational cost makes it suitable for real-time applications in resource-constrained environments. The research contributes a theoretically grounded and practically implementable solution for space utilization prediction to the field of intelligent library management systems, and provides a foundation for enhanced space management strategies in modern library systems.
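For readers less familiar with the error metrics quoted above, the following minimal sketch shows how MAE, RMSE, and a relative error reduction are conventionally computed. The function names and the occupancy values are hypothetical and chosen only for illustration; they are not the study's data or implementation, and the abstract does not state which metric the reported reduction is based on.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Made-up occupancy ratios (assumption: utilization normalized to [0, 1]).
observed = [0.20, 0.50, 0.80, 0.40]
model_pred = [0.25, 0.45, 0.70, 0.40]
baseline_pred = [0.30, 0.40, 0.60, 0.50]

model_mae = mae(observed, model_pred)
baseline_mae = mae(observed, baseline_pred)
model_rmse = rmse(observed, model_pred)
reduction = relative_reduction(baseline_mae, model_mae)

print(f"MAE={model_mae:.3f} RMSE={model_rmse:.3f} reduction={reduction:.1f}%")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with the reported pair of values (0.186 versus 0.142).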