Bayesian network-based optimization of a speech-text multimodal large model and its fault prediction capability in the power industry

Abstract

Speech-text multimodal large models are a key tool in power industry operations, and their fault prediction performance directly affects the operational safety of mechanical equipment. This paper designs a detailed scheme for optimizing that performance. First, the structural design of the unimodal models is discussed, and an audio classifier based on Wav2Vec2 and a text classifier based on BERT are used for pre-training. On this foundation, a multimodal model is introduced that uses a cross-attention mechanism as its fusion strategy, so that information from the different modalities is fused within the deep neural network, improving the accuracy and robustness of the recognition task. After fault feature extraction is complete, the relevant theory of the Bayesian belief network (BBN) is introduced, and the BBN structure is optimized by fusing the hill-climbing (HC) algorithm, the Bayesian information criterion (BIC), and the simulated annealing idea. A fault diagnosis method based on the improved BBN is then constructed by combining the fault feature extraction method for the electric power industry with the optimized BBN method. The effectiveness of the method is verified through simulation experiments. The prediction accuracy of the proposed method is above 90% for the nine categories of fault data, and reaches 100% for some categories. The proposed multimodal fusion strategy significantly improves fault feature recognition performance; in addition, the fault diagnosis method based on the improved BBN reduces the computational cost of the model and improves its fault prediction ability.

Keywords: Multimodal fusion; Cross-attention mechanism; BBN structure optimization; Fault prediction