Missing Data Imputation Method Based on a Reconstruction-Based Variational Autoencoder

杨昊东; 任少君; 朱保宇; 李清扬; 司风琪

doi:10.19805/j.cnki.jcspe.2026.240765

Abstract

Abstract: Accurate measurement data are an essential foundation for building high-precision data-driven models. However, the actual operating data of a power plant are affected by measurement, communication, storage and other processes, so some data are inevitably missing, which undermines the validity of subsequent models built on those data. To address this problem, a missing data imputation method based on a reconstruction-based variational autoencoder (RB-VAE) is proposed. The method constructs a reconstruction index from a variational autoencoder model and uses the Steffensen method to accelerate the iterative updating of one or more missing values. It thereby maintains imputation accuracy while ensuring computational efficiency, and imputes single or multiple missing values efficiently and accurately. The method is analyzed using a mathematical example and an engineering case. When the number of missing parameters is 1, 2, 3 and 6, the mean coefficients of determination (R²) of the RB-VAE imputation results are 0.9980, 0.9986, 0.9992 and 0.9962, respectively. In all cases the imputed values deviate only slightly from the true values, and the imputation performance is clearly better than that of the missForest and generative adversarial imputation nets (GAIN) methods.
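The abstract does not give the update rule, so the following is only a minimal sketch of how a Steffensen-accelerated imputation loop might look, not the authors' implementation. It assumes the missing entries are treated as a fixed point of a reconstruction map and that Steffensen's (Aitken delta-squared) step is applied to each missing entry separately. The names `toy_reconstruct` and `steffensen_impute` are hypothetical, and the toy projection onto the line b = 2a + 1 merely stands in for the encode-decode pass of a trained variational autoencoder.

```python
from typing import Callable, List, Sequence


def toy_reconstruct(sample: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained autoencoder's encode-decode pass.

    Projects the point (a, b) orthogonally onto the line b = 2a + 1, which
    plays the role of the learned data manifold in this toy example.
    """
    a, b = sample
    t = (a + 2.0 * (b - 1.0)) / 5.0
    return [t, 2.0 * t + 1.0]


def steffensen_impute(
    sample: Sequence[float],
    missing: Sequence[int],
    reconstruct: Callable[[Sequence[float]], List[float]],
    tol: float = 1e-10,
    max_iter: int = 50,
) -> List[float]:
    """Fill the entries listed in `missing` by accelerated fixed-point iteration.

    Assumption: each missing entry is updated with the scalar Steffensen
    (Aitken delta-squared) formula; observed entries are never changed.
    """
    x = list(sample)
    for _ in range(max_iter):
        # First plain reconstruction step: overwrite only the missing entries.
        x1 = list(x)
        r1 = reconstruct(x)
        for i in missing:
            x1[i] = r1[i]
        # Second plain reconstruction step, starting from x1.
        x2 = list(x1)
        r2 = reconstruct(x1)
        for i in missing:
            x2[i] = r2[i]
        # Aitken delta-squared extrapolation for each missing entry.
        largest_change = 0.0
        for i in missing:
            denom = x2[i] - 2.0 * x1[i] + x[i]
            if abs(denom) > 1e-15:
                new_value = x[i] - (x1[i] - x[i]) ** 2 / denom
            else:
                # Iterates have stopped moving; fall back to the plain step.
                new_value = x2[i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            break
    return x


# Usage: a = 3.0 is observed, b is missing and initialised to 0.0.
imputed = steffensen_impute([3.0, 0.0], missing=[1], reconstruct=toy_reconstruct)
print(imputed)
```

In this toy case the plain iteration contracts toward b = 2a + 1 with a factor of 0.8 per step, whereas the accelerated update lands on the fixed point after a single extrapolation because the toy map is linear. A real autoencoder is nonlinear, so more iterations and a safeguard against near-zero denominators would likely be needed.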
