Rapid detection of bursts and leaks in water distribution systems (WDSs) can reduce the social and economic costs incurred through direct loss of water into the ground, additional energy demand for water supply, and service interruptions. In recent years, water utilities have been monitoring the flow, pressure, water level, and water quality of WDSs in real time with the introduction of supervisory control and data acquisition (SCADA) systems (Farley et al., 2010). These real-time monitoring and control systems have also been applied to the inlets of hydraulically isolated district meter areas (DMAs), and fitting these inlets with flow and pressure instruments is one solution to the leak detection problem. Furthermore, smart metering systems with advanced metering, data logging, and wireless communication technologies are currently being introduced to DMAs to provide customer demand information (Giurco et al., 2010).
Real-time monitoring alone is not enough to detect breakage and leakage events; the collected information must also be analyzed with appropriate algorithms. Many studies have already addressed the use of mathematical or numerical models for constructing break and leak detection systems in distribution networks. These models include inverse transient analysis (Covas et al., 2005), state estimation (Anderson and Powell, 2000), stochastic and probabilistic models (Puust et al., 2006), artificial neural networks (Mounce and Machell, 2006), and the Kalman filter (Choi, 2016). Nonetheless, no consideration has been given to how false and missing data can be distinguished from pipe bursts and leaks in water transmission lines and/or distribution systems.
While establishing a burst detection system for the transmission lines of the Seoul metropolitan area, it was found that cumbersome false and missing data can hinder real-time leak detection despite the large amount of monitoring data available. The first and second phases of the Seoul metropolitan area transmission lines provide data from 16 flowmeters, 24 pressure gages, 5 water level gages, 10 automatic control valve openings, and 18 pump on/off signals, as shown in Figure 1. In analyzing the real-time data, four types of false and missing data were observed in the time series: missing data, extraordinarily large data, extraordinarily small data, and stationary data. These faults make it difficult to analyze historical time-series data with data-driven models such as stochastic and probabilistic models and artificial neural networks.
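To make the four fault types concrete, the following is a minimal sketch of how samples in a single sensor time series might be labeled. It is an illustration only, not the detection method used in this study: the function name `classify_faults`, the thresholds `lower` and `upper`, and the run length `min_run` are hypothetical, and the sample values are invented for demonstration.

```python
import math


def classify_faults(values, lower, upper, min_run=4):
    """Label each sample as 'ok', 'missing', 'large', 'small', or 'stationary'.

    lower, upper and min_run are assumed, sensor-specific settings;
    they are not taken from the study.
    """
    labels = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            labels.append("missing")   # no reading was recorded
        elif v > upper:
            labels.append("large")     # extraordinarily large reading
        elif v < lower:
            labels.append("small")     # extraordinarily small reading
        else:
            labels.append("ok")

    # Relabel runs of identical in-range readings that last at least
    # min_run consecutive samples as 'stationary'.
    n = len(values)
    start = 0
    while start < n:
        end = start + 1
        while (end < n and labels[start] == "ok" and labels[end] == "ok"
               and values[end] == values[start]):
            end += 1
        if labels[start] == "ok" and end - start >= min_run:
            for i in range(start, end):
                labels[i] = "stationary"
        start = end
    return labels


# Invented flow readings, for illustration only.
flow = [102.0, 101.5, None, 9999.0, -5.0, 100.0, 100.0, 100.0, 100.0, 99.2]
labels = classify_faults(flow, lower=0.0, upper=500.0, min_run=4)
```

In practice the bounds and run length would have to be chosen per instrument, and a constant reading is not always a fault (for example, zero flow while a pump is off), so a rule of this kind would likely need to be combined with the valve opening and pump on/off signals.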