Gradient Boundary Detection for Time Series Snapshot Construction in Sensor Networks
J. Lian, Member, IEEE, L. Chen, Member, IEEE, K. Naik, Member, IEEE,
Y. Liu, Senior Member, IEEE, and G. B. Agnew, Member, IEEE
Abstract—In many applications of sensor networks, the sink needs to keep track of the history of sensed data of a monitored region for scientific analysis or supporting historical queries. We call these historical data as a time series of value distributions, or
snapshots. Obviously, to build the time series snapshots by requiring all the sensors to transmit their data to the sink periodically is not energy-efficient. In this paper, we introduce the idea of gradient boundary, and propose a gradient boundary detection (GBD) algorithm to construct these time series snapshots of a monitored region. In GBD, a monitored region is partitioned into a set of sub-regions and all sensed data in one sub-region are within a predefined value range, namely gradient interval. Sensors located on the boundaries of the sub-regions are required to transmit the data to the sink, and then the sink recovers all sub-regions to construct snapshots of the monitored area. In this process, only the boundary sensors transmit their data, and therefore, energy consumption is greatly reduced. The simulation results show that GBD is able to build snapshots with a comparable accuracy and has up to 40% of energy saving compared with the existing approaches for large gradient intervals. Index Terms—Wireless sensor networks, query processing, planar graph.
—————————— ——————————
1 INTRODUCTION
UE to recent advances in computing and wireless communication technologies, wireless sensors are available to measure real world phenomena. A large number of sensors are densely deployed in a specific region to form a wireless sensor network. These sensor networks are designed for monitoring environment and supporting user queries. In many real applications [1], [2], [3], [4], users are interested in analyzing the changes of the value distri-butions, called snapshots, of the monitored geographic re-gion over time. Fig. 1 shows an example of a snapshot. From temporal point of view, queries over a sensor net-work can be categorized into real-time query and historical query. The response of a real-time query is dependent solely on the current data of the monitored region, and the re-sponses are required to be real time. One example of real-time query is “find the location with the highest temperature in the monitored area now.” Many existing approaches are de-signed to support such real time queries [2], [3], [7], espe-cially for target tracing [8], [9], [12], [17], [18]. On the other hand, the response of a historical query relies on historical data. Historical queries are useful in many environment monitoring applications, such as monitoring Yosemite Na-tional Park [4]. In the Yosemite National Park project, a
————————————————
D
• J. Lian, K. Naik, and G. B. Agnew are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Can-ada, N2L 3G1. E-mail: jlian@swen.uwaterloo.ca, knaik@swen.uwaterloo.ca,and gbagnew@engmail.uwaterloo.ca.
• L. Chen and Y. Liu are with the Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong. E-mail: leichen@cs.ust.hk, and liu@cs.ust.hk.
Manuscript received 22, December 2005; revised 24, May 2006; accepted 10, No-vember 2006. Recommended for acceptance by I. Stojmenovic. For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-0519-1205.
typical historical query is “find the average temperature in a certain region for each year from 1992 to 1999.” To support this type of queries, the sink maintains time series snapshots with a tolerable error for a monitored region. Usually, sen-sors are battery-powered with a limited energy resource, and batteries are nonreplicable in many situations [2], [3], [12]. Therefore, energy efficiency is an important goal for constructing time series snapshots. Obviously, periodically transmitting data from all sensors to the sink is not energy-efficient. Although many query processing approaches proposed in the literature ([7], [9], [11], [12], [13], [14], [15], [16]) can be extended to build time series snapshots, as their primary concerns are on processing real-time queries, they can not address historical queries well.
In this paper, we propose a Gradient Boundary Detection (GBD) algorithm to build time series snapshots for a moni-tored region. The GBD algorithm depends on the concept of gradient boundary as follows. In real applications, e.g., tem-perature monitoring, there exists high similarity of the tem-perature values in the vicinity as demonstrated in Fig 1.
Fig. 1 Temperature value distribution of Permafrost in Canada. Even though the temperature value distributed in Fig 1 is scaled over Can-ada, it still reflects the realistic situation for smaller areas.
xxxx-xxxx/0x/$xx.00 © 2007 IEEE
2 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
In Fig. 1, the monitored region is partitioned into a set of sub-regions and the value distribution in each sub-region is limited by a predefined range of temperature. To find a snapshot at a given time for the region, a solution is to ob-tain the readings from sensors located close to the bounda-ries of all sub-regions. In this way, sensors that are not lo-cated at the vicinity of the boundaries do not transmit their data to the sink, resulting in reduced energy consumption. Based on this idea, we propose gradient boundary detec-tion (GBD) algorithm to support snapshot construction. The major contributions of this work are as follows:
• We introduce the concept of virtual surrounding face
Madden et al. [3] propose the Tiny Aggregation Service (TAG) to reduce the energy cost of processing aggregative real-time queries (e.g., Sum, Average, Max, Min, etc.,) by using in-network aggregation. Some other work based on TAG support aggregative queries, such as Cougar [13] and TinA [14]. Cougar not only processes real-time queries by partial data aggregation, but also reduces packet payload by merging data packets. TinA utilizes the temporal coher-ence among data collected from the same sensor to reduce the number of transmissions. However, these approaches have two major drawbacks when they are extended to sup-port historical queries. First, these approaches are proposed (VSF), which is used to approximate the boundaries of partitioned sub-regions.
• We propose GBD algorithm to construct snapshots
with low transmission cost.
• We compare performance of the proposed algorithm
with the existing approaches and show that GBD is su-perior to the existing approaches.
The rest of the paper is organized as follows. We review the related work in Section 2 and identify the problems in the existing approaches. A few terms and the idea of the work are given in Section 3. We present the GBD algorithm in Section 4. The simulation methodology is introduced in Section 5, and the performance of GBD is evaluated in Sec-tion 6. Finally, we conclude this work in Section 7.
2 RELATED WORK
A number of research results have been published on col-lecting data from sensor networks for energy-efficient query processing [2], [3], [9], [11], [16]. However, most of them focus on supporting real-time queries, and only a few of them aim to build time series snapshots for historical queries. In addition, location detection techniques play an important role for query processing in sensor networks. In this section, we review those approaches as follows.
Directed Diffusion (DD) [2] is a well-known approach to support event-based real-time queries. In such a query, a sensor reports data only if it detects occurrences of some specified events. Extending DD to support historical que-ries leads to all sensors joining the data collection process, and it is, therefore, not suitable for building time series snapshots. VigilNet [12] implements an integrated system with a focus on tracking mobile targets. Similar to DD, VigilNet is designed to handle event-based real-time que-ries, and is not suitable for historical queries. LEACH [16] is a scalable adaptive clustering protocol in which nodes in the vicinity are organized into clusters. Sensors in a cluster communicate with the cluster header (CH), which directly transmits the processed data to the sink. Even though LEACH can be extended for collecting snapshots of moni-tored region, it is not energy-efficient due to the long dis-tance transmissions from sensors to CHs and from CHs to the sink. In general, energy consumption for transmission is a quadratic or hyper-quadratic function of a transmission distance [27]. The long distance transmissions involved in LEACH consume a much larger amount of energy than normal multi-hop transmissions in sensor networks [27].
to handle aggregative queries such that data are aggregated during the collection phase. If we use these aggregated data to build the snapshots at the sink, it will result in a large mean error compared with the actual value distribution of the monitored region. Second, in the data collection phase, these approaches have no knowledge about the incoming historical queries; thus, it is difficult to decide which aggre-gation operators to use. For example, if the operators Aver-age, Max, and Min are chosen to collect data, it can not an-swer an incoming historical query: find all sub-regions in which the temperature value is between 20C° and 25C°. To overcome these two drawbacks, these approaches have to transmit all the sensed data back to the sink without any aggregation, resulting in a high transmission cost.
Recently, Kotidis proposed a Data Cluster (DC) ap-proach [11] with a focus on supporting aggregative queries. In DC, a set of nodes is voted as representative nodes based on the value distribution of their neighbor nodes. Then the representative nodes answer queries on behalf of other nodes. The DC approach can be easily extended to con-struct snapshots of monitored regions, but with the disad-vantage of high cluster construction and maintenance cost. To obtain time series snapshots of a monitored region, all sensors participating in a query process should know their geographic locations. For outdoor applications, the existing location detection techniques can be classified into two types: GPS-based and GPS-free techniques. The former uses the Global Positioning System (GPS) ([21], [22]) and can determine locations within a few meters. On the other hand, in GPS-free techniques [23], [24], [25], [26], a small number of master nodes are selected which precisely know their locations. To determine the locations of other nodes, each node with an unknown location broadcasts a message to its neighbors. The master nodes measure the received signal strength from the message initiator, and compute the location based on the measurements. In either way, sensor nodes know their location with high accuracy.
3 OVERVIEW OF GBD ALGORITHM
In this section, we demonstrate the environment model, introduce the basic idea of the GBD algorithm, define re-lated terminology, and present the idea of Virtual Sur-rounding Face (VSF) which is a major part of GBD.
3.1 GBD Algorithm
We consider the snapshot construction problem to support
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 3
historical queries in environment monitoring. Fig. 2 shows an example of the value distribution of a monitored region. This environment model is realistic for many applications, such as temperature, humidity, and pressure monitoring [3], [4]. In Fig 2(a), the monitored region is divided into three sub-regions R1, R2, and R3. The temperature values of all points in R1, R2, and R3 are in the ranges 22–24C°, 20–22C°, and 18–20C°, respectively. The dotted (dashed) curves denote the boundaries between two adjacent sub-regions. To obtain time series snapshots, sensors are evenly de-ployed in the region (Fig. 2(b)). The sensors sense the envi-ronment periodically, and send data to the sink. The sink rest of this section, we define terminology and describe how to use the idea of the virtual surrounding face to ap-proximate the boundaries of sub-regions.
3.2 Environmental Model
Before we discuss the GBD algorithm, we state the envi-ronment model. In the rest of the paper, we use tempera-ture monitoring as an example to state the GBD algorithm. Grid representation of snapshot: For a realistic moni-tored region, the snapshot can be represented by a tempera-ture value distribution over a two-denominational real plane in which each point is associated with a temperature uses the received data to recover the snapshot of the region.
(a) C °
R218-20
20-22
R22-24 1
R3
(b)
R2v Sink
RChain 1
u
Sensor R3
Fig. 2 (a) Value distribution of the monitored region, and (b) densely deployed sensors in the region.
Obviously, letting all sensors transmit their data to the sink is not energy-efficient. This is because values distrib-uted in the vicinity of any location in the monitored region are highly correlated. One sensed value for a location can be used to recover the value distribution in the vicinity of the location. One energy-efficient solution is to ask sensors close to the boundaries of the sub-regions to transmit their data to the sink, and the sink uses these data to recover the boundaries. In this way, sensors far away from the bounda-ries do not need to send their data to the sink, thereby these sensors saving energy associated with these transmissions. The GBD algorithm realizes this goal, and an example of GBD is shown in Fig. 2(b). In Fig. 2(b), to detect the bound-ary of R1, sensors close to the boundary of R1 form a chain which is an approximation of the boundary. In GBD, a sen-sor, namely node u, is selected to send two messages trav-ersing the chain in two directions, clockwise and counter-clockwise. The traversing messages collect the sensed val-ues of the sensors along the chain, and the two messages reach a node v. Then, v forwards the messages to the sink. Finally, the sink can find a boundary of R1 by using the re-ceived data. Similar steps can be used to obtain other boundaries. In this way, GBD can obtain an approximated snapshot of a monitored region at any given time. In the
value. However, it is difficult to maintain values for all points in the real plane since there are an infinite number of points. Instead, we partition the monitored region into a grid structure with integer resolution, and maintain a tem-perature value for each cell in the grid structure. For sim-plicity, we construct snapshots for a square region which is represented by a w × w grid. Each cell in the grid is associ-ated to a temperature value. The value distribution in the grid representation is an approximation of the real value distribution in the real plane. The larger the value of w is, the higher the accuracy can be achieved by using the grid structure comparing with the value distribution in the real plane. An example of a grid structure is shown in Fig. 3(a). C ° R218-20 20-22
22-24
R1
R3 (a) Grid representation of three levels of gradients
C °
18-20
20-21
21-22
R122-23 R4R3R223-24 R5
(b) Partition with five levels of gradients
Fig. 3 Partitions for a monitored region with different gradient levels.
Gradient Boundary (GB) and Gradient Region: By us-ing the grid representation, each snapshot of a w × w grid can be stored in a two-dimensional array grid[w][w], where grid[i][j] maintains the temperature value at the cell in the ith row and jth column, where 1 ≤ i ≤ w and 1 ≤ j ≤ w. Each cell, except the ones on the edge of the grid, has four
4 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
As illustrated in Fig. 2, the GBD algorithm recovers the snapshot by traversing the chains close to the gradient boundaries. GBD uses face traversal to realize this goal. Some preliminary terms and a network model related to face traversal are discussed in this section.
Unit Disk Graph (UDG): UDGs are accepted models of sensor networks in which all nodes have an identical neighbors (up, down, left, and right). To capture the snap-shot of a monitored region at a given time, the temperature
distribution is sub-divided into m levels of gradients: GDm = {[Tk, Tk+1) | 1 ≤ k ≤ m}, where T1 < T2 < … < Tm+1 are tem-perature values, and [Tk, Tk+1) is a temperature range.
With a given GDm for a w × w grid, the kth gradient region Rk is defined as a sub-region containing all cells in the grid with their temperature values in the range [Tk, Tk+1). The boundary of Rk is defined as a set of cells which belong to Rk and each cell has at least one neighbor in a gradient re-gion other than Rk. The boundary of Rk is called a gradient boundary (GB), denoted by Bk. As shown in Fig. 3(a), the For such applications, the value distribution inside a gradi-ent region is not of interest, whereas BERR has a greater im-pact on the accuracy of the query response.
3.3 Face Traversal
monitored region is partitioned into three gradient regions R1 (dark grey region), R2 (light grey region), and R3 (the rest of the monitored region) by three levels of gradients GD3 = {(18, 20), (20, 22), (22, 24)}. A gradient region may not be enclosed by one connected boundary.
Performance metrics of snapshot: In this paper, we fo-cus on constructing snapshots and detecting the boundaries of gradient regions. To support historical queries, GBD di-vides the system time into time intervals, and the sink col-lects a snapshot during each interval. Historical queries are answered by using the snapshots. To obtain a more accu-rate snapshot of the monitored region, more levels of gradi-ents can be used. As shown in Fig 3(b), five levels of gradi-ents are used, resulting in five regions denoted by R1 to R5. Let gridT[w][w] denote the actual value distribution of a region with the grid representation, and let gridR[w][w] de-note the recovered value distribution (snapshot) by using a detection technique. Two application-dependent metrics are used to measure the accuracy of a snapshot: Mean Abso-lute Error (MERR) and Boundary Error (BERR).
For a given region with a w × w grid, MERR is defined as the mean value of differences between actual value distri-bution and the recovered value distribution. The formal definition is given in (1) as follows.
MERR=(∑wi=1∑wj=1
|gridT[i][j]−gridR[i][j]|)w2 (1) For a monitored region with a w × w grid representation and GDm = {[Tk, Tk+1) | 1 ≤ k ≤ m – 1}, we define ∂(x, y) as a testing function as follows: ∂(i,j)=
⎧⎨
1∃k:(gridT[i][j]∈[Tk,Tk+1))∧(gridR[i][j]∈[Tk,Tk+1)). ⎩0otherwise
Then, BERR is defined in (2) as follows. B2ERR=1−∑wi=1
∑wj=1
∂(i,j)w.
(2)
∂(x, y) is used to determine if the actual value and the re-covered value of a given cell belong to the same gradient region. Denominator w2 denotes the total cells (area) of a region in the grid representation. The numerator denotes the total number of cells which actual value and recovered value belong to the same gradient region. Hence, BERR de-notes the percentage of the areas of incorrectly recovered gradient regions over the total area.
The MERR is used by the mean-error sensitive queries, such as “get average temperature in a sub-area”. However, some applications disregard MERR, but desire to have a small BERR, such as a query in Fig. 1: “find the sub-regions in which temperature is in the range between 4.5 C° and 6.5C° ”.
transmission range rs [2], [3], [5], [6]. Let V denote a set of nodes in a sensor network. We define UDG(V) as a unit disk graph in which an edge euv between nodes u and v ex-ists iff the Euclidean distance between u and v is no longer than rs. For an edge euv ∈ UDG(V), u and v are called UDG neighbors.
Planar Graph and Gabriel Graph (GG): Face traversing plays an important role in GBD and can only be applied on a planar graph. A graph is called planar if the graph has no two edges crossing one another. To planarize a UDG(V), a deduced sub-graph of UDG(V), called Gabriel graph (GG), is employed. The Gabriel graph on UDG(V) is defined as a graph GG(V) so that for each edge euv ∈ UDG(V), euv ∈ GG(V) iff the circle with euv as a diameter does not contain any other nodes. For an edge euv ∈ GG(V), nodes u and v are called Gabriel neighbors. A localized algorithm to find GG(V) has been presented in [6]. For each node u, let NUDG(u) and NGG(u) denote the sets of UDG and Gabriel neighbors of u, respectively.
Faces in a Planar Graph: The edges in a planar graph partition the network region into a set of faces [5], [6]. A face in a planar graph is a continuous sub-area bounded by one or more close sequences of edges. For example in Fig. 4, the network region is partitioned into four faces, F1, F2
(dark grey region), F3 (light grey region), and F4, where F4 is
the unbounded region outside the border of a network
graph. Note that face F3 is bounded by two sequences of edges: u→u1→u2→…→u10→u11→u12→u11→u10→u13→y→z→u, and
v1 →v2 →v3 →v4 →v1. u 4Fu2u34u u51v1 uF3F2v2u6 vzv4v3 Fu121 wyuu711 xu13uu810
u9Fig. 4 Face partition and face boundary traversal.
Face Traversal: A face in a planar graph can be traversed by employing Right-Hand Rule (R-Rule) or Left-Hand Rule (L-Rule) [5], [6] or both. In R-Rule, a person explores a face by keeping her right hand on the walls (edges) and she will
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 5
eventually visit all edges on the face. In L-Rule, a person explores a face by using her left hand. We use Fig. 4 as an example to illustrate face traversing. Starting from u, to traverse F1 by R-Rule, u sends a traversing message to v in the form of trav(source, [(dest, rule), …]), where the source is the message sender, dest is the recipient, rule is R-Rule or L-Rule, and […] means that there may be many (dest, rule) pairs. For node u, the message is trav(u, (v, Right)). When v receives the message, it sends trav(v, (w, Right)) to w. Re-peating the step by all nodes receiving the message, F1 is traversed counterclockwise. Similarly, u can use trav(u, (z, two faces that share some edges, if one of the shared edges is ignored, the two faces are merged into one face with a larger area. Consider the Gabriel graph shown in Fig. 5(a) in which face F1 is bounded by the edge sequence u→v→w→x→y→z →u, and face F2 is bounded by the edge sequence y→z→z1→ z2→z3→z4→y. If the crossing edge eyz, which is shared by F1 and F2, is ignored, F1 and F2 are merged into one face bounded by an edge sequence: u→v→w→x→y→z4→ z3→z2 →z1→z→u. If we repeatedly merge all faces that intersect with B1 (the gradient bound-ary of R1) by ignoring the crossing edges, eventually we Left)) to traverse F1 clockwise.
3.4 Virtual Surrounding Face of Gradient Boundary
As illustrated in Fig. 2(b), the major purpose of the GBD algorithm is to find chains of sensors to approximate the gradient boundaries. GBD achieves this goal by using the concept of virtual surround face (VSF) as follows.
The sensors in GBD are classified into two types: gradi-ent boundary sensors and internal sensors. A sensor u is called a gradient boundary sensor (GB sensor) if there exists a Gabriel neighbor v of u with grad(u) ≠ grad(v), where grad(u) denotes the gradient region in which u is located. In other words, two neighbor nodes v and u are GB sensors if they are Gabriel neighbors of each other and they are in different gradient regions. Then euv is called a crossing edge. A sensor which is not a GB sensor is called an internal sensor. Fig. 5(a) shows the Gabriel graph of a sensor network and only the sensors in R1 and R2 are shown. In the figure, the dotted lines denote crossing edges and the solid lines denote edges other than crossing edges. In Fig. 5(a), nodes u, v, y and z are GB sensors. Nodes w, x, z5 and z6 are internal sensors.
(a) RC ° 2 18-20 20-22 v z6z522-24 w Fu 1z R1 x y z1R3 zF24z3z2B1 (b) C ° R218-20 20-22 22-24 F1u R1v R3 B1
Fig. 5 Illustration of virtual surrounding face (VSF). Fig 5(a) shows crossing edges (dotted lines) of gradient boundary B1, and Fig 5(b) shows VSF F1 for gradient boundary B1.
VSF of a Gradient Boundary: In a planar graph, for any find a unique face containing B1. This face is called the vir-tual surrounding face (VSF) of B1. The result of the face merg-ing process is shown in Fig. 5(b) in which face F1 (light grey region) is the VSF of B1. F1 is bounded by two disconnected edge sequences: one is the chain with solid dots and the other is the chain with hollow dots. To traverse a VFS, a node determines its next visited node by using the way discussed in Section 3.3, except that the node ignores all its crossing edges. After the sink obtains the sensed values of all sensors located on the VSFs (the two chains), it ap-proximates B1 by a simple geometric computation.
The node on the VSF boundary is called a VSF sensor. Two Gabriel neighbor nodes on the same VSF boundary are called VSF neighbors of each other. As shown in Fig. 5(a), v and w are VSF sensors, and w and x are VSF neighbors. GB sensors are VSF sensors but VSF sensors in the same gradi-ent region are not necessarily GB sensors. For example in Fig. 5(a), w and x are VSF sensors but not GB sensors.
4 DISTRIBUTED GBD ALGORITHM
Based on the idea of VSF, we design a gradient boundary detection (GBD) algorithm for time series snapshot con-struction. GBD consists of three components as follows:
• Time series snapshots and historical queries: Since the
purpose of GBD is to store long term time series snap-shots for the monitored area, the time interval between successive snapshots is expected to impact the accuracy of query responses. In addition, queries may not ask for historical data concerning the exact time of main-tained snapshots, and therefore, there should be a way to answer queries pointing to any given time.
• VSF Traversal: All nodes run an algorithm to select
starting nodes from which VSF traversing is initialized. When a message traverses a VSF, some information of the visited VSF nodes are collected in the messages. • Data Collection: During the VSF traversal, for each VSF
node u, if u detects that a data collection condition holds, then u transmits the GB data to the sink.
4.1 Time Series Snapshots and Historical Queries
GBD divides time into a sequence of predetermined time interval ∆t, and periodically constructs a snapshot for the
monitored area at time k∆t for k > 0. For a historical query related to the value distribution at an arbitrary time t' satis-fying k∆t < t' < (k+1)∆t, the sink can not directly answer the query since it does not store the snapshot at time t'. Instead,
the sink computes the value distribution by linear combina-
6 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
tion of the snapshots at k∆t and (k+1)∆t as follows. Let gridtk[w][w] and gridt(k+1)[w][w] denote the snapshots at time k∆t and (k+1)∆t, respectively. The snapshot gridt’[w][w] at time t' can be computed as follows:
∀1≤i,j≤w:gridt'[i][j]
(3)
t'−k∆t(k+1)∆t−t'
=gridtk[i][j]+gridt(k+1)[i][j]
∆t∆t
The time interval can not be too large or too small. Since the purpose of GBD is to maintain long-term time series snapshots for a monitored area, it is better to choose hourly-based time interval, i.e. one snapshot per one or two hours. If the time interval is too small, the current snapshot is consumed to forward this large message over a long path. If we can choose node u as a starting node, the visit-ing node sequence is u → v1 → v2 → v3 → …, which puts the message closer to the sink after each traversal. Hence, we try to choose nodes with longer distances to the sink as starting nodes for each VSF.
Based on locally stored information, each node u can de-termine if it is a GB node, and only GB nodes participate in the process to select starting nodes. To avoid all the GB nodes being selected and to choose nodes with longer dis-tance to the sink as starting nodes, each GB node starts its face traversal at different time. The longer the distance to the sink, the earlier a GB node starts its face traversal. The may be very similar to the previous snapshot, resulting in redundant data transmissions. If the time interval is too large, it is difficult to compute an accurate estimation of the value distribution at a time between two consecutive snap-shots. The determination of time interval is a design pa-rameter and depends on applications.
4.2 VSF Traversal
Let UDG(V) represent a network with node set V, and GG (V) denote the Gabriel graph of V. In GBD, each node u in V maintains the following local information:
• Node u knows its own geographic location and the
geographic locations of all neighbor nodes in NUDG(u). From NUDG(u), u can compute its Gabriel neighbor set NGG(u) [5] without data transmissions. In addition, u knows all temperature readings of nodes in NUDG(u). • Node u knows the gradient information, GDm = {[T1,
T2), [T2, T3), … , [Tm, Tm+1)}, specified by the sink. Many existing techniques [19], [20] can be used to distribute GDm. GDm distribution is only performed if GDm varies. • Node u knows the sink location and a routing path
from u to the sink. This information can be obtained during GDm flooding, as demonstrated in [3].
• The sink maintains the location-ID pairs of all sensors
in the network which can be collected while initializing the deployed network.
VSF traversing needs to handle two cases: (1) some nodes must be selected to start VSF traversing; (2) each node involved in VSF traversing must be able to determine whether VSF traversing is terminated.
4.2.1 Starting Node Selection of VSF Traversal
For each VSF in GBD, a VSF message is used to traverse the VSF to collect data from the traversed VSF nodes. This process increases the message size after visiting each VSF node. When the message size exceeds the maximum packet size, the message will be forwarded to the sink and a new empty traversal message will continue the face traversal. According to this feature, it is better to select starting nodes of VSF which has the longest Euclidian distance to the sink among its VSF neighbors. As the example in Fig. 6 illus-trated, the sink is located at the top-right corner. If v4 is se-lected as a starting node, one traversal message will visit nodes v4 → v3 → v2 → v1 → u. This message becomes longer after visiting a new node, and the node holding the mes-sage is farther from the sink, resulting in that more energy
starting time is computed as tstart = ∆tp (ds / rs), where ∆tp denotes the time to send one packet with the maximum size, ds is the distance from the node to the sink, and rs is the transmission radius. Reader may note that tstart does not mean the node transmitting its packet at time tstart. Instead, the node starts it transmission process at tstart. For example in the contention-based protocol CSMA/CD, the node ob-tains a time slot for its transmission at tstart + c∆ts, where c∆ts is a random backoff delay and ts is the used time slot. RSink 2 vGB v34 FvRInner VSF 1v21boundary 1w2w
u w31Outer VSF boundary
Fig. 6 Example of VSF traversal.
If a GB node u receives a face traversal message before u starts its own face traversal, u forwards the traversal mes-sage to the next node and cancels its own face traversal. The starting node selection process is presented as Algorithm 1. Algorithm 1 Starting Node Selection for Double VSF Traversal 1: Input: node u; temperature values of 2: u’s neighbors; Gradient GDm; sink location; BEGIN 3: if (u is not a GB node) then exit; 4: u computes its time to start its face traversal and set a timer; 5: if( u receives a face traversal message before the timer expires) then u forwards the message and stop timer; 6: else if( the timer expires first) then u starts its face traversal; 7: END When node u is selected as a starting node, u starts VSF traversing in two directions by using the approach dis-cussed in Section 3.3. During VSF traversing, each traversed node u determines its next traversed node by ignoring all the crossing edges of u.
VSF traversing uses a traversing message in the form of MSG(IDs, trav(…), data), where IDs is the ID of the starting node, trav(…) is the traversing method defined in Section 3.3, and data field is used for data collection to be discussed in Sections 4.2 and 5.1. In the example shown in Fig. 6, if u
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 7
is selected as a starting node, u sends a message MSG(IDu, trav(u, (v1, Right), (w1, Left)), datau) to nodes v1 and w1. When v1 receives this message, v1 knows itself to be a VSF node. Then, v1 modifies the message to MSG(IDu, trav(v1, (v2, Right)), datav1), and sends the message to v2. Similarly, v2 sends MSG(IDu, trav(v2, (v3, Right)), datav2) to v3, and so on. Thus, all VSF nodes are traversed.
4.2.2 Termination of VSF Traversal
To avoid unnecessary traversing on a VSF, we define a ter-mination condition for a VSF node to decide whether the re-ceived traversing message can be discarded. The termina-tion condition is stated as follows. Since each starting node performs traversing in two directions, intermediate VSF nodes may receive more than one traversing messages. For two VSF neighbors u and v receiving two traversing mes-sages with different traversal rules, if the message received by u will traverse v next and the message received by v will traverse u next, these two messages are discarded by u or v depending on which node transmits its message first. If u transmits its message to v first, then v discards both the messages. If v transmits first, u discards both the messages.
4.3 Data Collection of GBD
In data collection phase, the IDs and the temperature read-ings of sensors related to the VSF traversal are transmitted to the sink for snapshot construction. There are two ways of selecting transmitted data. The first approach is that in VSF traversal each node simply transmits its own reading to the next traversed node. When some nodes collect enough data from previously traversed nodes (such as the data exceed-ing the maximum packet size), these nodes transmit the data with the locations of the data to the sink. This method works well to construct snapshots aiming to minimize boundary error BERR. In the second approach, since the sen-sors with extreme values in the vicinity of traversed sensors have a greater impact on the accuracy of mean absolute error MERR than other sensors, the traversed VSF nodes for-ward the information of its neighbors with the highest and lowest values to the sink. When the sink receives all GB data within a time interval, the sink uses these data to re-cover the snapshot of the monitored region. The snapshot recovery process is discussed in Section 5.
However, in the second approach discussed above, the data received by the sink from VSF nodes are redundant. This is because a Gabriel graph is derived from a UDG graph by removing edges between non-Gabriel neighbors. According to the definition of Gabriel graph, the longer edges in UDG have higher probabilities to be removed. In VSF traversing, messages are solely sent to Gabriel neighbors, indicating that each traversal passes the message to the next VSF node over a short distance. Two successive traversed nodes within short distance may report redun-dant data which results in additional transmissions. Hence, we use a skipping technique to reduce the transmission cost of sending redundant data to the sink as follows.
Skipping traversal is applied when three or more succes-sively traversed VSF nodes are located within the transmis-sion range of the first node. Assume that nodes u, v, and w
are three traversed VSF nodes in sequence. First, u sends a traversing message to v, and also forwards the information of u’s two neighbor nodes with extreme values to the sink. When v receives the traversing message, v sends the trav-ersing message to w, and uses a skipping algorithm to de-cide whether or not forward the information of v’s two neighbor nodes with extreme values to the sink. The skip-ping algorithm is given in Algorithm 2.
In the skipping algorithm, v can determine whether u and w are neighbors of each other by computing their dis-tance. If u and w are not neighbors of each other, v forwards the information of v’s two neighbor nodes with extreme values to the sink. Otherwise, v does not forward the in-formation, and we say that v has been skipped. Algorithm 2 Skipping Algorithm during VSF Traversal 1: Input: VSF node v receiving a traversing message containing the last node u which reports data to the sink. 2: Output: determine whether v reports data to the sink; 3: BEGIN 4: w ← the next traversed node of v 5: if (u and w are one-hop neighbors of each other) 6: return true; 7: else 8: return false; 9: end if END 10: 4.4 VSF Representation in Discrete Grid Structure
Theoretically, we construct a snapshot by using continuous VSFs in a two-dimensional real plane, so the construction will always succeed in this situation. Since it is difficult to represent VSFs in a continuous manner, a discrete grid structure discussed in Section 3.2 is deployed to represent these VSFs. In this case, the selection of the discrete grid structure has a significant impact on the construction of continuous VSFs. If the number of grids in a region is low, the VSF can not be correctly recovered. On the other hand, if the number of grids is very high, the grid structure will have a heavy computation cost and requires a large amount of memory. Hence, there is a trade-off between accuracy of the construction and the maintenance cost.
For practical deployment of GBD, we can determine the grid size with respect to the region size as follows. Assume that a monitored region is a W × W square area. The moni-tored region is partitioned into a w × w grid structure in which each grid is a square with width (height) rg. Let rs denote the transmission radius of a node. Then we can de-termine the grid size rg as follows.
First, as the simulation studies show (Sections 5 and 6), if the transmission radius rs is more than 10 times of the width rg of a grid, the constructed virtual surrounding face has a considerable high accuracy. For the same test configu-rations, further increase of the ratio (rs/rg = 10) does not change the results significantly. For a ratio rs/rg less than 10, the smaller the ratio is, the lower the probability to suc-cessfully construct the virtual surrounding face.
Second, for each practical application, the monitored re-gion (W × W) and the transmission radius rs of a node are fixed. The ratio σ = W/rs approximately indicates the width
8 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
of the deployed network in terms of hop counts. This ratio is unchangeable due to the requirement of the application. Hence, in practical applications, it is better to choose the grid size rg such that rg ≤ W/(10σ) where W and σ are two fixed parameters of an application. For rg > W/(10σ), the constructed VSF may be distorted. Meanwhile, for a known value. Fig. 7(a) shows the source cells (tiny dark or white points) in a network region. In a diffusion step, the value of each non-source cell is upgraded to the mean value of its four nearest neighbor cells. All sources retain their original values. The diffusion step repeats a large number of times, say 1500, to generate an image (Fig. 7(b)).
rg, we can compute w = W/rg.
5 SIMULATION METHODOLOGY
We design a network simulator to evaluate the perform-ance of the GBD algorithm. We define the simulation envi-ronment in this section, and evaluate the performance of GBD by comparing with the Data Cluster (DC) algorithm [11] in the next section. The simulation environment con-sists of two parts: the network model and model of test re-gions (value distribution of a monitored region).
5.1 Network Model
We randomly deploy 2000 homogenous sensors on a 200×200 grid with a transmission radius rs = 10. Each sensor has a unique ID. The sink is placed at one corner of the re-gion. Two nodes communicate with each other by using MAC layer packets. For each packet, we adopt IEEE 802.11a as the MAC layer standard with modification of packet length. IEEE 802.11a specifies a packet with 34 octet header and up to 2312 octet data field. Due to the fact that retrans-mission of long packets over unreliable wireless channels is high, the maximum length of a packet in the paper is lim-ited to 128 octets. Sensor IDs and real valued temperatures are represented by two octets each. Since the only useful part of a packet at the sink is the data field, packet merging is performed by both GBD and DC, that is, the sensors near the sink wait a certain period of time to collect all data from distant sensors, merge all data to a small number of pack-ets, and forward the merged data packets to the sink.
5.2 Test Region of Simulation Study
We adopt two environment models to generate the value distribution of an area in simulation. The first model uses a diffusion process to generate complex test environments to evaluate performance (MERR and BERR). In this model, we do not simulate the changes of the test regions over time, that is, there is no temporal relationship between any two test regions. The second model generates a simple dynamic environment. In this model, a sequence of test regions is generated in temporal order to simulate changes of value distributions over time. This model is used to evaluate per-formance of GBD in terms of BERR in dynamic environment.
5.2.1 Diffusion Process Model We adopt a diffusion process to simulate the value distribu-tion in a test region. In the diffusion process, a test region (network region) is partitioned into a grid structure with w
× w resolution. We randomly select m1 number of cells in
the grid structure as the sources. Each source is assigned a random value within a range converted to a grey level such that the value distribution can be represented by an image. For each non-source cell, its value is initialized to a fixed The image in Fig. 7(b) may be too sharp to reflect the ac-tual value distribution in a real scenario, so a softening step is used, in which m2 number of cells are randomly selected from the image as the sources, and the original sources are discarded. Then the diffusion step repeats to generate the final image (Fig. 7(c)). Fig. 7(d) shows the gradient bounda-ries of Fig. 7(c) by setting Tk – Tk+1 = 50.
In simulation, each test region is a fixed square region partitioned into 200 × 200 unit square cells. A certain num-ber of sources are chosen from the grid structure and each source is assigned to a randomly generated real number in [0, 255] to denote its temperature. In real applications, the range [0, 255] can be projected to any temperature ranges, such as [20.5C°, 30.5C°] on a summer day. The temperature of a non-source cell is initialized to the mean value of all sources. Diffusion and soften steps are performed 1500 times to generate a test region. The number of sources m1 measures the complexity of a test region. In simulations, m1 is taken from a list [50, 100, 150, 200, 250, 300].
(a) An area with 150 sources (b) After diffusion
(c) After soften (d) Shown with gradient Fig. 7 Simulated value distribution of the monitored region.
In GBD, the sink uses the collected data to recover the
snapshot of the monitored region by using a diffusion proc-ess as follows. From the collected data, the sink converts each pair of (IDvi, Tvi) to (XYvi, Tvi), where XYvi is x and y coordinates of a sensor vi. If a cell contains a sensor vi, the cell is a source with temperature Tvi. For each non-source cell, its temperature is initialized to the mean value of all
sources. Next, the diffusion step is repeated until the snap-shot reaches a stable state, i.e. the difference between two successive snapshots is lower than a threshold value. 5.2.2 Simple Dynamic Environment Model
In this model, we consider the temperature distribution in a 200 × 200 test region located at a cone shaped hill with the apex at the centre of the region. In general, when a sub-region in the cone has a smaller height, its temperature is
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 9
higher too. In this model, the temperature of a point in the work. Simply using the distance from a node to the border test region is inversely proportional to its height. This fea-to determine border nodes results in either too many or too ture is illustrated in Fig. 8, where the darker area indicates few nodes being selected as border nodes.
higher height and lower temperature in the area.
17C° boundary 15C° boundary
Fig. 8 The changes of temperature distribution of the monitored region
over time. Left: the temperature distribution at 8:00 am. Middle: the distribution at 10:00 am. Right: the distribution at 12:00 am.
GBD is designed for long-term snapshot construction and collect data on an hourly basis. To model the tempera-ture changes over time, we consider a typical day from 8:00 am to 1:00 pm, and construct one snapshot per hour. For simplicity and without lose of generality, we assume that each point in the test region increases by 1C° per hour. Hence, from 8:00 am to 1:00 pm, each point increases by 6C°. Fig. 8 shows three temperature distributions at 8:00 am, 10:00 am, and 12:00 am. From the figure, we can ob-serve that the gradient regions change over time. Even though this model is simple, it captures most of the charac-teristics when the temperature distribution changes over time in a typical day. We use this simple dynamic model because it requires less computation power for the simula-tion compared to the diffusion model.
6 EXPERIMENTAL RESULTS AND DISCUSSION
Based on the simulation environment discussed in Section5, we show properties of the GBD algorithm, and compare its performance with the DC algorithm [11].
6.1 Experiment Results in Diffusion Process Model 6.1.1 Properties of GBD
In GBD, VSF nodes send their related data to the sink for snapshot construction. Without using GBD, a simple solu-tion is that all normal boundary (NB) nodes report their data. Here, an NB node u is defined as a node with at least one neighbor that is not in the same gradient region as u. How-ever, two reasons prohibit the use of this method.
First, since gradient boundary may not enclose the outer boundary of the network graph, it is possible that a part of network region is not monitored. As shown in Fig. 9, even though NB nodes can detect the gradient boundaries (dot-ted curves), no NB nodes are located at the part of border of the monitored regions, resulting in no sensor in the sub-regions close to the border reporting data. If these uncov-ered sub-regions are large, the recovered snapshots have low accuracy. In addition, it is difficult to determine which node is a border node in a randomly deployed sensor net- Fig. 9 Gradient boundaries (dotted curves) of the monitored region with Tk+1 – Tk = 50 for all possible integer k.
Second, since sensors are densely deployed, the number of NB nodes is much larger than VSF nodes, and the data collected by all the NB sensors in the vicinity are redun-dant. Fig. 10 shows NB nodes and VSF nodes in a simple test region. Hence, VSF traversing is used to reduce the total number of nodes reporting data to the sink.
Fig. 10 The boundary nodes for a gradient boundary (dark circle). Left: small dark dots are NB nodes. Right: small dark dots are GB nodes. The number of NB nodes is much larger than the number of GB nodes.
By using test regions generated by the diffusion process, Fig. 11 plots different types of nodes by using simulation. The labels of curves denote the gradient interval GL = (Tk+1 – Tk). For example in Fig. 11(a), for a test region with 100 sources and GL = 20, the total number of NB nodes is 1621. From Fig. 11, we make the following observations.
In Fig. 11(a), it can be observed that the total number of NB nodes increases with the increase in the number of sources in the monitored regions. This is because a larger number of sources results in more nodes located on the gradient boundaries. It is obvious that the number of gradi-ent regions is smaller if the gradient interval is larger. Hence, the total number of NB nodes decreases with in-crease in the gradient interval GL. Fig. 11(b) depicts that the total number of GB nodes is an increasing function of the number of sources. Intuitively, the more complex a moni-tored region is, the longer the gradient boundary, and the larger the number of GB nodes. Fig. 11(c) shows that the total number of traversed VSF nodes which are not GB nodes. This number also indicates the number of nodes located on the border of the monitored region but not lo-
10 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
cated on any gradient boundaries. When the number of sources increases, the portion of the border of monitored region not covered by gradient boundaries decreases. Hence, the number of the VSF but not GB nodes is a de-creasing function of the number of sources, as shown in Fig. 11(c). The reported data from these nodes help to draw the boundary of a recovered snapshot with higher accuracy. We also observed from Fig. 11(a) and Fig. 11(b) that VSF traversal reduces 40%–50% of the total number of NB nodes reporting data to the sink. Hence, VSF traversal not only reduces the transmissions of redundant data, but detects Comparison Results for Mean Absolute Error MERR: Fig. 12 shows MERR of the recovered snapshots and the total costs for test regions with varied number of sources by us-ing GBD. In Fig. 12, the labels of curves denote the gradient interval GL = (Tk+1 – Tk). For example in Fig. 12(a), for a test region with 100 sources and GL = 20, the recovered snap-shot has MERR = 1.96. From Fig. 12(b), we can find that the total cost C related to MERR = 1.96 is 248000 octets.
To achieve the same MERR for the same test field, we list the transmission cost C of DC in Fig. 13. To give a clear view of comparison, we also list the costs of GBD in the the value distribution on the border of a monitored region.
6.1.2 Comparison of Transmission Cost of GBD and DC
We use the total number of transmitted octets, denoted by C, to measure the efficiency of the algorithms. Then we compute MERR and BERR for the original value distribution and the recovered snapshot. Since GBD and DC have dif-ferent transmission costs, we do not directly compute MERR and BERR of the two approaches. Rather, we run GBD with varied parameters and generate MERR and BERR with the associated cost C. Then, we find C of DC which generates the same MERR and BERR of GBD. Finally, we compare the costs of GBD and DC. Each data point in all figures in this section is the average of 10 sample runs.
20 s18 ed16o n )1420 sBd12
Ne1030 rfod840 nr650eubH4 m(2u
N0050100150200250300350
Number of sources in the test area Sources number of test area
(a) Total number of NB nodes
14 s
ed12o n )10 GL=20Fs SdVe8GL=30r fdon6GL=40
rue4GL=50
bH(m2 uN0 050100150200250300350
SNumber of sources in the test area ources number of test area
(b) Total number of GB nodes
-n250 oN & 200 sFeGL=20 Sd150VoGL=30 fn
o B100GL=40reGGL=50
bm50
uN0
050100150200250300350
Number of sources in the test area Sources number of test area
(c) Total number of VSF but not GB nodes
Fig. 11 Numbers of different types of nodes for networks with 2000
nodes in the GBD algorithm.
same figure. For example in Fig. 13(a), for the curve labeled with DC-GL=20 and a test region with 100 sources, we can find C = 365000, which gives the total transmission cost of DC to achieve MERR = 1.96 found in Fig. 12(a). To achieve the same MERR = 1.96, GBD transmits 248000 octets in total (the curve labeled with GBD-GL-20 in Fig. 13(a)).
According to Fig. 13, we observed that GBD saves up to 50% of the total cost of DC to construct snapshots with the same mean absolute error for the same test regions. This performance gain can be analyzed as follows.
5 r orr4 e eGL=20tu3GL=30 losb2GL=40 a GL=50na1 eM 0050100150200250300350 Sources number of test a Number of sources in the test area rea (a) MERR of GBD for networks with 2000 nodes ts300oc n 270o)isGL=20 sdsin240maGL=30s snuo210GL=40a rhGL=50tt (la180 toT150 050100150200250300350 Number of sources in the test area Sources number of test area(b) Total transmission cost C of GBD to achieve MERR specified in (a) Fig. 12 Mean absolute error and related cost of the GBD algorithm. Both GBD and DC have an initialization phase and a data collection phase for each snapshot construction. In the former phase, GBD and DC build their structures (VSF for GBD and representative nodes for DC). In the latter phase, GBD and DC use the built structures to send data to the sink. In addition, VSF traversal in GBD occurs in this phase. In the initialization phase of GBD, all sensors broadcast their sensed data once for the first snapshot construction. For the successive snapshot constructions, only sensors that change their gradient regions broadcast their sensed data once in the initialization phase. Similarly, in the initializa-tion phase of DC, all sensors send a maximum of six mes-sages to select representative nodes to construct the first
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 11
snapshot. In later snapshot constructions, sensors changing their gradient regions broadcast their values, and all sen-sors affected by these changes participate in the new repre-sentative node selection process in the initialization phase. Let C1 denote the cost of the initialization phase and let C2 denote the cost of the data collection phase. We have C = C1 + C2. We observed (details not shown in the paper) that the data collection cost C2 of GBD is approximately 10% more than C2 of DC. Combining this observation with the total costs shown in Fig. 13, we conclude that the initializa-tion cost C1 of GBD is much smaller than C1 of DC.
Obviously, in the initialization phase, GBD has different ated cost C have been given in Fig. 12(b). In Fig. 14, the la-bels of curves denote the gradient interval GL = (Tk+1 – Tk). For example in Fig. 14, for a test region with 100 sources and GL = 20, the recovered snapshot has BERR = 0.1. Then, from Fig. 12(b), we can find that the total cost C related to BERR = 0.1 is 248000 octets.
0.2 0.15GL=20 GL=300.1GL=40
GL=500.05
dary error (BERR) costs in the first and the later snapshot constructions. The similar result holds for DC too. The results in Fig. 13 only show the performance gain of GBD for the first snapshot construction. One may argue that the performance gain of GBD may not be achieved in the later snapshot construc-tions. However, the performance gain almost holds in suc-cessive snapshot constructions and the reason is as follows. Since GBD is designed for long term environmental moni-toring, the time interval between two successive snapshots is hourly-based. In reality, temperature values change sig-nificantly after a long, say one or two hours, time interval. As a consequence, a large number of sensors change their gradient regions. These changed sensors cause almost all sensors in DC to participate in the selection process of rep-resentative nodes which lead to a similar initialization cost as the one in the first snapshot construction. This discussion is verified in the experimental studies shown in Section 6.2 under the dynamic environment.
t400s oc n360 o )isGBD-GL=20s sd320inGBD-GL=30mas
suno280DC-GL=20ahDC-GL=30rt(
t l240at
oT200 050100150200250300350
Number of sources in the test area
(a) Total transmission cost C for GL = 20 and 30
400ts oc360 n
o)320GBD-GL=40issdsGBD-GL=50
inma280sDC-GL=40s nuoa240DC-GL=50rhtt (l200at oT160050100150200250300350
Number of sources in the test area (b) Total transmission cost C for GL = 40 and 50
Fig. 13 Total transmission costs of GBD and DC to achieve the MERR level specified in Fig. 12(a).
Comparison Results for Boundary Error BERR: In this set of experiments, we compare GBD and DC by considering the cost C to achieve a certain level of boundary error BERR. Fig. 14 plots BERR of recovered snapshots of GBD and its associ- nuo B0050100150200250300350 Sources number of test area
Number of sources in the test area Fig. 14 Boundary error of GBD with the cost C given in Fig. 12(b).
Fig. 15 plots the cost C of DC and GBD for achieving the same BERR levels specified in Fig. 14. From Fig. 15, we ob-served that GBD saves up to 50% of the total cost of DC to construct snapshots for the same test regions.
As discussed above, both GBD and DC have the initiali-zation cost C1 and the data collection cost C2. To achieve the same BERR levels, we observed (from the experimental re-sults not shown here) that the data collection cost C2 of GBD is approximately 5% - 10% higher than the cost of DC for small gradient interval (GL = 20), and GBD saves 10% - 40% of the data collection cost of DC for large gradient in-terval (GL = 30, 40 and 50). Therefore, in boundary detec-tion applications, both the initialization process and the data collection process of GBD contribute to the perform-ance gain of GBD compared with DC.
400ts oc 360n
o)GBD-GL=20issds320GBD-GL=30
inmass280DC-GL=20 nuoaDC-GL=30rhtt (240 lato T200050100150200250300350 Number of sources in the test area
(a) Total transmission cost C for GL = 20 and 30
ts400o c n360 ) oissd320GBD-GL=40sinmaGBD-GL=50
su280sDC-GL=40o
narh240DC-GL=50tt (
la200to T160 050100150200250300350
Number of sources in the test area
(b) Total transmission cost C for GL = 40 and 50
Fig. 15 Total transmission costs of GBD and DC to achieve the BERR level specified in Fig. 14.
We also find that in the initialization phase, the cluster
12 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
construction (representative node selection) cost for DC can be higher than the data collection cost in the worst case. This worst case occurs when the value distribution of most sub-regions changes dramatically such that the environ-mental change exceeds the pre-specified gradient interval. In this case, most of the representative nodes are voted again. However, many applications supporting historical queries operate on this worst case scenario. This is because the sink collects/samples snapshot data on an hourly basis, resulting in that the temperature distribution changes sig-nificantly between two successive snapshots. On the other hand, GBD has no such initialization cost except nodes terms of MERR, we obtain a very close estimation of the gra-dient boundary in terms of the boundary error BERR.
By using the preceding method and the dynamic envi-ronment model with six temporally ordered snapshots (Section 5.2.2), we compare GBD and DC and show the re-sults in Table 1. In the experiment, the gradient interval is configured to be 25 (grey level in the image). This interval can be easily mapped to any real temperature setting.
According to the results shown in Table 1, we have the following observations. First, in most of cases, GBD con-structs snapshots with much lower BERR than DC. Second, the initialization costs of DC for the snapshot at 8:00 am is changing their gradient regions reporting their values once before each construction. This reporting cost in the initiali-zation phase is local and small compared to the data collec-tion cost of GBD. The experimental results in Section 6.2 support this observation in a dynamic network model. Hence, the overall performance of GBD is superior to DC. According to the simulation results, GBD can save up to 50% of the total cost of DC to construct a snapshot with the same mean absolute error and boundary error under the environment models generated by the diffusion process.
6.2 Experimental Results in Simple Dynamic Model
In Section 6.1, we applied GBD to reduce MERR and BERR. In addition, GBD is applied to fulfill the features of test re-gions generated by the diffusion process. More specifically, each VSF node reports the highest and lowest values of its neighbor nodes to the sink. The sink uses these values to recover snapshots based on the diffusion process. If we only focus on minimizing BERR, which is the primary goal of GBD, there is a more efficient solution as follows.
Fig. 16 Applying GBD for BERR only. Left: detected gradient boundary (circle). Middle: gradient boundary bounded by two VSFs (inner curve and outer curve of the circle). Right: detected gradient boundary.
Fig. 16 (left) shows a circular gradient boundary. By us-ing GBD, two VSFs are constructed inside and outside the
circular gradient boundary (Fig. 16 middle). The VSF nodes report their values to the sink. Based on the received data, the sink draws the VSF in the recovered grid. More specifi-cally, for any two VSF neighbors, the sink draws a line be-tween the two neighbors and assigns values to all points on the line based on the temperature of the two end nodes of the line. After drawing the VSF, for any point p in the test region, but not located on the VSF, p sets its temperature to the value of a point located at the boundary of VSF with the shortest distance to p. Fig. 16 (right) shows the recovered gradient boundary by using this method. Even though the recovered temperature distribution inside the gradient re-gion significantly differs from the original distribution in
slightly higher than the initialization cost of the later snap-shots. This is because, for any two successive snapshots, approximately one third of sensors change their gradient regions in the latter snapshot and these sensors lead more nodes in their vicinities to update their representing lists. Therefore, majority of sensors participate in the selection process of representative nodes, resulting in similar initiali-zation costs in all snapshot construction. This observation corroborated our discussion in Section 6.1.2, and shows that the simulation results obtained in Section 6.1.2 not only hold for the first snapshot construction, but also approxi-mately hold for all later snapshot constructions. Third, the data collection cost of GBD is slightly higher than the col-lection cost of DC, which is similar to the observation ob-tained in Section 6.1.2. Finally, the total cost of DC is almost twice the total cost of GBD. Hence, the experimental results show that GBD outperforms DC in the dynamic environ-ment for large gradient intervals.
Table 1 Costs and BERR for GBD and DC in networks with 2000 nodes Construction Algorithm
Ini cost Total cost BERR
Time of snapshot (in octets)
(in octets)
8:00 am GBD 75316 147560 1.33% DC 262400 321340 0.77% 9:00 am
GBD 26448 96596 1.66% DC 213532 279472 4.88% 10:00 am GBD 36252 88340 2.03% DC 223336 282276 5.44% 11:00 am GBD 25536 92508 1.53% DC 212620 281560 5.32% 12:00 noon GBD 31046 116874 2.97% DC 218130 286593 9.32% 1:00 pm GBD 31350 103594 1.34% DC 218434 282374 6.17% 6.3 Load Balancing
One question not addressed for GBD is the load balancing
problem. In DC, representative nodes consume much more energy than normal nodes. DC achieves load balancing by preventing nodes with low energy levels from becoming representative nodes. More specifically, a node in DC with low energy level will not respond to the received construc-tion message of the representative node.
In GBD, comparing with other non-VSF nodes, VSF
LIAN ET AL.: GRADIENT BOUNDARY DETECTION FOR TIME SERIES SNAPSHOT CONSTRUCTION IN SENSOR NETWORKS 13
nodes of all gradient boundaries are responsible for travers-ing VSFs and collecting data for the sink. Hence, VSF nodes have much higher energy consumption than other nodes. However, the load balancing problem can be easily solved in GBD under two scenarios as follows.
First, in many situations, GBD does not have load bal-ancing problem at all, since in real applications the changes of temperature distributions in a monitored area automati-cally balance the load. Considering the temporal snapshots in Fig. 8, we observe that the gradient boundaries change significantly over time. The boundary change causes that different sets of sensors form the VSFs of the same gradient support historical queries for sensor networks. In GBD, a monitored region is partitioned into a set of gradient re-gions based on the value distribution, and all sensor read-ings in each region are in the same predefined gradient range. GBD employs the concept of virtual surrounding face (VSF) to approximate the boundaries of gradient re-gions and constructs snapshots of the monitored region by traversing all the VSFs. GBD is designed to minimize boundary error (BERR) and mean absolute error (MERR), and reduce the total transmission cost in constructing snapshots with a given accuracy level.
We compare the performance of GBD with the Data range at different time, which results in load balancing for all sensors. From the same simulation associated with Table 1 under the dynamic environment model, we list the statis-tics of total number of messages sent by all sensors during the VSF traversal in Table 2. From Table 2, we can find that 829 VSF nodes send one VSF message, 432 VSF nodes sends two messages, and 131 nodes send three messages. Only 11 nodes send more than 6 messages and 80 nodes (less than 5% of the total nodes) send more than 4 messages. For long term operations, load balancing is not a problem in this type of scenarios such as outdoor temperature monitoring. In the second scenario, we assume that the changes of value distributions for a monitored area can not automati-cally balance the loads for all sensors in GBD. In this case, GBD needs to balance the load by modifying itself. GBD can easily achieve load balancing by adjusting the initial value of gradient configuration at different time. For exam-ple, assume that initially we use 4 levels of gradient GD4 = {(10, 20), (20, 30), (30, 40), (40, 50)} with gradient interval GL = 10. After n number of snapshot constructions, the sink can issue a new query with GD5 = {(5, 15), (15, 25), (25, 35), (35, 45), (45, 55)} to all sensors. Since GL is not changed in the new query (GL = 10), the recovered snapshots will re-tain its accuracy (same MERR and BERR). However, since the new configuration changes the gradient boundaries, the overloaded VSF nodes in the old configuration become non-VSF nodes, resulting in balanced load for all nodes in long term operations. If we choose a relatively large n, the cost incurred in new query dissemination can be ignored. On the other hand, the sink can also add a tunable parame-ter of GD in the old query which triggers all sensors auto-matically change their GD after n snapshot constructions. In this way, the new query dissemination cost is removed.
Table 2 Statistics for the number of VSF messages sent by sensors dur-ing the six snapshot constructions. TX denotes transmission. TX.# 1 2 3 4 5 6 7 8 9 Node.# 829 432 194 103 44 25 4 0 1 TX.# 10 11 12 13 14 15 16 17 18 19 Node.# 5 1 0 0 0 0 0 0 0 0 7 CONCLUDING REMARKS In this paper, we propose a Gradient Boundary Detection
(GBD) algorithm for time series snapshot construction to
Cluster (DC) algorithm. According to the experimental re-sults, for applications focusing on minimizing the mean absolute error, GBD has a similar transmission cost of DC in the data collection phase. For applications sensitive to the boundary error, in the data collection phase, GBD has a similar transmission cost of DC for small gradient interval, and saves 10% - 40% transmission cost of DC for large gra-dient intervals. However, for both applications, the initiali-zation cost of GBD is significantly less than the cluster con-struction and maintenance cost related to DC. Therefore, GBD can reduce up to 50% of the overall cost of DC to con-struct a snapshot with a given accuracy level for both the mean absolute error and the boundary error.
ACKNOWLEDGMENT
The authors thank the anonymous reviewers for their help-ful comments. This work was supported in part by the NSERC (Canada), the NSFC Key Project Grant No.60533110, and National Grand Fundamental Research 973 Program of China under Grant No.2006CB303000.
REFERENCES
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless
Sensor Networks: A Survey,” Computer Networks, vol. 38(4), 2002. [2] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed Diffusion: A
Scalable and Robust Communication Paradigm for Sensor Networks,” in Proc. ACM Mobicom, 2000.
[3] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “TAG: A Tiny
Aggregation Service for Ad-hoc Sensor Networks,” in Proc. Symposium on Operating Systems Design and Implementation (OSDI), 2002.
[4] J. Lundquist, D. Cayan, and M. Dettinger, “Meteorology and Hydrol-ogy in Yosemite National Park: A Sensor Network Application,” In-formation Processing in Sensor Networks, April 2003.
[5] E. Kranakis, H. Singh, and J. Urrutia, “Compass Routing on Geometric
Networks,” in Proc. Canadian Conference on Computational Geometry,
1999, pp. 51–54.
[6] P. Bose, P. Morin, I. Stojmenovic, and J. Urrutia, “Routing with Guaran-teed Delivery in Ad Hoc Wireless Networks,” Wireless Networks, vol. 7(6), 2001, pp. 609–616. [7] A. Mairwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson,
“Wireless Sensor Networks for Habitat Monitoring,” in Proc. ACM
Workshop on Wireless Sensor Networks and Applications, 2002.
[8] P. Juang, H. Oki, Y. Wang, et al. “Energy-Efficient Computing for Wild-life Tracking: Design Tradeoffs and Early Experiences with ZebraNet,” in Proc. Conference on Architectural Support for Programming Languages and Operating Systems, October 2002.
[9] P. Zhuang, C. Sadler, S. Lyon, and M. Martonosi, “Hardware Design
Experiences in ZebraNet,” in Proc. ACM Conference on Embedded Net-worked Sensor Systems, November 2004.
14 IEEE TRANSACTIONS ON PARALELL AND DISTRIBUTED SYSTEMS, TPDS-0519-1205
[10] I. Stojmenovic, M. Seddigh, and J. Zunic, “Dominating Set and
Neighbor Elimination Based Broadcasting Algorithms for Wireless Networks,” IEEE Trans. on Parallel and Distributed Systems, vol. 13(1), 2002, pp. 14–25.
[11] Y. Kotidis, “Snapshot Queries: Towards Data Centric Sensor Net-works,” in Proc. International Conference on Data Engineering, 2005.
[12] T. He, S. Krishnamurthy, L. Luo, et al. “VigilNet: An Integrated Sensor
Network System for Energy-Efficient Surveillance,” ACM Transactions on Sensor Networks, 2005.
[13] Y. Yao and J. Gehrke, “Query Processing for Sensor Net,” in Proc. Con-ference on Innovative Data Systems Research (CIDR), 2003.
[14] M. A. Sharaf, J. Beaver, A. Labrinidis, and P. K. Chrysanthis, “TinA: A
Scheme for Temporal Coherency-aware In-network Aggregation,” in Lei Chen received his BS degree in Computer Science and Engineering from Tianjin University, China, in 1994, the MA degree from Asian Institute of Technology, Thailand, in 1997, and the PhD degree in computer science from University of Waterloo, Canada, in 2005. He is now an assistant professor in the Department of Computer Science and Engineering at Hong Kong University of Sci-ence and Technology. His research interests in-clude multimedia database, sensor databases,
peer-to-peer databases and probabilistic database. He is a member of the IEEE and the IEEE Computer Society.
Kshirasagar Naik Kshirasagar Naik received his Proc. ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE), 2003.
[15] J. Beaver, M. A. Sharaf, A. Labrinidis, and P. K.Chrysanthis, “Location-Aware Routing for Data Aggregation in Sensor Networks,” in Proc. Geo Sensor Networks Workshop, 2003.
[16] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-Efficient Communication Protocols for Wireless Microsensor Net-works,” in Proc. International Conference on Systems Science, 2000.
[17] Q. Fang, F. Zhao, and L. GuiBas, “Lightweight Sensing and Communi-cation Protocols for Target Enumeration and Aggregation,” in Proc. ACM MobiHoc, 2003.
[18] A. Arora, P. Dutta, S. Bapat, el al., “A Line in the Sand: A Wireless Sen-sor Network for Target Detection, Classification, and Tracking,” Com-puter Networks (Elsevier), vol. 46, no. 5, Dec, 2004, pp. 605-634.
[19] A. Qayyum, L. Viennot, and A. Laouiti, “Multipoint Relaying for Flood-ing Broadcast Messages in Mobile Wireless Networks,” in Proc. Hawaii Intl. Conference on System Sciences (HICSS), 2002, pp. 3898–3907.
[20] I. Stojmenovic, S. Seddigh, and J. Zunic, “Dominating Sets and
Neighbor Elimination-based Broadcasting Algorithms in Wireless Networks,” IEEE Trans. On Parallel and Distributed Systems, vol. 13(1), January 2002, pp. 14–25.
[21] P. Enge and P. Misra, “Special Issue on GPS: The Global Positioning
System,” in Proc. IEEE, Jan. 1999, pp. 3–172.
[22] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins, “Global Posi-tioning System: Theory and Practice,” Springer-Verlag, 1997.
[23] A. S. Krishnakumar and P. Krishnan, \"On the Accuracy of Signal
Strength-based Location Estimation Techniques,\" in Proc. IEEE INFO-COM, 2005, pp. 642–650.
[24] S. Ray, W. Lai, and I. C. Paschalidis, “Statistical Location Detection with
Sensor Networks,” IEEE Trans. on Information Theory, vol. 52(6), 2006. [25] N. Patwari, A. O. Hero, III, M. Perkins, N. Correal, R. O'Dea, \"Relative
Location Estimation in Wireless Sensor Networks,\" IEEE Trans. on Sig-nal Processing, Special Issue on Signal Processing in Networks, 2002. [26] S. Capkun, M. Hamdi, and J. P. Hubaux, \"GPS-Free Positioning in Mo-bile Ad-Hoc Networks,\" in Proc. HICSS, Hawaii, January 2001.
[27] I. Stojmenovic and L. Xu, “Power Aware Localized Routing in Wireless
Networks,” IEEE Trans. on Parallel and Distributed Systems, vol. 12(11), November 2001, pp. 1122–1133.
Jie Lian received the BS degree in computer sci-ence from Inner Mongolia University, China, in 1992, the MA degree from Asian Institute of Tech-nology, Thailand, in 1998, and the PhD degree in electrical and computer engineering from University of Waterloo, Canada, in 2006. He is now a postdoc-toral fellow in Department of Electrical and Com-puter Engineering at University of Waterloo. His research interests are in the areas of ad hoc and sensor networks, peer-to-peer system, energy-efficient techniques for portable wireless devices, and RFID. He is a member of the IEEE and the IEEE Computer Society.
BS and M. Tech degrees from Sambalpur Unver-sity, India, and the Indian Institute of Technology, Kharagpur, India. He received an M. Math degree in computer science from the University of Water-loo and a Ph.D. degree in electrical and computer engineering from Concordia University, Montreal. He worked as a faculty member at the University of Aizu in Japan, and Carleton University in Ot-tawa. At present he is an associate professor in the Department of Electrical and Computer Engineering, at the University of Waterloo. He was a Program Co-Chair of the 5th International Conference on Infor-mation Technology, Bhubaneswar, India, 2002. He is on the Program Committee of several international conferences in the area of wireless communication and mobile computing. He was a co-guest editor of the special issue of IEEE JSAC on Mobile Computing and Networking. He was the lead guest editor of the special issue of IEEE JSAC on Peer-to-Peer Communications and Applications. His research interests are testing of communication protocols, wireless communication, resource allocation in cellular and 3G systems, sensor networks, ad hoc net-works, MAC protocols for wireless LAN, Bluetooth networks, mobile computing, and peer-to-peer communication. He is a member of the IEEE and the IEEE Computer Society.
Yunhao Liu received his BS degree in Automation Department from Tsinghua University, China, in 1995, and an MA degree in Beijing Foreign Stud-ies University, China, in 1997, and an MS and a Ph.D. degree in Computer Science and Engineer-ing at Michigan State University in 2003 and 2004, respectively. He is now an assistant professor in the Department of Computer Science and Engi-neering at Hong Kong University of Science and Technology. His research interests include peer-to-peer and grid computing, sensor networks, pervasive computing, network security, and high-speed networking. He is a senior member of the IEEE Computer Society.
Gordon B. Agnew received his BASc. and Ph.D. in Electrical Engineering from the University of Waterloo in 1978 and 1982, respectively. He joined Department of Electrical and Computer Engineering at the University of Waterloo in 1982. His areas of expertise include cryptography, data security, protocols and protocol analysis, elec-tronic commerce systems, high-speed networks, wireless systems and computer architecture. He is a member of the International Association for
Cryptologic Research, a Foundation Fellow of the Institute for Combi-natorics and its Applications and a Registered Professional Engineer-ing in the Province of Ontario. He is also a co-founder of CERTICOM Corp., a world leader in public key cryptosystem technologies. He is a member of the IEEE and the IEEE Computer Society.
因篇幅问题不能全部显示,请点此查看更多更全内容