Dynamic graph analytics has a wide range of applications. Twitter can recommend information based on the up-to-date TunkRank (similar to PageRank) computed over a dynamic attention graph [14], and cellular network operators can fix traffic hotspots in their networks as they are detected [28]. To achieve real-time performance, there is growing interest in offloading graph analytics to GPUs due to their much stronger arithmetic power and higher memory bandwidth compared with CPUs [45]. Although existing solutions, e.g., Medusa [58] and Gunrock [50], have explored GPU graph processing, we are aware of only one work [30] that considers a dynamic graph scenario, which is a major gap for running analytics on GPUs. In fact, a delay in updating a dynamic graph may lead to undesirable consequences. For instance, consider an online travel insurance system that detects potential frauds by running ring analysis on profile graphs built from active insurance contracts [5]. Analytics on an outdated profile graph may fail to detect frauds, which can cost millions of dollars. However, updating the graph may be too slow for issuing contracts and processing claims in real time, which severely degrades legitimate customers' user experience. This motivates us to develop an update-efficient graph structure on GPUs to support dynamic graph analytics.
There are two major concerns when designing a GPU-based dynamic graph storage scheme. First, the proposed storage scheme should handle both insertion and deletion operations efficiently. Though updates against an insertion-only graph stream could be handled by reserving extra space to accommodate them, this naïve approach fails to preserve the locality of the graph entries and cannot support deletions efficiently. Consider a common sliding-window model on a graph edge stream: each element in the stream is an edge in a graph, and analytic tasks are performed on the graph induced by all edges in the up-to-date window [51, 15, 17]. A naïve approach needs to access the entire graph in the sliding window to process deletions, which is obviously undesirable against high-speed streams. Second, the proposed storage scheme should be general enough to support existing graph formats on GPUs so that we can easily reuse existing static GPU graph processing solutions for graph analytics. Most large graphs are inherently sparse. To maximize efficiency, existing works [6, 33, 32, 30, 53, 24] on GPU sparse graph processing rely on optimized data formats and arrange the graph entries in a certain sorted order; e.g., CSR [33, 6] sorts the entries by their row-column ids. However, to the best of our knowledge, no GPU scheme can support efficient updates and maintain a sorted graph format at the same time, other than a rebuild. This motivates us to design an update-efficient sparse graph storage scheme on GPUs while keeping the locality of the graph entries for processing analytics instantly.
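To make the locality problem concrete, the following minimal sketch (our own illustration, not code from any cited system) builds a CSR representation from an edge list. Because CSR keeps all entries sorted by (row, column) id in two dense arrays, inserting or deleting a single edge shifts every entry after it, which is why static formats force a rebuild under streaming updates:

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays from a list of (src, dst) edge pairs.

    Returns (row_offsets, col_indices): row_offsets[v]..row_offsets[v+1]
    delimits the neighbors of vertex v inside col_indices.
    """
    edges = sorted(edges)                    # sort entries by (row, column) id
    row_offsets = [0] * (num_vertices + 1)
    for src, _ in edges:                     # count the degree of each row
        row_offsets[src + 1] += 1
    for v in range(num_vertices):            # prefix sum over degrees
        row_offsets[v + 1] += row_offsets[v]
    col_indices = [dst for _, dst in edges]  # columns, already in sorted order
    return row_offsets, col_indices
```

For example, `build_csr(4, [(0, 1), (2, 3), (0, 2), (1, 3)])` yields `row_offsets = [0, 2, 3, 4, 4]` and `col_indices = [1, 2, 3, 3]`; adding an edge `(0, 3)` would require shifting every later entry of `col_indices` and incrementing every later offset.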
In this paper, we introduce a GPU-based dynamic graph analytic framework and then propose a dynamic graph storage scheme on GPUs. Our preliminary study shows that a cache-oblivious data structure, i.e., the Packed Memory Array (PMA [10, 11]), can potentially be employed for maintaining dynamic graphs on GPUs. PMA, originally designed for CPUs [10, 11], maintains sorted elements in a partially contiguous fashion by leaving gaps, with a constant bounded gap ratio, to accommodate fast updates. This simultaneously sorted and contiguous characteristic of PMA nicely fits the scenario of GPU streaming graph maintenance. However, the performance of PMA degrades when updates occur in locations close to each other, due to the unbalanced utilization of the reserved spaces. Furthermore, streaming updates often arrive in batches rather than as single updates, yet PMA does not support parallel insertions. It is also non-trivial to apply PMA to GPUs, since its intricate update patterns may cause serious thread divergence and uncoalesced memory accesses on GPUs.
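The sorted-with-gaps idea can be sketched as follows. This is a heavily simplified, sequential toy (our own illustration; the thresholds and the single global rebalance stand in for PMA's hierarchical density bounds and local rebalances described in [10, 11]), but it shows the key invariant: elements stay sorted while gaps keep most insertions cheap:

```python
class SimplePMA:
    """Toy Packed Memory Array: a sorted array with gaps (None slots)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity

    def elements(self):
        return [x for x in self.slots if x is not None]

    def insert(self, key):
        # Grow and spread out when more than half full -- a stand-in for
        # PMA's hierarchical density thresholds and local rebalances.
        if len(self.elements()) + 1 > len(self.slots) // 2:
            self._redistribute(2 * len(self.slots))
        # Find the slot just after the last element <= key.
        pos = 0
        for i, x in enumerate(self.slots):
            if x is not None and x <= key:
                pos = i + 1
        # Shift right toward the nearest gap; this touches only a few
        # slots instead of the whole array.
        gap = pos
        while gap < len(self.slots) and self.slots[gap] is not None:
            gap += 1
        if gap == len(self.slots):        # no gap to the right: rebalance
            self._redistribute(2 * len(self.slots))
            self.insert(key)
            return
        self.slots[pos + 1:gap + 1] = self.slots[pos:gap]
        self.slots[pos] = key

    def _redistribute(self, capacity):
        # Spread elements evenly so every element has gaps around it.
        elems = self.elements()
        self.slots = [None] * capacity
        step = capacity // max(len(elems), 1)
        for i, x in enumerate(elems):
            self.slots[i * step] = x
```

Note how clustered insertions exhaust nearby gaps and trigger rebalances, which is exactly the degradation, and the lack of parallelism, that motivates the GPU-oriented redesigns below.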
We thus propose two GPU-oriented algorithms, GPMA and GPMA+, to support efficient parallel batch updates. GPMA explores a lock-based approach, which has become increasingly popular due to the recent GPU architectural evolution supporting atomic operations [18, 29]. While GPMA works efficiently when few concurrent updates conflict, e.g., small update batches whose edges fall in random locations, there are scenarios where massive conflicts occur; hence, we also propose a lock-free approach, GPMA+. Intuitively, GPMA+ takes a bottom-up approach, prioritizing updates that occur in similar positions. The update optimizations of our proposed GPMA+ maximize coalesced memory access and achieve linear performance scaling w.r.t. the number of computation units on GPUs, regardless of the update patterns.
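The intuition of grouping nearby updates so they do not conflict can be sketched on the CPU side as follows (an illustrative sketch of the batching idea only, not the paper's GPU kernel; the function name and segment handling are our own):

```python
def group_by_segment(update_positions, segment_size):
    """Group a batch of update positions by the array segment they hit.

    Each segment's updates can then be handled by one thread group
    without conflicting with updates destined for other segments.
    """
    groups = {}
    for pos in sorted(update_positions):  # sorted batch => coalesced access
        groups.setdefault(pos // segment_size, []).append(pos)
    return groups
```

For example, `group_by_segment([3, 17, 5, 12], 8)` returns `{0: [3, 5], 1: [12], 2: [17]}`: updates 3 and 5 land in the same segment and are processed together by one group rather than racing on a lock.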
The contributions of this paper are summarized as follows:
• We introduce a framework for dynamic graph analytics on GPUs and propose the first GPU dynamic graph storage scheme of its kind, paving the way for real-time dynamic graph analytics on GPUs.
• We devise two GPU-oriented parallel algorithms, GPMA and GPMA+, to support efficient updates against high-speed graph streams.
• We conduct extensive experiments to show the performance superiority of GPMA and GPMA+. In particular, we design different update patterns on real and synthetic graph streams to validate the update efficiency of our proposed algorithms against their CPU counterparts as well as a GPU rebuild baseline. In addition, we implement three real-world graph analytic applications on the graph streams to demonstrate the efficiency and broad applicability of our proposed solutions. To support larger graphs, we extend our proposed formats to multiple GPUs and demonstrate the scalability of our approach on multi-GPU systems.
The remainder of this paper is organized as follows. Related work is discussed in Section 2. Section 3 presents a general workflow of dynamic graph processing on GPUs. Subsequently, we describe GPMA and GPMA+ in Sections 4 and 5, respectively. Section 6 reports the results of a comprehensive experimental evaluation. We conclude the paper and discuss future work in Section 7.
2. RELATED WORK
In this section, we review related work in three categories.