Tutorial Notes on Web-Scale Information Analytics

GraphLab API

We have approached GraphLab from different angles:

In terms of functionality, the Python binding is equivalent to the toolkits: you can only use existing algorithms written by other people. In this tutorial, we take a deeper look into GraphLab.

Workflow of programming against the GraphLab C++ API

Get the core package and compiler toolchain.

git clone https://github.com/graphlab-code/graphlab.git
sudo apt-get install gcc g++ build-essential libopenmpi-dev default-jdk cmake zlib1g-dev

Make a new directory under demoapps, e.g. demoapps/engg4030. Create a file called CMakeLists.txt with the following content:

project(GraphLab)
add_graphlab_executable(hello hello.cpp)

Create the hello world example hello.cpp:

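#include <graphlab.hpp>
#include <iostream>
int main(int argc, char** argv) {
  // Set up MPI before using any distributed facilities.
  graphlab::mpi_tools::init(argc, argv);
  // Communication layer shared by the processes of this job.
  graphlab::distributed_control dc;
  // dc.cout() prints the message once for the whole job.
  dc.cout() << "Hello World! (From distributed control)\n";
  // std::cout prints from every process.
  std::cout << "Output per core! (From every core)" << std::endl;
  graphlab::mpi_tools::finalize();
  return 0;
}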

You can create the project anywhere. For convenience, we put it under the demoapps directory, so that the configure script can find it directly.

The configure script tests your environment and generates the proper build commands under /release and /debug:

./configure

The directory structure of /release and /debug mirrors that of the project root. You can find the corresponding Makefile for the debug and release binaries there.

$ ls release/demoapps/engg4030/
CMakeFiles  cmake_install.cmake  CTestTestfile.cmake  Makefile
$ ls debug/demoapps/engg4030/
CMakeFiles  cmake_install.cmake  CTestTestfile.cmake  Makefile

Note that /release/* and /debug/* do not contain source code. They only contain build scripts; your code stays in its original location.

Compile the code as follows:

cd debug/demoapps/engg4030/
make

Note that make can run several jobs concurrently, e.g. make -j8. As a rule of thumb, keep the concurrency below the number of CPUs. As advised by GraphLab, also keep it below Memory(G) / 1G, i.e. allow roughly 1 GB of memory per job.

Summary of the workflow after preparation:

The Examples Repo

You can find all files used in this tutorial here.

Get it via Git (note: this is not GraphLab's official repo):

git clone https://github.com/hupili/graphlab.git

I made a Makefile to wrap the above workflow, so you can modify and test the code under demoapps/engg4030 directly. After a modification, just run make in the same folder. The built executables are put under output/.

Here is a glimpse of the source code:

GraphLab GAS Model

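Many graph algorithms, and many machine learning algorithms, follow one common pattern: Gather-Apply-Scatter (GAS). A vertex gathers information from its neighbourhood, applies it to update its own value, and then scatters the result by updating adjacent edges or signalling neighbours.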

Lifecycle of a Vertex Program:

Examples

Hello World

source

Key take-aways:

Remote-Procedure-Call (RPC)

source

Key take-aways:

PageRank from demoapps

source

Same as demoapps/simple_pagerank_annotated.cpp, with more annotations.

Key take-aways:

PageRank Base

source

This is the same as the PageRank above, except for:

This is used as the base code for several alternative PageRank implementations, so that you can get a quick understanding via git diff.

PageRank: Scatter

diff

full source

Key take-away:

PageRank: Fixed iteration via vertex scheduling

diff

full source

Key take-away:

PageRank: Fixed iteration via engine scheduling

diff

full source

Key take-aways:

PageRank: Simulating random walker

diff

full source

Key take-aways:

This is an interesting learning example for the expressiveness of GraphLab. However, the implementation has some caveats:

Limitations of GAS:

Exercise

Implement distributed Bellman-Ford (the one used in RIP) using GraphLab's GAS model.

Reference

Outcome of This Tutorial
