Ref from "NIPS 20 Digraph Inception Convolutional Networks
Transform digraphs to undirected by relaxing its direction structure
T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
We assign separate relation nodes r1 and r2 for each entity pair (e1,r,e2) as (e1,r1) and (e2,r2). Entity nodes are described by sparse feature vectors. We extend the number of features in NELL by assigning a unique one-hot representation for every relation node, effectively resulting in a 61,278-dim sparse feature vector per node. The semi-supervised task here considers the extreme case of only a single labeled example per class in the training set. We construct a binary, symmetric adjacency matrix from this graph by setting entries Aij = 1, if one or more edges are present between nodes i and j.
F. Wu, T. Zhang, A. H. d. Souza Jr, C. Fifty, T. Yu, and K. Q. Weinberger, “Simplifying graph convolutional networks,” arXiv preprint arXiv:1902.07153, 2019. ICML
A ∈ R n×n is a symmetric (typically sparse) adjacency matrix
Y. Wang, W. Wang, Y. Liang, Y. Cai, J. Liu, and B. Hooi, “Nodeaug: Semi-supervised node classification with data augmentation,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 207–217.
Each of them contains an unweighted adjacency matrix and bag-of-words features.
Looking for structural patterns and reformulate the graph.
learn specific structure by defining motifs [28], inheritance relationship [18] and second-order proximity[39]. (have to stipulate learning templates or rules in advance, and is not capable to deal with complex structures beyond their definitions.)
use two adjacency matrices: Aa ∈ R N×N denotes the connections from nodes to their ancestors, whereas Ad denotes the connections from nodes to their descendants.
Only apply for strongly connected digraph according to their definition
Y. Ma, J. Hao, Y. Yang, H. Li, J. Jin, and G. Chen, “Spectral-based graph convolutional network for directed graphs,” arXiv preprint arXiv:1907.08990, 2019.
(transition matrix P of an ergodic random walk X)
Normalized directed graph Laplacian
Random walk Laplacian
Combinatorial Laplacian
GSO
diagonalizable
$S = Vdiag(λ)V−1$ , with $V := [v_1, …, v_N ]$ denoting the (non-orthogonal) eigenvectors of S and $λ := [λ_1, …, λ_N ]^T$ its possibly complex-valued eigenvalues.
see also [11] for a careful treatment of the non-diagonalizable case which relies on oblique spectral projectors to define the GFT
GFT approaches rely on projections onto the (non-orthogonal) eigenvectors of a judicious random walk operator on the digraph [9], [10]
Property:
$$
TV_1(x) := ||x - \hat{S}x||_1
$$
frequency ordering: $λi ≻ λj$ if $TV_1(v_i) > TV_1(v_j)$
$S = S/|λ_{max}|$ and $λ_{max}$ is the spectral radius of S
does not ensure that constant signals have zero variation
generalized eigenvectors of asymmetric GSOs need not be orthonormal, implying that Parseval’s identity will nothold and hence the signal power is notpreserved across the vertex and dual domains
obtaining the Jordan decomposition → expensive, numerically unstable
uniqueness of the representation (GSO has repeated eigenvalues)
Pubmed
nodes are documents and edges are citation links.
The datasets contain sparse bag-of-words feature vectors for each document and a list of citation links between documents. We treat the citation links as (undirected) edges and construct a binary, symmetric adjacency matrix A. Each document has a class label. For training, we only use 20 labels per class, but all feature vectors.
AM-COMPUTER
where nodes represent goods, while edges represent two kinds of goods that are often purchased together.
Knowledge Graph
NELL
(Carlson et al., 2010; Yang et al., 2016) is a bipartite graph dataset extracted from a knowledge graph with 55,864 relation nodes and 9,891 entity nodes.
NELL is a dataset extracted from the knowledge graph introduced in (Carlson et al., 2010). A knowledge graph is a set of entities connected with directed, labeled edges (relations). We follow the pre-processing scheme as described in Yang et al. (2016). We assign separate relation nodes r1 and r2 for each entity pair (e1,r,e2) as (e1,r1) and (e2,r2). Entity nodes are described by sparse feature vectors. We extend the number of features in NELL by assigning a unique one-hot representation for every relation node, effectively resulting in a 61,278-dim sparse feature vector per node. The semi-supervised task here considers the extreme case of only a single labeled example per class in the training set. We construct a binary, symmetric adjacency matrix from this graph by setting entries Aij = 1, if one or more edges are present between nodes i and j.
A directed network of hyperlinks among a large set of U.S. political weblogs from before the 2004 election [2]. It includes blog political affiliation as metadata. Links between blogs were automatically extracted from a crawl of the front page of the blog. In addition, the authors drew on various sources (blog directories, and incoming and outgoing links and posts around the time of the 2004 presidential election) and classified 758 blogs as left-leaning and the remaining 732 as right-leaning. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.653.5324&rep=rep1&type=pdf
Wikipedia
The hyperlink network of Wikipedia pages on editorial norms [9], in 2015. Nodes are Wikipedia entries, and two entries are linked by a directed edge if one hyperlinks to the other. Editorial norms cover content creation, interactions between users, and formal administrative structure among users and admins. Metadata includes page information such as creation date, number of edits, page views and so on. The number of norm categories is also given.
Email
The network was generated using email data from a large European research institution [14]. We have anonymized information about all incoming and outgoing email between members of the research institution. There is an edge (u, v) in the network if person u sent person v at least one email. The emails only represent communication between institution members. The dataset also contains ground-truth community memberships of the nodes. Each individual belongs to exactly one of 42 departments at the research institute. http://snap.stanford.edu/data/#email