- DAG-GNN: DAG Structure Learning with Graph Neural Networks
- DAGs with NO TEARS: Continuous Optimization for Structure Learning
- Causal Structure Discovery from Distributions Arising from Mixtures of DAGs
- Differentiable Causal Discovery from Interventional Data
- Learning Neural Causal Models from Unknown Interventions
- D-VAE: A Variational Autoencoder for Directed Acyclic Graphs

Zheng, Xun, et al. “DAGs with NO TEARS: Continuous optimization for structure learning.” Advances in Neural Information Processing Systems. 2018.

Yu, Y., Chen, J., Gao, T., & Yu, M. (2019). Dag-gnn: Dag structure learning with graph neural networks. arXiv preprint arXiv:1904.10098.

Ke, N. R., Bilaniuk, O., Goyal, A., Bauer, S., Larochelle, H., Schölkopf, B., … & Bengio, Y. (2019). Learning neural causal models from unknown interventions. arXiv preprint arXiv:1910.01075.

Zheng, X., Dan, C., Aragam, B., Ravikumar, P., & Xing, E. (2020, June). Learning sparse nonparametric DAGs. In International Conference on Artificial Intelligence and Statistics (pp. 3414-3425). PMLR.

Moraffah, R., Moraffah, B., Karami, M., Raglin, A., & Liu, H. (2020). Can: A causal adversarial network for learning observational and interventional distributions. arXiv preprint arXiv:2008.11376.

Wei, D., Gao, T., & Yu, Y. (2020). DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks. arXiv preprint arXiv:2010.09133.

Huang, X., Zhu, F., Holloway, L., & Haidar, A. (2020). Causal Discovery from Incomplete Data using An Encoder and Reinforcement Learning. arXiv preprint arXiv:2006.05554.

Zhang, M., Jiang, S., Cui, Z., Garnett, R., & Chen, Y. (2019). D-vae: A variational autoencoder for directed acyclic graphs. In Advances in Neural Information Processing Systems (pp. 1588-1600).

Andrews, B., Ramsey, J., & Cooper, G. F. (2019). Learning high-dimensional directed acyclic graphs with mixed data-types. Proceedings of machine learning research, 104, 4.

Ferguson, K. D., McCann, M., Katikireddi, S. V., Thomson, H., Green, M. J., Smith, D. J., & Lewsey, J. D. (2020). Evidence synthesis for constructing directed acyclic graphs (ESC-DAGs): a novel and systematic method for building directed acyclic graphs. International journal of epidemiology, 49(1), 322-329.

Aragam, B., Amini, A., & Zhou, Q. (2019). Globally optimal score-based learning of directed acyclic graphs in high-dimensions. In Advances in Neural Information Processing Systems (pp. 4450-4462).

D’Arcy, L., Corcoran, P., & Preece, A. (2019). Deep Q-Learning for Directed Acyclic Graph Generation. arXiv preprint arXiv:1906.02280.

**Causal Discovery with Reinforcement Learning**

Shengyu Zhu, Ignavier Ng, Zhitang Chen

Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based casual discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.

**Neural Additive Vector Autoregression Models for Causal Discovery in Time Series Data**

Bart Bussmann, Jannes Nys, Steven Latré

Causal structure discovery in complex dynamical systems is an important challenge for many scientific domains. Although data from (interventional) experiments is usually limited, large amounts of observational time series data sets are usually available. Current methods that learn causal structure from time series often assume linear relationships. Hence, they may fail in realistic settings that contain nonlinear relations between the variables. We propose Neural Additive Vector Autoregression (NAVAR) models, a neural approach to causal structure learning that can discover nonlinear relationships. We train deep neural networks that extract the (additive) Granger causal influences from the time evolution in multi-variate time series. The method achieves state-of-the-art results on various benchmark data sets for causal discovery, while providing clear interpretations of the mapped causal relations.

Stattner, E., & Vidot, N. (2011, May). Social network analysis in epidemiology: Current trends and perspectives. In 2011 Fifth International Conference on Research Challenges in Information Science (pp. 1-11). IEEE.

R. Gupta, S. Gupta, R. Gupta and N. Sardana, “DPSN: A Novel approach for Disease Prediction based on Social Networks,” 2019 International Conference on Signal Processing and Communication (ICSC), NOIDA, India, 2019, pp. 180-185, doi: 10.1109/ICSC45622.2019.8938391.

Chen YD., Tseng C., King CC., Wu TS.J., Chen H. (2007) Incorporating Geographical Contacts into Social Network Analysis for Contact Tracing in Epidemiology: A Study on Taiwan SARS Data. In: Zeng D. et al. (eds) Intelligence and Security Informatics: Biosurveillance. BioSurveillance 2007. Lecture Notes in Computer Science, vol 4506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72608-1_3

# Directed Graph

**The integrated approach of learning tuberculosis transmission within and outside households via random directed graph models**

Tenglong Li, Edward C. Jones-Lope

Abstract

Household contact studies are frequently used in tuberculosis transmission research, and models based on them often focus on transmission within the household. This contradicts recent research which suggests the transmission may be more likely to happen outside the household than within the household in high burden settings where these studies are frequently conducted. Consequently, most models would lead to biased estimates and misleading public health interventions. There is a strong need for developing models that allow concurrent estimation of household and extra-household transmission. In this study, we develop a random directed graph model for tuberculosis transmission, which permits users to concurrently build models for both household and extra-household transmission. Furthermore, our model can estimate the relative frequency of household transmission versus extra-household transmission and consistently produce unbiased estimates for risk factors, regardless of whether community controls are available. We illustrate our approach with a household contact study conducted in Vitoria, Brazil, and our results indicate that extra-household transmission can account for 63% to 98% of M. tuberculosis infections detected during such a study.

(preprint)

**Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).**

ABSTRACT

Graph embedding algorithms embed a graph into a vector space where the structure and the inherent properties of the graph are preserved. The existing graph embedding methods cannot preserve the asymmetric transitivity well, which is a critical property of directed graphs. Asymmetric transitivity depicts the correlation among directed edges, that is, if there is a directed path from u to v, then there is likely a directed edge from u to v. Asymmetric transitivity can help in capturing structures of graphs and recovering from partially observed graphs. To tackle this challenge, we propose the idea of preserving asymmetric transitivity by approximating high-order proximity which are based on asymmetric transitivity. In particular, we develop a novel graph embedding algorithm, High-Order Proximity preserved Embedding (HOPE for short), which is scalable to preserve high-order proximities of large scale graphs and capable of capturing the asymmetric transitivity. More specifically, we first derive a general formulation that cover multiple popular high-order proximity measurements, then propose a scalable embedding algorithm to approximate the high-order proximity measurements based on their general formulation. Moreover, we provide a theoretical upper bound on the RMSE (Root Mean Squared Error) of the approximation. Our empirical experiments on a synthetic dataset and three real-world datasets demonstrate that HOPE can approximate the high-order proximities significantly better than the state-of-art algorithms and outperform the state-of-art algorithms in tasks of reconstruction, link prediction and vertex recommendation.

**Adversarial Directed Graph Embedding**

Shijie Zhu, Jianxin Li, Hao Peng, Senzhang Wang, Philip S. Yu, Lifang He

Node representation learning for directed graphs is critically important to facilitate many graph mining tasks. To capture the directed edges between nodes, existing methods mostly learn two embedding vectors for each node, source vector and target vector. However, these methods learn the source and target vectors separately. For the node with very low indegree or outdegree, the corresponding target vector or source vector cannot be effectively learned. In this paper, we propose a novel Directed Graph embedding framework based on Generative Adversarial Network, called DGGAN. The main idea is to use adversarial mechanisms to deploy a discriminator and two generators that jointly learn each node’s source and target vectors. For a given node, the two generators are trained to generate its fake target and source neighbor nodes from the same underlying distribution, and the discriminator aims to distinguish whether a neighbor node is real or fake. The two generators are formulated into a unified framework and could mutually reinforce each other to learn more robust source and target vectors. Extensive experiments show that DGGAN consistently and significantly outperforms existing state-of-the-art methods across multiple graph mining tasks on directed graphs.

**Sun, J., Bandyopadhyay, B., Bashizade, A., Liang, J., Sadayappan, P., & Parthasarathy, S. (2019, July). Atp: Directed graph embedding with asymmetric transitivity preservation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 265-272).**

Abstract

Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the characteristic of asymmetric transitivity. Maintaining such properties, while reducing the graph to a lower dimensional vector embedding space, has been the focus of much recent research. In this paper, we tackle the challenge of directed graph embedding with asymmetric transitivity preservation and then leverage the proposed embedding method to solve a fundamental task in CQAs: how to appropriately route and assign newly posted questions to users with the suitable expertise and interest in CQAs. The technique incorporates graph hierarchy and reachability information naturally by relying on a nonlinear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology levers a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. Extensive experiments show that our framework consistently and significantly outperforms the state-of-the-art baselines on three diverse realworld tasks: link prediction, and question difficulty estimation and expert finding in online forums like Stack Exchange. Particularly, our framework can support inductive embedding learning for newly posted questions (unseen nodes during training), and therefore can properly route and assign these kinds of questions to experts in CQAs.

**C. Li, X. Qin, X. Xu, D. Yang and G. Wei, “Scalable Graph Convolutional Networks With Fast Localized Spectral Filter for Directed Graphs,” in IEEE Access, vol. 8, pp. 105634-105644, 2020, doi: 10.1109/ACCESS.2020.2999520.**

Abstract:

Graph convolutional neural netwoks (GCNNs) have been emerged to handle graph-structured data in recent years. Most existing GCNNs are either spatial approaches working on neighborhood of each node, or spectral approaches based on graph Laplacian. Compared with spatial-based GCNNs, spectral-based GCNNs are capable of highly exploiting graph structure information, but always regard graphs undiredcted. Actually, there are many scenarios where the graph structures are directed, such as social networks, citation networks, etc. Treating graphs undirected may lose important information, which is helpful for graph learning tasks. This motivate us to construct a spectral-based GCNN for directed graphs. In this paper, we propose a scalable graph convolutional neural network with fast localized convolution operators derived from directed graph Laplacian, which is called fast directed graph convolutional network (FDGCN). FDGCN can directly work on directed graphs and can scale to large graphs as the convolution operation is linear with the number of edges. Furthermore, we find that FDGCN can unify the graph convolutional network (GCN), which is a classic spectral-based GCNN. The mechanism of FDGCN is thoroughly analyzed from spatial aggregation point of view. Since previous work has confirmed that considering uncertainty of graph could promote GCN a lot, the proposed FDGCN is further enhanced through extra training epochs on random graphs generated by mixed membership stochastic block model (MMSBM). Experiments are conducted for semi-supervised node classification tasks to evaluate the performance of FDGCN. Results show that our model can outperform or match state-of-the-art models in most cases.

**On Learning a Hidden Directed Graph with Path Queries**

Mano Vikash Janardhanan, Lev Reyzin

In this paper, we consider the problem of reconstructing a directed graph using path queries. In this query model of learning, a graph is hidden from the learner, and the learner can access information about it with path queries. For a source and destination node, a path query returns whether there is a directed path from the source to the destination node in the hidden graph. In this paper we first give bounds for learning graphs on n vertices and k strongly connected components. We then study the case of bounded degree directed trees and give new algorithms for learning “almost-trees” – directed trees to which extra edges have been added. We also give some lower bound constructions justifying our approach.

**Drobyshevskiy M., Korshunov A., Turdakov D. (2017) Learning and Scaling Directed Networks via Graph Embedding.**

Abstract

Reliable evaluation of network mining tools implies significance and scalability testing. This is usually achieved by picking several graphs of various size from different domains. However, graph properties and thus evaluation results could be dramatically different from one domain to another. Hence the necessity of aggregating results over a multitude of graphs within each domain.

The paper introduces an approach to automatically learn features of a directed graph from any domain and generate similar graphs while scaling input graph size with a real-valued factor. Generating multiple graphs with similar size allows significance testing, while scaling graph size makes scalability evaluation possible. The proposed method relies on embedding an input graph into low-dimensional space, thus encoding graph features in a set of node vectors. Edge weights and node communities could be imitated as well in optional steps.

We demonstrate that embedding-based approach ensures variability of synthetic graphs while keeping degree and subgraphs distributions close to the original graphs. Therefore, the method could make significance and scalability testing of network algorithms more reliable without the need to collect additional data. We also show that embedding-based approach preserves various features in generated graphs which can’t be achieved by other generators imitating a given graph.

**Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. 2005. Learning from labeled and unlabeled data on a directed graph. ICML**

ABSTRACT

We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.

**Directed Graph Convolutional Network**

Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, Andrew Lim

Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, the undirected graphs limit their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which can not only retain the connection properties of the directed graph, but also expand the receptive field of the convolution operation. A new GCN model, called DGCN, is then designed to learn representations on the directed graph, leveraging both the first- and second-order proximity information. We empirically show the fact that GCNs working only with DGCNs can encode more useful information from graph and help achieve better performance when generalized to other models. Moreover, extensive experiments on citation networks and co-purchase datasets demonstrate the superiority of our model against the state-of-the-art methods.

**Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation**

Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay Pande, Jure Leskovec

Abstract

Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic accessibility, while obeying physical laws such as chemical valency. However, designing models that finds molecules that optimize desired properties while incorporating highly complex and non-differentiable rules remains to be a challenging task. Here we propose Graph Convolutional Policy Network (GCPN), a general graph convolutional network based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules. Experimental results show that GCPN can achieve 61% improvement on chemical property optimization over state-of-the-art baselines while resembling known molecules, and achieve 184% improvement on the constrained property optimization task.

**DIRECTED ACYCLIC GRAPH NEURAL NETWORKS**

Veronika Thost & Jie Chen

Graph-structured data ubiquitously appears in science and engineering. Graph

neural networks (GNNs) are designed to exploit the relational inductive bias exhibited in graphs; they have been shown to outperform other forms of neural

networks in scenarios where structure information supplements node features.

The most common GNN architecture aggregates information from neighborhoods

based on message passing. Its generality has made it broadly applicable. In this

paper, we focus on a special, yet widely used, type of graphs—DAGs—and inject

a stronger inductive bias—partial ordering—into the neural network design. We

propose the directed acyclic graph neural network, DAGNN, an architecture that

processes information according to the flow defined by the partial order. DAGNN

can be considered a framework that entails earlier works as special cases (e.g.,

models for trees and models updating node representations recurrently), but we

identify several crucial components that prior architectures lack. We perform

comprehensive experiments, including ablation studies, on representative DAG

datasets (i.e., source code, neural architectures, and probabilistic graphical models) and demonstrate the superiority of DAGNN over simpler DAG architectures

as well as general graph architectures.