GCNs

Semi-Supervised Classification with Graph Convolutional Networks

MIM Lab
Katsuya Ogata

Agenda
  1. Basic Knowledge of GNN
  2. Introduction
  3. Fast Approximate Convolutions on Graphs
  4. Semi-Supervised Node Classification
  5. Experiments
  6. Results
  7. Discussion
  8. Appendix
Basic Knowledge of GNN

Basic Elements of a Graph

  • Node
    A vertex in the graph, e.g., a user in a social network or an atom in a molecule.

  • Edge
    A connection between two nodes, e.g., friendship between users or bonds between atoms.

  • Adjacency Matrix
    A matrix representing the connectivity of nodes. If node $i$ and node $j$ are connected, $A_{ij} = 1$; otherwise, $A_{ij} = 0$.

Basic Knowledge of GNN

Graph Spectral Theory

  • Study the properties of graphs by analyzing the eigenvalues and eigenvectors of matrices associated with the graph.

  • This analysis reveals crucial structural and global properties, such as:

    • Connectivity (how well the graph is connected)
    • Bipartiteness (if the graph can be divided into two independent sets)
    • The presence of certain motifs or communities
Basic Knowledge of GNN

Graph Laplacian

  • Definition
    $L := D - A$

    Where:

    • $D$ = Degree matrix (diagonal, $D_{ii}$ = degree of node $i$)
    • $A$ = Adjacency matrix
  • Properties

    • Symmetric (for undirected graphs)
    • Captures the difference between a node and its neighbors
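
A minimal NumPy sketch (not from the paper) that builds the Laplacian of a small, hypothetical undirected graph:

```python
import numpy as np

# Hypothetical 4-node undirected graph (adjacency matrix A)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix: D_ii = degree of node i
L = D - A                    # unnormalized graph Laplacian L = D - A

assert np.allclose(L, L.T)   # symmetric, since the graph is undirected
```
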
Basic Knowledge of GNN

Normalized Laplacian

  • Definition
    $\tilde{L} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2} = I - \tilde{A}$

    Where:

    • $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the normalized adjacency matrix
  • Why normalize?

    • Removes scale differences due to node degrees
    • Makes it easier to compare graphs with different structures
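
A minimal sketch of the normalized Laplacian for the same kind of toy graph, assuming no isolated nodes (so $D^{-1/2}$ exists):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],          # hypothetical undirected graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)   # D^{-1/2}
A_norm = D_inv_sqrt @ A @ D_inv_sqrt          # normalized adjacency D^{-1/2} A D^{-1/2}
L_norm = np.eye(len(A)) - A_norm              # normalized Laplacian I - D^{-1/2} A D^{-1/2}

# Eigenvalues of the normalized Laplacian lie in [0, 2], regardless of node degrees
print(np.linalg.eigvalsh(L_norm))
```
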
Basic Knowledge of GNN

Graph Fourier Transform

  • Forward transform

    $F(x) = U^T x$

  • Inverse transform

    $F^{-1}(x) = U x$

Where:

  • $U$: matrix of eigenvectors ($\tilde{L} = U \Lambda U^T$)
  • $x$: graph signal
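
A sketch of the forward and inverse transform on a toy graph; `np.linalg.eigh` supplies the eigenvectors $U$ of the symmetric normalized Laplacian (the graph and signal are illustrative assumptions):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

lam, U = np.linalg.eigh(L_norm)       # L_norm = U diag(lam) U^T

x = np.array([1.0, 0.0, 2.0, -1.0])   # an arbitrary graph signal

x_hat = U.T @ x     # forward transform  F(x) = U^T x
x_rec = U @ x_hat   # inverse transform  F^{-1}(x_hat) = U x_hat

assert np.allclose(x, x_rec)          # the round trip recovers the signal
```
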
Basic Knowledge of GNN

Spectral Convolution

  • Definition (using Fourier domain):

    $g * x = F^{-1}(F(g) \odot F(x)) = U\,(U^T g \odot U^T x)$

Where:

  • $\odot$: element-wise multiplication
Basic Knowledge of GNN

Practical Filtering

  • Direct use of $U^T g$ is often impractical.

  • Instead, we typically use a learnable diagonal matrix $g_w$:

    $g_w * x = U g_w U^T x$

This simplifies the filter design and makes learning scalable.
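
A sketch of filtering with a free diagonal filter $g_w$; the weights `w` below are arbitrary placeholders standing in for learnable parameters:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
lam, U = np.linalg.eigh(L_norm)

x = np.array([1.0, 0.0, 2.0, -1.0])   # graph signal
w = np.array([0.5, 1.0, -0.3, 0.8])   # placeholder filter coefficients, one per eigenvalue

filtered = U @ (w * (U.T @ x))        # g_w * x = U g_w U^T x, with g_w = diag(w)
```
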

Basic Knowledge of GNN

Summary

  1. Transform the graph signal $x$ into the spectral domain
  2. Apply a filter in the spectral domain
  3. Return to the original space

Introduction

Loss Functions of GNN

$L = L_0 + \lambda L_{\text{reg}}$

where:

  • $L_0$: supervised loss over the labeled part of the graph
  • $\lambda$: weighting factor controlling the strength of the regularization
  • $L_{\text{reg}}$: graph regularization term
Introduction

Regularization Term

$L_{\text{reg}} = \sum_{i, j} A_{ij} \| f(X_i) - f(X_j) \|^2 = f(X)^\top \Delta f(X)$

where:

  • $f(\cdot)$: neural-network-like differentiable function
  • $X$: matrix of node feature vectors $X_i$
  • $A$: adjacency matrix
  • $\Delta = D - A$: unnormalized graph Laplacian
Introduction

Homophily Hypothesis

  • Homophily refers to the tendency of connected nodes to share similar attributes or labels.

  • In graph learning, it is assumed that:

    "Connected nodes are likely to belong to the same class."

Introduction

Proposed Methods

$\mathrm{loss} = L_0$

$\mathrm{output} = f(X, A)$

Where:

  • $X \in \mathbb{R}^{N \times D}$: Node feature matrix,
    where $N$ = number of nodes, $D$ = number of features
  • $A \in \mathbb{R}^{N \times N}$: Adjacency matrix
  • $f(\cdot)$: Neural network mapping features and graph structure
  • $L_0$: Supervised loss on labeled nodes
Fast Approximate Convolutions on Graphs

Computational Cost Issues

$g_\theta \star x = U g_\theta U^T x$

$(\theta \in \mathbb{R}^N,\ x \in \mathbb{R}^N)$

  1. Multiplication with eigenvector matrix $U$:

    • Computational complexity is $O(N^2)$ ($N$: number of nodes)
    • Very expensive for large graphs
  2. Eigendecomposition of graph Laplacian $L$:

    • Necessary to obtain $U$
    • Also computationally prohibitive for large graphs
Fast Approximate Convolutions on Graphs

Solution: Approximation via Chebyshev Polynomials

Proposal by Hammond et al. (2011):
Approximate $g_\theta(\Lambda)$ by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to $K^{th}$ order.

Approximation (Eq. 4):

$g_{\theta'}(\Lambda) \approx \sum_{k=0}^K \theta'_k T_k(\tilde{\Lambda})$

Fast Approximate Convolutions on Graphs

Components of Eq. (4) and Chebyshev Polynomials

  • $\tilde{\Lambda}$ (Rescaled eigenvalues):

    $\tilde{\Lambda} = \frac{2}{\lambda_{max}}\Lambda - I_N$

    • $\lambda_{max}$: Largest eigenvalue of $\tilde{L}$
    • The range of $\tilde{\Lambda}$ becomes $[-1, 1]$, matching the domain of Chebyshev polynomials.
  • Chebyshev polynomials $T_k(x)$:

    • $T_0(x) = 1$
    • $T_1(x) = x$
    • $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$ (recursive definition)
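
The recurrence translates directly into code; this small helper (`chebyshev` is a hypothetical name) evaluates $T_k$ on a scalar or NumPy array:

```python
import numpy as np

def chebyshev(k: int, x: np.ndarray) -> np.ndarray:
    """Evaluate T_k(x) via T_0 = 1, T_1 = x, T_k = 2x*T_{k-1} - T_{k-2}."""
    t_prev, t_curr = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

x = np.linspace(-1, 1, 5)
print(chebyshev(3, x))   # equals 4x^3 - 3x on [-1, 1]
```
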
Fast Approximate Convolutions on Graphs

Convolution using the Approximation

Applying the approximation (Eq. 4) to the original convolution definition
$g_\theta \star x = U g_\theta(\Lambda) U^T x$:

$g_{\theta'} \star x \approx U \left( \sum_{k=0}^K \theta'_k T_k(\tilde{\Lambda}) \right) U^T x$

$g_{\theta'} \star x \approx \sum_{k=0}^K \theta'_k\, U T_k(\tilde{\Lambda})\, U^T x$

Fast Approximate Convolutions on Graphs

Using

$(U\tilde{\Lambda}U^T)^k = U\tilde{\Lambda}^k U^T$

$T_k(\tilde{L}) = U T_k(\tilde{\Lambda}) U^T$

we get:

$g_{\theta'} \star x \approx \sum_{k=0}^K \theta'_k T_k(\tilde{L})\,x \quad (\text{Eq. } 5)$

where:

$\tilde{L} = \frac{2}{\lambda_{max}}L - I_N$ (here $L$ denotes the normalized Laplacian $I_N - D^{-1/2}AD^{-1/2}$)

Fast Approximate Convolutions on Graphs

Important Properties:

  1. $K$-localized:

    • $T_k(\tilde{L})$ is a $K^{th}$-order polynomial in the Laplacian.
    • The convolution result depends only on nodes that are at most $K$ steps away from the central node ($K^{th}$-order neighborhood).
  2. Computational Complexity:

    • Evaluating Eq. (5) is $O(|E|)$ ($|E|$: number of edges).
    • $T_k(\tilde{L})x$ can be computed efficiently through repeated sparse matrix-vector multiplications.
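
A sketch of this $K$-localized filtering using only sparse matrix-vector products, assuming $\lambda_{max} \approx 2$ so that $\tilde{L} = L - I_N$ with $L$ the normalized Laplacian; the helper and toy graph are illustrative, not the authors' implementation:

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(A: sp.spmatrix, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Approximate g_{theta'} * x = sum_k theta'_k T_k(L~) x with the Chebyshev
    recurrence applied to vectors (repeated sparse matrix-vector products)."""
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(deg ** -0.5)             # assumes no isolated nodes
    L = sp.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt    # normalized Laplacian
    L_tilde = L - sp.eye(n)                        # rescaled Laplacian (lambda_max ~ 2)

    t_prev, t_curr = x, L_tilde @ x                # T_0(L~) x and T_1(L~) x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):                 # T_k = 2 L~ T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out

# Toy usage: random symmetric graph, K = 3 (theta has K + 1 coefficients)
A = sp.random(100, 100, density=0.05, format="csr", random_state=0)
A = ((A + A.T + sp.eye(100)) > 0).astype(float)    # symmetrize; self-loops avoid zero degrees
x = np.random.default_rng(0).normal(size=100)
y = chebyshev_filter(A, x, theta=np.array([1.0, 0.5, 0.25, 0.1]))
```
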
Fast Approximate Convolutions on Graphs

Approximation and Simplification

We approximate $\lambda_{max} \approx 2$ and $K = 1$. (It is expected that the neural network parameters will adapt to this change in scale during training.)

Under these approximations, Eq. 5 simplifies to:

$g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I_N)x$

$g_{\theta'} \star x = \theta'_0 x - \theta'_1 D^{-1/2}AD^{-1/2}x$

With a single shared parameter $\theta = \theta'_0 = -\theta'_1$:

$g_{\theta'} \star x \approx \theta(I_N + D^{-1/2}AD^{-1/2})x$

Fast Approximate Convolutions on Graphs

Renormalization Trick

To alleviate the problem of numerical instabilities, the following renormalization trick is introduced:

$I_N + D^{-1/2}AD^{-1/2} \rightarrow \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$

Where:

  • $\tilde{A} = A + I_N$ (Adjacency matrix with self-connections added)
  • $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ (Degree matrix of $\tilde{A}$)
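
A sketch of the renormalization trick with SciPy sparse matrices (an illustrative helper, not the authors' code):

```python
import numpy as np
import scipy.sparse as sp

def renormalized_adjacency(A: sp.spmatrix) -> sp.spmatrix:
    """A_hat = D~^{-1/2} A~ D~^{-1/2}, with A~ = A + I_N and D~_ii = sum_j A~_ij."""
    A_tilde = A + sp.eye(A.shape[0])                     # add self-connections
    deg_tilde = np.asarray(A_tilde.sum(axis=1)).ravel()  # D~_ii (always >= 1)
    D_inv_sqrt = sp.diags(deg_tilde ** -0.5)
    return (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

# Toy usage on a 3-node path graph
A = sp.csr_matrix(np.array([[0., 1., 0.],
                            [1., 0., 1.],
                            [0., 1., 0.]]))
print(renormalized_adjacency(A).toarray())
```
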
Fast Approximate Convolutions on Graphs

Generalization to Multiple Channels/Filters

$Z = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}X\Theta \quad (\text{Eq. } 8)$

Where:

  • $X \in \mathbb{R}^{N \times C}$ is the matrix of input graph signals ($C$ input channels, i.e., a $C$-dimensional feature vector per node).
  • $\Theta \in \mathbb{R}^{C \times F}$ is the matrix of filter parameters.
  • $Z \in \mathbb{R}^{N \times F}$ is the convolved signal matrix.
  • This filtering operation has a complexity of $O(|E|FC)$, as $\tilde{A}X$ can be efficiently implemented as a product of a sparse matrix with a dense matrix.
Semi-Supervised Node Classification

Two-Layer GCN

$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$

$Z = f(X, A) = \mathrm{softmax}\!\left(\hat{A}\,\mathrm{ReLU}(\hat{A}XW^{(0)})\,W^{(1)}\right)$

Where:

  • $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, $W^{(0)} \in \mathbb{R}^{C \times H}$, $W^{(1)} \in \mathbb{R}^{H \times F}$ ($H$: number of hidden units, $F$: number of output classes)
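
A self-contained NumPy sketch of this two-layer forward pass; the toy graph, dimensions, and randomly initialized weights are assumptions (no training loop is shown):

```python
import numpy as np

def gcn_forward(A: np.ndarray, X: np.ndarray,
                W0: np.ndarray, W1: np.ndarray) -> np.ndarray:
    """Z = softmax(A_hat ReLU(A_hat X W0) W1) with the renormalized adjacency A_hat."""
    A_tilde = A + np.eye(A.shape[0])                        # add self-loops
    d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt               # D~^{-1/2} A~ D~^{-1/2}

    H = np.maximum(A_hat @ X @ W0, 0.0)                     # hidden layer + ReLU
    logits = A_hat @ H @ W1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # row-wise softmax
    return e / e.sum(axis=1, keepdims=True)

# Toy usage: N = 4 nodes, C = 3 features, H = 8 hidden units, F = 2 classes
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
W0, W1 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
Z = gcn_forward(A, X, W0, W1)   # each row sums to 1: class probabilities per node
```
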

Semi-Supervised Node Classification

Loss Function

Evaluate the cross-entropy error over all labeled examples:

$L = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$

Explanation of symbols:

  • $\mathcal{Y}_L$: index set of labeled nodes
  • $F$: number of output classes
  • $Y_{lf}$: true label for class $f$ of node $l$.
  • $Z_{lf}$: predicted probability for class $f$ of node $l$.
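
A sketch of this loss as a masked cross-entropy, assuming `Z` holds predicted probabilities (e.g., from the forward pass above) and `Y` one-hot labels; the labeled index set is a placeholder:

```python
import numpy as np

def masked_cross_entropy(Z: np.ndarray, Y: np.ndarray, labeled_idx: list) -> float:
    """L = - sum_{l in Y_L} sum_f Y_lf * ln Z_lf (only labeled nodes contribute)."""
    Z_l = np.clip(Z[labeled_idx], 1e-12, 1.0)   # numerical safety before taking ln
    return float(-(Y[labeled_idx] * np.log(Z_l)).sum())

# Toy usage: 4 nodes, 2 classes, nodes 0 and 2 labeled
Z = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.5, 0.5]])
Y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
loss = masked_cross_entropy(Z, Y, labeled_idx=[0, 2])
```
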
Experiments

Datasets Overview

Label Rate: Number of labeled nodes used for training / Total nodes

Experiments

Citeseer, Cora, and Pubmed (Citation networks)

Structure:

  • Nodes: Documents
  • Edges: Citation links (treated as undirected)
  • Features: Sparse bag-of-words vectors
  • Adjacency Matrix: Binary, symmetric

Training Setup:

  • Only 20 labels per class for training
  • All feature vectors available
  • Each document has a class label
Experiments

Knowledge Graph Structure (NELL)

Original Format:

  • Entities connected with directed, labeled edges (relations)
  • Example: (entity₁, relation, entity₂)

Preprocessing:

  • Convert each triplet $(e_1, r, e_2)$ to a bipartite graph by assigning separate relation nodes $r_1$ and $r_2$: edges $(e_1, r_1)$ and $(e_2, r_2)$
  • 55,864 relation nodes + 9,891 entity nodes
  • Extended features: 61,278-dim sparse vectors

Extreme Semi-supervised Setting:

  • Only 1 labeled example per class (210 classes total)
Experiments

For Runtime Analysis (Random graphs)

Generation Process:

  • $N$ nodes → $2N$ edges assigned uniformly at random
  • Feature Matrix: Identity matrix $I_N$ (featureless approach)
  • Each node represented by unique one-hot vector
  • Dummy labels: $Y_i = 1$ for all nodes

Purpose: Measure training time per epoch across different graph sizes
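
A sketch of this generation process; the handling of duplicate and self-edges is an assumption, since the slide does not specify it:

```python
import numpy as np
import scipy.sparse as sp

def random_graph(n: int, seed: int = 0):
    """N nodes with 2N edges placed uniformly at random, identity features, dummy labels."""
    rng = np.random.default_rng(seed)
    src = rng.integers(0, n, size=2 * n)
    dst = rng.integers(0, n, size=2 * n)
    A = sp.coo_matrix((np.ones(2 * n), (src, dst)), shape=(n, n)).tocsr()
    A = ((A + A.T) > 0).astype(float)   # symmetrize and binarize (assumption)
    X = sp.identity(n, format="csr")    # featureless: X = I_N, a one-hot vector per node
    y = np.ones(n, dtype=int)           # dummy labels Y_i = 1 for every node
    return A, X, y

A, X, y = random_graph(1000)            # vary n to measure training time per epoch
```
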

Experiments

Model Configuration

  • Architecture: 2-layer GCN (Section 3.1)
  • Test Set: 1,000 labeled examples
  • Validation Set: 500 labeled examples for hyperparameter optimization (dropout rate, number of hidden units, and L2 regularization factor)
  • Optimizer: Adam (learning rate = 0.01)
  • Max Epochs: 200
  • Early Stopping: Window size = 10
  • Weight Initialization: Glorot & Bengio (2010)
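
The setup on this slide can be summarized as a small configuration dictionary; the dropout rate, hidden-unit count, and L2 factor are the values the paper reports for the citation networks (tuned on the validation set), so treat them as dataset-specific assumptions:

```python
config = {
    "architecture": "2-layer GCN",   # Section 3.1
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "max_epochs": 200,
    "early_stopping_window": 10,     # stop if validation loss does not improve for 10 epochs
    "dropout": 0.5,                  # tuned on the validation set (citation networks)
    "hidden_units": 16,
    "weight_decay": 5e-4,            # L2 regularization factor
    "weight_init": "Glorot & Bengio (2010)",
    "num_validation_nodes": 500,
    "num_test_nodes": 1000,
}
```
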
Results

Semi-Supervised Node Classification Results

Results

Propagation Model Evaluation

Comparing Different Variants

Results

Training Time Analysis

Key Finding

Linear scalability enables application to very large graphs

Discussion

Why GCN Outperforms Traditional Methods

1. End-to-End Learning

  • Other methods: Multi-step pipeline (embedding learning → classifier training)
  • GCN: Unified optimization with single loss function

2. Efficient Information Propagation

  • Graph Laplacian methods: Limited by assumption that edges = node similarity
  • GCN: Propagates feature information through neighbors at each layer

3. Computational Efficiency

  • Complexity: O(|E|) - linear in number of edges
  • Speed: 3-4x faster than Planetoid (Cora: 13s→4s)
Discussion

Limitations and Future Work

Memory Requirements

  • Full-batch training memory grows linearly with dataset size; possible directions: mini-batch SGD, approximate methods, distributed training

Directed edges and edge features

  • Native directed graph support, Edge feature integration, Heterogeneous graph handling

Current Limiting Assumptions

$\tilde{A} = A + \lambda I_N$

  • $\lambda$: learnable trade-off parameter weighting self-connections against edges to neighboring nodes

## Iterative Classification Algorithm (ICA)

### Two-stage Process

1. **Local Classifier**:
   - Train on labeled nodes using local features only
   - Bootstrap unlabeled nodes
2. **Relational Classifier**:
   - Use local features + aggregation operator
   - 10 iterations with random node ordering
   - Hyperparameters chosen via validation

**Note**: TSVM omitted due to scalability issues with large class numbers

---

### Renormalization Trick Superior

- **Best overall performance** across all datasets
- Balances efficiency and representation power

### Graph Structure Matters

- **MLP baseline** performs significantly worse
- Confirms importance of graph convolution operations

### Simpler Can Be Better

- **Renormalization trick** outperforms complex Chebyshev polynomials
- **Fewer parameters** → better generalization
- **Lower computational cost** → practical advantages

---

# Appendix

---

<!-- _header: Related Work -->

## Graph-Based Semi-Supervised Learning

- **Traditional**: Graph Laplacian regularization, graph embedding (DeepWalk, etc.). Multi-step pipelines were a limitation.
- **Recent**: Planetoid injects label info during embedding.

## Neural Networks on Graphs

- **Early Work**: Graph Neural Networks (Gori et al., 2005)
- **Convolution-Based**:
  - Spectral Methods (Bruna et al., 2014): O($N^2$) complexity.
  - Localized Convolutions (Defferrard et al., 2016): Fast Chebyshev approximation.
  - Degree-Specific Weights (Duvenaud et al., 2015): Scalability issues for wide degree distributions.