Importing Graph Data from Files

You can create a graph from files using the instructions below. This document covers:

  • Functions for loading graphs from files
  • How to enable those functions
  • CSV file structure expected by the loading functions
  • A simple source-code example for loading countries and cities

Users load graphs in two steps:

  • First, load vertices
  • Second, load edges

You must create the graph and its labels before loading data from files.

Loading Functions

The following functions load vertices and edges from files.

The function load_labels_from_file loads vertices from a CSV file.

load_labels_from_file('<graph name>',
                      '<label name>',
                      '<file path>')

A fourth argument lets users omit the id field. Use this option only when the input file contains no id column.

load_labels_from_file('<graph name>',
                      '<label name>',
                      '<file path>',
                      false)

The function load_edges_from_file loads edges from a CSV file. See the file structure specification below.

Note: Ensure that id values in the edge file match existing id values in the vertex file.

load_edges_from_file('<graph name>',
                    '<label name>',
                    '<file path>');

CSV File Format Specification

The following describes the required structure for vertex and edge CSV files.

  • A vertex CSV file must have the following format:
Column Name Description
id Must be the first column. All values must be positive integers. Optional only when id_field_exists is set to false. Otherwise, it must be present.
Properties All remaining columns define vertex properties. The header row must contain property names.
  • An edge CSV file must have the following format:
Column Name Description
start_id The id of the source vertex. Must exist in the vertex file.
start_vertex_type The label of the source vertex.
end_id The id of the target vertex.
end_vertex_type The label of the target vertex.
properties Edge properties. Header row must contain property names.

Example files are located in regress/age_load/data.

SQL Script Examples

  • Load the AGE extension and create a graph:
LOAD 'age';

SET search_path TO ag_catalog;
SELECT create_graph('agload_test_graph');
  • Create the Country label and load vertices from a CSV file. Note: This CSV file includes an id column.
SELECT create_vlabel('agload_test_graph','Country');
SELECT load_labels_from_file('agload_test_graph',
                             'Country',
                             'age/regress/age_load/data/countries.csv');
  • Create the City label and load vertices from a CSV file. Note: This CSV file includes an id column.
SELECT create_vlabel('agload_test_graph','City');
SELECT load_labels_from_file('agload_test_graph',
                             'City',
                             'age/regress/age_load/data/cities.csv');
  • Create the has_city edge label and load edges from a CSV file.
SELECT create_elabel('agload_test_graph','has_city');
SELECT load_edges_from_file('agload_test_graph', 'has_city',
     'age/regress/age_load/data/edges.csv');
  • Verify correct graph loading:
SELECT table_catalog, table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'agload_test_graph';

SELECT COUNT(*) FROM agload_test_graph."Country";
SELECT COUNT(*) FROM agload_test_graph."City";
SELECT COUNT(*) FROM agload_test_graph."has_city";

SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH(n) RETURN n$$) AS (n agtype);
SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH (a)-[e]->(b) RETURN e$$) AS (n agtype);

Creating Vertices When the Input File Has No id Column

  • Create the Country2 label and load vertices from a CSV file. Note: This CSV file has no id column.
SELECT create_vlabel('agload_test_graph','Country2');
SELECT load_labels_from_file('agload_test_graph',
                             'Country2',
                             'age/regress/age_load/data/countries.csv',
                             false);
  • Create the City2 label and load vertices from a CSV file. Note: This CSV file has no id column.
SELECT create_vlabel('agload_test_graph','City2');
SELECT load_labels_from_file('agload_test_graph',
                             'City2',
                             'age/regress/age_load/data/cities.csv',
                             false);
  • Verify correct graph loading and compare auto-generated IDs versus IDs read from the file:
    • Labels Country and City use id values from the input file.
    • Labels Country2 and City2 were created without an id column — IDs were auto-assigned.
SELECT COUNT(*) FROM agload_test_graph."Country2";
SELECT COUNT(*) FROM agload_test_graph."City2";

SELECT id FROM agload_test_graph."Country" LIMIT 10;
SELECT id FROM agload_test_graph."Country2" LIMIT 10;

SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country {iso2 : 'BE'})
    RETURN id(n), n.name, n.iso2 $$) AS ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);
SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country2 {iso2 : 'BE'})
    RETURN id(n), n.name, n.iso2 $$) AS ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);

SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country {iso2 : 'AT'})
    RETURN id(n), n.name, n.iso2 $$) AS ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);
SELECT * FROM cypher('agload_test_graph', $$MATCH(n:Country2 {iso2 : 'AT'})
    RETURN id(n), n.name, n.iso2 $$) AS ("id(n)" agtype, "n.name" agtype, "n.iso2" agtype);

SELECT drop_graph('agload_test_graph', true);