Michael Pidde => { Programmer(); }

Data Structures - Linked List

Since I don't hold a degree in computer science (rather, I studied software development at a technical college) there are certain elements of computing that I don't have formal training in. It's not well understood to outsiders of the professions, but computer science and software development are fairly distinctive skill sets with differing end goals. Skills certainly overlap, and software development as a whole would not be here without the groundwork of computer science. However, not all disciplines or concepts studied in a formal computer science major are necessary or even considered good practice in software development. Knowing what an array is and how to use it is different from building an array data structure from scratch to use in your project. Picking between sorting algorithms that are baked into your chosen language's array implementation is different than writing your own custom sorting algorithm.

There seem to be two different camps of (software) programmers: the Don't reinvent the wheel - you can't do it better than the collective camp and the I didn't write it so it's too heavy and has more than I'll ever need - it's too slow/disorganized/cryptic/simplistic camp. I don't necessarily feel the need to join either side of that rather arbitrary dichotomy; different approaches work for different end goals. Whatever the case, I won't be writing my own array or linked list data structures from scratch for any real project. I do agree with the first camp in the regard that you're probably not going to rewrite what your language already implements better than it does. It will be fraught with peril at best.

That said, I'm still incredibly interested in knowing how these data structures work on a fundamental level. As I've been using C/C++ more often and thus have direct access to memory (for good or ill), I've thought that I can kill two birds with one stone and solidify the concepts of memory allocation and usage of pointers as well as gain some understanding of the internals of data structures while I'm at it. While I think it's useful and generally good that higher level languages used in application programming abstract away the need to screw with manual memory allocation, the idea doesn't need to be a deep, dark hole that we need to fear. Some rudimentary good practices alleviate the causes of most memory leaks, but if we look at real world examples of libraries written in memory-unmanaged languages, we'll see memory leaks all over the place. Spin up a simple SDL2 program and kick it off in gdb and try to plug all of the holes in that boat.

Here's my first stab at a linked list implementation using C++. I wrote some unit tests on top of the googletest framework for good measure.

// linkedlist.h
#ifndef __LINKEDLIST_H
#define __LINKEDLIST_H

namespace datatypes {

    struct node {
        int data;
        node *next;
    };

    class list {
        private:
            node *head, *tail;
        public:
            list();
            void append(int value);
            void prepend(int value);
            void insert_at(int value, unsigned int pos);
            
            void delete_head();
            void delete_tail();
            void delete_at(unsigned int pos);

            int value_at(unsigned int pos);

            int count();

            void show();
    };

}

#endif
// linkedlist.cpp
#include "linkedlist.h"
#include <iostream>

namespace datatypes {

list::list() {
    head = NULL;
    tail = NULL;
}

void list::append(int value) {
    node *n = new node;
    n->data = value;
    n->next = NULL;
    if(head == NULL) {
        head = n;
        tail = n;
    } else {
        tail->next = n;
        tail = n;
    }
}

void list::prepend(int value) {
    node *n = new node;
    n->data = value;
    n->next = head;
    head = n;
}

void list::insert_at(int value, unsigned int pos) {
    node *n = new node;
    n->data = value;

    if(count() == 0 && pos == 0) {
        // Empty list, inserting at index 0
        n->next = NULL;
        head = n;
        tail = n;
    } else {
        node *pre = NULL;
        node *at = head;

        for(int i = 0; i < pos; ++i) {
            pre = at;
            at = at->next;
        }

        if(pre == NULL) {
            // Non-empty list, inserting an index 0
            n->next = at;
            head = n;
        } else {
            n->next = at;
            pre->next = n;
        } 
    }
}

void list::delete_head() {
    node *n = head;
    head = head->next;
    delete n;
}

void list::delete_tail() {
    node *prev = NULL;
    node *at = head;

    while(at->next != NULL) {
        prev = at;
        at = at->next;
    }

    tail = prev;
    prev->next = NULL;
    delete at;
}

void list::delete_at(unsigned int pos) {
    node *prev = NULL;
    node *at = head;

    for(int i = 0; i < pos; ++i) {
        prev = at;
        at = at->next;
    }

    prev->next = at->next;
    delete at;
}

int list::value_at(unsigned int pos) {
    node *n = head;
    for(int i = 0; i < pos; ++i)
        n = n->next;
    return n->data;
}

int list::count() {
    int ctr = 0;
    node *n = head;
    while(n != NULL) {
        ++ctr;
        n = n->next;
    }
    return ctr;
}

void list::show() {
    node *tmp = head;
    std::cout << "\n---\n";
    while(tmp != NULL) {
        std::cout << tmp->data << "\n";
        tmp = tmp->next;
    }
    std::cout << "---\n";
}

}

And some sample usage:

datatypes::list l;
l.append(1);
l.append(10);
l.prepend(20);
l.insert_at(0, 30);
l.delete_head();
std::cout << std::to_string(l.count());
l.show();

/*
Yields:

3
---
20
1
10
---
*/

The unit tests for posterity:

//linkedlist_tests.cpp
#include "../linkedlist.h"
#include "gtest/gtest.h"

namespace {

/*
 * Insertions
 */
TEST(LinkedListTest, AppendToEmptyList) {
    datatypes::list l;
    l.append(10);
    ASSERT_EQ(l.value_at(0), 10);
}

TEST(LinkedListTest, ValueAtIndex0InNonEmptyList) {
    datatypes::list l;
    l.append(10);
    ASSERT_EQ(l.value_at(0), 10);
}

TEST(LinkedListTest, CannotGetValueAtNegativeIndex) {
    datatypes::list l;
    l.append(10);
    ASSERT_EXIT((l.value_at(-1), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CannotGetValueAtIndexOutOfRange) {
    datatypes::list l;
    l.append(10);
    ASSERT_EXIT((l.value_at(100), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CountWorks) {
    datatypes::list l;
    l.append(10);
    ASSERT_EQ(l.count(), 1);
}

TEST(LinkedListTest, AppendToNonEmptyList) {
    datatypes::list l;
    l.append(10);
    l.append(-20);
    ASSERT_EQ(l.count(), 2);
}

TEST(LinkedListTest, PrependToEmptyList) {
    datatypes::list l;
    l.prepend(10);
    ASSERT_EQ(l.count(), 1);
}

TEST(LinkedListTest, InsertIntoEmptyListAtIndex0) {
    datatypes::list l;
    l.insert_at(10, 0);
    ASSERT_EQ(l.count(), 1);
}

TEST(LinkedListTest, CannotInsertIntoEmptyListAtIndex1) {
    datatypes::list l;
    ASSERT_EXIT((l.insert_at(10, 1), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CannotInsertAtNegativeIndex) {
    datatypes::list l;
    ASSERT_EXIT((l.insert_at(10, -1), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, InsertIntoNonEmptyListAtIndex0) {
    datatypes::list l;
    l.append(100);
    l.insert_at(500, 0);
    ASSERT_EQ(l.count(), 2);
    ASSERT_EQ(l.value_at(0), 500);
    ASSERT_EQ(l.value_at(1), 100);
}

TEST(LinkedListTest, InsertIntoNonEmptyListAtIndex1) {
    datatypes::list l;
    l.append(100);
    l.append(200);
    l.insert_at(10, 1);
    ASSERT_EQ(l.count(), 3);
    ASSERT_EQ(l.value_at(1), 10);
    ASSERT_EQ(l.value_at(2), 200);
    ASSERT_EQ(l.value_at(0), 100);
}

TEST(LinkedListTest, CannotInsertIndexOutOfRange) {
    datatypes::list l;
    ASSERT_EXIT((l.insert_at(10, 100), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, InsertAtNextInRange) {
    datatypes::list l;
    l.append(999);
    l.insert_at(1000, 1);
    ASSERT_EQ(l.count(), 2);
    ASSERT_EQ(l.value_at(1), 1000);
}

/*
 * Deletions
 */
TEST(LinkedListTest, CannotDeleteHeadOfEmptyList) {
    datatypes::list l;
    ASSERT_EXIT((l.delete_head(), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CanDeleteHeadOfPopulatedList) {
    datatypes::list l;
    l.append(10);
    l.append(20);
    l.delete_head();
    ASSERT_EQ(l.count(), 1);
    ASSERT_EQ(l.value_at(0), 20);
}

TEST(LinkedListTest, CannotDeleteTailOfEmptyList) {
    datatypes::list l;
    ASSERT_EXIT((l.delete_tail(), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CanDeleteTailOfPopulatedList) {
    datatypes::list l;
    l.append(10);
    l.append(20);
    l.delete_tail();
    ASSERT_EQ(l.count(), 1);
    ASSERT_EQ(l.value_at(0), 10);
}

TEST(LinkedListTest, CannotDeleteAtIndexOfEmptyList) {
    datatypes::list l;
    ASSERT_EXIT((l.delete_at(4), exit(0)), ::testing::KilledBySignal(SIGSEGV), ".*");
}

TEST(LinkedListTest, CanDeleteIndexZeroOfPopulatedList) {
    datatypes::list l;
    l.append(10);
    l.append(20);
    l.append(30);
    l.delete_at(1);
    ASSERT_EQ(l.count(), 2);
    ASSERT_EQ(l.value_at(0), 10);
    ASSERT_EQ(l.value_at(1), 30);
}

}

And the output:

Running main() from gtest_main.cpp
[==========] Running 20 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 20 tests from LinkedListTest
[ RUN      ] LinkedListTest.AppendToEmptyList
[       OK ] LinkedListTest.AppendToEmptyList (0 ms)
[ RUN      ] LinkedListTest.ValueAtIndex0InNonEmptyList
[       OK ] LinkedListTest.ValueAtIndex0InNonEmptyList (0 ms)
[ RUN      ] LinkedListTest.CannotGetValueAtNegativeIndex
[       OK ] LinkedListTest.CannotGetValueAtNegativeIndex (1 ms)
[ RUN      ] LinkedListTest.CannotGetValueAtIndexOutOfRange
[       OK ] LinkedListTest.CannotGetValueAtIndexOutOfRange (0 ms)
[ RUN      ] LinkedListTest.CountWorks
[       OK ] LinkedListTest.CountWorks (0 ms)
[ RUN      ] LinkedListTest.AppendToNonEmptyList
[       OK ] LinkedListTest.AppendToNonEmptyList (0 ms)
[ RUN      ] LinkedListTest.PrependToEmptyList
[       OK ] LinkedListTest.PrependToEmptyList (0 ms)
[ RUN      ] LinkedListTest.InsertIntoEmptyListAtIndex0
[       OK ] LinkedListTest.InsertIntoEmptyListAtIndex0 (0 ms)
[ RUN      ] LinkedListTest.CannotInsertIntoEmptyListAtIndex1
[       OK ] LinkedListTest.CannotInsertIntoEmptyListAtIndex1 (1 ms)
[ RUN      ] LinkedListTest.CannotInsertAtNegativeIndex
[       OK ] LinkedListTest.CannotInsertAtNegativeIndex (1 ms)
[ RUN      ] LinkedListTest.InsertIntoNonEmptyListAtIndex0
[       OK ] LinkedListTest.InsertIntoNonEmptyListAtIndex0 (0 ms)
[ RUN      ] LinkedListTest.InsertIntoNonEmptyListAtIndex1
[       OK ] LinkedListTest.InsertIntoNonEmptyListAtIndex1 (0 ms)
[ RUN      ] LinkedListTest.CannotInsertIndexOutOfRange
[       OK ] LinkedListTest.CannotInsertIndexOutOfRange (1 ms)
[ RUN      ] LinkedListTest.InsertAtNextInRange
[       OK ] LinkedListTest.InsertAtNextInRange (0 ms)
[ RUN      ] LinkedListTest.CannotDeleteHeadOfEmptyList
[       OK ] LinkedListTest.CannotDeleteHeadOfEmptyList (0 ms)
[ RUN      ] LinkedListTest.CanDeleteHeadOfPopulatedList
[       OK ] LinkedListTest.CanDeleteHeadOfPopulatedList (0 ms)
[ RUN      ] LinkedListTest.CannotDeleteTailOfEmptyList
[       OK ] LinkedListTest.CannotDeleteTailOfEmptyList (1 ms)
[ RUN      ] LinkedListTest.CanDeleteTailOfPopulatedList
[       OK ] LinkedListTest.CanDeleteTailOfPopulatedList (0 ms)
[ RUN      ] LinkedListTest.CannotDeleteAtIndexOfEmptyList
[       OK ] LinkedListTest.CannotDeleteAtIndexOfEmptyList (1 ms)
[ RUN      ] LinkedListTest.CanDeleteIndexZeroOfPopulatedList
[       OK ] LinkedListTest.CanDeleteIndexZeroOfPopulatedList (0 ms)
[----------] 20 tests from LinkedListTest (6 ms total)

[----------] Global test environment tear-down
[==========] 20 tests from 1 test suite ran. (6 ms total)
[  PASSED  ] 20 tests.

Obviously if you try to do something out of bounds, say insert at index 100 when on a list with only 5 items, or delete the head on an empty list, it's going to throw a segfault. I've accounted for that in my unit tests, and I don't think it's necessary to prevent a theoretical user (please don't ever use this code for anything real) from doing something stupid. There's no reason that I can think of that foundational data structure code should gracefully handle BS programmer input. That doesn't help anyone. But whenever someone posts tutorials about data structures, jackasses must pop out of the woodwork saying things like "This code doesn't even do error checking!" Thankfully nobody reads any of this in the first place and I also don't have a commentary system. :)