C Programming | Working with files I

At some point when developing software, no matter how big or small the program is going to be, we need to store some data on the computer and read from other sources too. Let's take a look at how to work with external files in C.

Files in C programming don't have a predefined structure. They are meant to be a container for some sequence of bytes. That way the internal structure of a file is something that the program itself has to deal with.

As long as we know how a file is structured, we can open, read and write any file.

Opening files

Opening files in C can be achieved in two ways: using the stdio function fopen(3) or using the lower-level open(2).

The main difference between them is that open(2) is a system call while fopen(3) is a library call.

fopen(3) calls open(2) under the hood and uses buffering to reduce the number of system calls. When timing is critical (e.g. embedded systems), it is better to use open(2) and take full control of when the data is processed.

The fopen(3) way

The fopen(3) function associates a file with a stream and initializes an object of the type FILE, which contains a structure with information to control the stream.

We can specify how we want to operate with the data by passing different modes into the mode parameter.

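Possible modes are:

r: open an existing file for reading.
w: create the file, or truncate it if it already exists, and open it for writing.
a: open the file for appending, creating it if needed; every write goes to the end.

Adding a + sign after any of the letters makes the file work in update mode. That is, the mode allows both reading and writing.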

FILE *file = fopen("path/to/file.type", "mode");

The open(2) way

The open(2) function returns an int called a file descriptor. Every open file has a file descriptor number, which the operating system uses to keep track of it.

Similar to fopen(3), we can specify how we want to work with the opened file passing specific flags into the flags parameter.

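Valid mandatory flags are (exactly one of them must be given):

O_RDONLY: open for reading only.
O_WRONLY: open for writing only.
O_RDWR: open for reading and writing.

Additional flags can be OR-ed in to perform other operations, such as O_APPEND to open a file in append mode, or O_ASYNC, which on Linux enables signal-driven I/O on terminals, sockets, pipes and FIFOs.

When a flag such as O_CREAT may create the file, we add a third parameter to specify the permissions of the new file, like: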

int file_data = open("path/to/file.type", flags, mode);

Writing files

We can run a program that takes arguments from the user via the terminal emulator, and perform operations based on those arguments, print them back to the terminal, and ask for more operations if needed, but each time we close the program, that data is gone.

We can write data in binary files and in text files.

The standard library has two useful functions to help us save the data we ask for and process during program execution into a file. These functions are fwrite(3) and fprintf(3).

Using fwrite(3)

The function fwrite(3) writes a number of objects of a given size to a file. It is often used to write binary data.

The information we need to pass to fwrite(3) is the following: a pointer to the data, the size of each object, the number of objects to write, and the file stream.

fwrite(data_pointer, sizeof(data_type), number_of_objects, file);

This produces a binary file. We can check its content using a tool like hexdump(1).

typedef struct {
    int power;     //kW
    int torque;    //NM
    int wheels;    //[4, 5]
    int seats;     //up to 7
    int doors;     //[3, 5]
} car_t;

car_t rally_car = {
    .power = 235,
    .torque = 384,
    .wheels = 5,
    .seats = 2,
    .doors = 3
};

FILE *file = fopen("cars.bin", "wb");   //"b" asks for binary mode

if (file != NULL) {
    fwrite(&rally_car, sizeof(car_t), 1, file);
    fclose(file);
}

We can, however, write text files with fwrite(3) by first formatting the text with snprintf(3), the bounds-checked sibling of sprintf(3), which writes its output as a string into the referenced buffer.

char buffer[40];

//engine.torque is an int, so the conversion is %d
snprintf(buffer, sizeof(buffer), "The actual engine torque is %d.\n", engine.torque);
fwrite(buffer, sizeof(char), strlen(buffer), file);

fclose(file);

Using fprintf(3)

Similar to the printf() function, we have fprintf(3) in the standard library, with which we can write formatted output into a file by passing a format string.

The information we need to pass to fprintf(3) is the following: the file stream, the format string, and the values to format.

fprintf(file_pointer, format, content);

This way we store text data by default in a file.

FILE *file = fopen("temp.log", "a");

if (file != NULL) {
    fprintf(file, "%s\n", "Appending data to temp file.");
    fclose(file);
}

At the end of the note we'll use this function to serialize some JSON data.

Other operations with files

Apart from opening and writing files, the header file <stdio.h> has more functions for working with I/O, which we can use to rename, remove, and close files, among other operations.

Let's look at some of them:

Close a file

Once we are done working with a file, we can close the stream and free its resources using the function fclose(3). The function writes out any buffered data that has not yet reached the file and discards any unread buffered input.

fclose(file);

Rename a file

We can rename a file using the function rename() by passing the current name of the file and a string (const char *) to use as the new one.

rename("old_file_name", "new_file_name");

Remove a file

We can make a file unavailable using the remove() function, passing the file's name. If the file has no other names linked to it, the file is deleted. Depending on the permissions of the file and of its directory, the function may or may not be able to perform the deletion.

remove("file_name");

Create a temporary file

Using tmpfile() we can create a temporary file with a unique name in wb+ mode which is automatically removed once we close it or the program terminates.

If the function is unable to open a temporary file, it returns a NULL pointer, otherwise it returns a pointer to the temp file.

FILE *file = tmpfile(); //file is pointing to a tempfile.

How to map files in memory

There is a way to work more efficiently with files: mapping them into virtual memory with mmap(2).

Virtual memory helps when processes ask for more memory than the system has. At that point the operating system's memory manager takes pages from RAM and places them in swap, bringing them back to RAM when they are requested. It is basically moving data back and forth between RAM and the hard drive.

We can use the same mechanism to read and write files too.

Let's use mmap(2) to request blocks of memory from a text file (it can be any other file too):

Open a file

int file_data = open("text_file.txt", O_RDONLY, S_IRUSR | S_IWUSR);

If we want to also write content into the file we have to open it in a read-write mode using different flags in the open(2) function:

int file_data = open("text_file.txt", O_RDWR, S_IRUSR | S_IWUSR);

We can do the same using fopen(3), but it is good practice not to mix high-level I/O with low-level operations. We would be killing the performance.

If we use fopen(3) then we need to use the function fileno() to get the file descriptor from our opened file.

FILE *file_data = fopen("text_file.txt", "r");
int fileDescriptor = fileno(file_data);

Get the size of the file

We need to include <sys/stat.h> and <unistd.h> to help:

#include <sys/stat.h>
#include <unistd.h>
...
struct stat sb;
if (fstat(file_data, &sb) == -1)    //fstat takes the file descriptor
    printf("couldn't get file size\n");

Allocate in memory using mmap(2)

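We need to include <sys/mman.h> and pass the following parameters to the function: the preferred address (NULL lets the kernel choose), the length of the mapping, the memory protection (PROT_READ here), the mapping flags (MAP_PRIVATE keeps any change out of the file), the file descriptor, and the offset into the file.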

char *file_in_ram = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, file_data, 0);

Operate with the data in memory

Now that we have mapped our file we can start working freely with it.

for (off_t i = 0; i < sb.st_size; i++)
    printf("%c", file_in_ram[i]);
printf("\n");

Unmap memory and close the file

Once we're done working with the file, closing the file descriptor alone does not unmap the data. The function munmap() takes the address and length of a mapping and deletes the mappings in that address range.

After that we can close the file descriptor to finish.

munmap(file_in_ram, sb.st_size);
close(file_data);

A complete view of the code should look like this:

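#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int file_data = open("plain_text_file.txt", O_RDONLY);
    if (file_data == -1)
        return 1;

    struct stat sb;
    if (fstat(file_data, &sb) == -1 || sb.st_size == 0)   //an empty file cannot be mapped
        return 1;

    char *file_in_ram = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, file_data, 0);
    if (file_in_ram == MAP_FAILED)
        return 1;

    for (off_t i = 0; i < sb.st_size; i++)
        printf("%c", file_in_ram[i]);

    munmap(file_in_ram, sb.st_size);
    close(file_data);
    return 0;
}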

Structuring data

We know that the C programming language doesn't care about the type of file we use. Some applications may be fulfilled by storing data in plain text files, but even though they are text files, they may need to follow a structure so we can interoperate later with the data inside them.

To achieve this we need to convert the abstract in-memory data into a series of bytes that record the data structure into a recoverable format. This is called serialization.

Our data structure can be a simple list or array, a complex group of nested arrays and structs, or whatever else is required.

Writing structured data to a file

As an example, let's take a look at a program where the user can store information about a vehicle's engine.

We should have a struct type that handles how an engine is defined.

/*simplified engine structure*/
typedef struct {
    char model[10];                  //engine model
    char manufacturer[10];           //engine manufacturer
    int power;                       //kW
    int torque;                      //NM
    int cylinders;                   //total cylinders in engine
    int structure;                   //block structure [1, 2, 3] rows
    char fuel_type[10];              //fuel type [gasoline, diesel]
} engine_t;

Once we are working in the program we can create an engine and assign values to it.

engine_t engine = {
    .model = "RB26DETT",
    .manufacturer = "nismo",
    .power = 235,
    .torque = 384,
    .cylinders = 6,
    .structure = 1,
    .fuel_type = "gasoline"
};

Now it's time to define a constant to serialize the data into a file. Instead of reinventing the wheel, let's use an existing data-interchange format such as JSON (XML applies here too).

const char *ENGINE_EXPORT_FMT =
"{\n\t\"model\": \"%s\",\n\t\"manufacturer\": \"%s\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%s\"\n}\n";

Most of the “complexity” here is in correctly describing our object. For this simple example, we can just go with this constant. For serious projects we would move this into a header file and probably write some functions that wrap the process.

Moving on, we have to open a file to write the data to, or create a new one.

FILE *file = fopen("engine_data.json", "w+");

Once we have our file opened, we need to print the content of our engine struct into it, using the function fprintf(3).

fprintf(file, ENGINE_EXPORT_FMT, engine.model, engine.manufacturer, engine.power, engine.torque, engine.cylinders, engine.structure, engine.fuel_type);

Note that we have given our example file a .json extension, but we could use any name and extension we want, and the result would be the same.

A complete view of the code should look like this:

#include<stdio.h>
#include<stdlib.h>

/*engine struct format data*/
const char *ENGINE_EXPORT_FMT = "{\n\t\"model\": \"%s\",\n\t\"manufacturer\": \"%s\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%s\"\n}\n";

/*simplified engine structure*/
typedef struct {
    char model[10];                  //engine model
    char manufacturer[10];           //engine manufacturer
    int power;                       //kW
    int torque;                      //NM
    int cylinders;                   //total cylinders in engine
    int structure;                   //block structure [1, 2, 3] rows
    char fuel_type[10];              //fuel type [gasoline, diesel]
} engine_t;


int main() {
    engine_t engine = {
        .model = "RB26DETT",
        .manufacturer = "nismo",
        .power = 235,
        .torque = 384,
        .cylinders = 6,
        .structure = 1,
        .fuel_type = "gasoline"
    };

    FILE *file = fopen("engine_data.json", "w+");
    if (file == NULL)
        return 1;

    fprintf(file, ENGINE_EXPORT_FMT, engine.model, engine.manufacturer, engine.power, engine.torque, engine.cylinders, engine.structure, engine.fuel_type);

    fclose(file);

    return 0;
}

We should have a new file named engine_data.json in our directory with the engine struct serialized into it.

Parsing structured data from a file

If we want the saved data to be used back in the program, we have to reverse engineer our constant to parse our object.

Since we already know how our file content is stored, the process can be a bit more straightforward.

//%9[^\"] stores at most 9 characters, so a char[10] field cannot overflow
const char *ENGINE_IMPORT_FMT =
"{\n\t\"model\": \"%9[^\"]\",\n\t\"manufacturer\": \"%9[^\"]\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%9[^\"]\"\n}";
fseek(file, 0, SEEK_SET);
engine_t i_engine;
fscanf(file, ENGINE_IMPORT_FMT, i_engine.model, i_engine.manufacturer, &i_engine.power, &i_engine.torque, &i_engine.cylinders, &i_engine.structure, i_engine.fuel_type);

A complete view of the code should look like this:

#include<stdio.h>
#include<stdlib.h>

/*engine struct format data; %9 caps each string at 9 characters plus the terminator*/
const char *ENGINE_IMPORT_FMT = "{\n\t\"model\": \"%9[^\"]\",\n\t\"manufacturer\": \"%9[^\"]\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%9[^\"]\"\n}";

/*simplified engine structure*/
typedef struct {
    char model[10];                  //engine model
    char manufacturer[10];           //engine manufacturer
    int power;                       //kW
    int torque;                      //NM
    int cylinders;                   //total cylinders in engine
    int structure;                   //block structure [1, 2, 3] rows
    char fuel_type[10];             //fuel type [gasoline, diesel]
} engine_t;


int main() {
    engine_t engine;

    FILE *file = fopen("engine_data.json", "r");
    if (file == NULL)
        return 1;

    fseek(file, 0, SEEK_SET);

    fscanf(file, ENGINE_IMPORT_FMT, engine.model, engine.manufacturer, &engine.power, &engine.torque, &engine.cylinders, &engine.structure, engine.fuel_type);

    fclose(file);

    return 0;
}

Summing up

Files play a really important role in software programs. We've seen how to work with operations that read, write and format text both from and into files, but the same can be achieved for binary files such as images or audio.

In addition to that, we can also implement ways to obfuscate how our program writes the data so that not everyone can read our format back. This is kind of an unfriendly way to do things, but corporations often do this so the competition cannot just sneak into a company's new software and steal how they engineer things. But hey, we have reverse engineers to do so (:

A further discussion of this field will be presented in a future note.