0x0026 - C Parse CSV File Example
We are working through a Stack Overflow example from here:
One can use the following code block to:
- Test for the existence of a file (exists)
- If it exists, open it and parse it (getfield).
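#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Returns true if the file can be opened for reading.
bool exists(const char *fname)
{
    FILE *file;
    if ((file = fopen(fname, "r")))
    {
        fclose(file);
        return true;
    }
    return false;
}

// Returns the num-th (1-based) ';'-separated field of line, or NULL.
// strtok modifies line in place, so pass a scratch copy.
const char* getfield(char* line, int num)
{
    const char* tok;
    for (tok = strtok(line, ";");
         tok && *tok;
         tok = strtok(NULL, ";\n"))
    {
        if (!--num)
            return tok;
    }
    return NULL;
}

int main(void)
{
    char tfile[] = "/csv/test_1min.AAL.csv";
    if (exists(tfile))
    {
        FILE* stream = fopen(tfile, "r");
        char line[1024];
        while (fgets(line, sizeof line, stream))
        {
            char* tmp = strdup(line);
            const char* field = getfield(tmp, 3);
            // Passing NULL to %s is undefined, so substitute a marker
            printf("Field 3 would be %s\n", field ? field : "(missing)");
            // NOTE strtok clobbers tmp
            free(tmp);
        }
        fclose(stream);
    }
    return 0;
}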
In our case we are dealing with CSV quote data. Some fields contain 'None' where a value is missing, and we also want to filter out the header row, which we can recognise by the keyword 'open', one of the column names in the quote header. We are going to code our own filters from scratch.
The following scratch-built substring check (closer in spirit to strstr than to strcmp) worked well:
bool str_in_str(const char* a, const char* b, size_t alen, size_t blen)
{
    // lazy checking would allow 'match' to be found inside 'm a t c h'
    // strict checking (used here) restarts the comparison at every candidate
    // position and abandons it on the first non-match
    if (blen == 0 || blen > alen) // nothing to find, or b cannot fit inside a
        return false;
    for (size_t t = 0; t + blen <= alen; t++)
    {
        size_t mcount = 0; // reset for each starting position
        while (mcount < blen && a[t + mcount] == b[mcount])
            mcount++;
        if (mcount == blen)
            return true;
    }
    return false;
}
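As a rough sketch of how such a filter might be applied to the quote data, the block below restates a strict version of str_in_str so that it compiles on its own, then wraps it in a helper. The name should_skip_line and the sample rows in the usage note are made up for illustration; they are not from the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Strict substring test, restated here so the sketch is self-contained.
static bool str_in_str(const char* a, const char* b, size_t alen, size_t blen)
{
    if (blen == 0 || blen > alen)
        return false;
    for (size_t t = 0; t + blen <= alen; t++)
    {
        size_t mcount = 0; // reset for each starting position
        while (mcount < blen && a[t + mcount] == b[mcount])
            mcount++;
        if (mcount == blen)
            return true;
    }
    return false;
}

// Hypothetical helper: true when the line is the header (contains "open")
// or carries a missing value (contains "None").
static bool should_skip_line(const char* line)
{
    size_t len = strlen(line);
    return str_in_str(line, "open", len, strlen("open"))
        || str_in_str(line, "None", len, strlen("None"));
}
```

Inside the fgets loop one would call should_skip_line(line) and continue when it returns true. A header such as "date;open;high;low;close" and a row such as "2020-01-02;None;None;None;None" would both be skipped, while a fully populated row passes through.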
Next we have a file_exists function, the same check as exists above under a more descriptive name:
bool file_exists(const char *fname)
{
    FILE *file;
    if ((file = fopen(fname, "r")))
    {
        fclose(file);
        return true;
    }
    return false;
}
A [safe] string-to-float function built on the outdated atof from <stdlib.h>, which gives no indication of bad input. Yes, there is strtod (strtol is its integer counterpart) - but to detect bad input with it you need to supply an end pointer - ours does not!
float char_to_float(char* inchar)
{
    size_t char_len = strlen(inchar);
    bool can_process = true;
    for (size_t x = 0; x < char_len; x++)
    {
        char t = inchar[x];
        // Accept only digits and the decimal point; anything else
        // (a sign, whitespace, letters as in 'None') fails the check.
        bool lchar = (t >= '0' && t <= '9') || t == '.';
        if (!lchar) // non-passing character.
            can_process = false;
    }
    if (can_process)
    {
        // An empty string also gets here, and atof returns 0.0 for it.
        return atof(inchar);
    }
    // Sentinel for "not a number". A float cannot hold -99999999.0 exactly,
    // so the caller receives -100000000.0f; test it with a range check
    // (e.g. value < -9.0e7f) rather than with ==.
    return -99999999.0;
}
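For comparison, here is a minimal sketch of the same validation done with the standard strtod and its end pointer. The helper name parse_field and its out-parameter are our own invention for illustration. Unlike the hand-rolled check, this accepts everything strtod accepts, including signs, exponents, and the special strings "inf" and "nan", so it is a different trade-off rather than a drop-in replacement.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: parse a whole field as a number.
// Returns true and stores the value in *out only if strtod consumed
// the entire string; otherwise returns false and leaves *out untouched.
static bool parse_field(const char* field, double* out)
{
    char* end = NULL;
    double value = strtod(field, &end);
    if (end == field || *end != '\0') // nothing parsed, or trailing junk
        return false;
    *out = value;
    return true;
}
```

With this shape the caller tests the boolean result instead of comparing against a sentinel value, so a field such as 'None' or an empty string is reported as a failure rather than as a magic number.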
When you are coding in C you need to think a little differently about how data is handled when dealing with pointers. If you code a lot in Python (or any object-oriented language, really), you routinely hand a reference to a function and get a fresh object back, with the type handling done for you, so you pay little attention to these issues. In C, a char* argument is the address of the caller's buffer: the function works on the data in place at the character level and can simply return nothing, because the changes are already visible to the calling function (main, in our case). Let's look at an example. We have a CSV line and want to trim off the trailing '\n':
void safe_copy(char* inchar)
{
    int b = 0;
    while (inchar[b] != 0) {
        if (inchar[b] == '\n') {
            inchar[b] = 0; // stamp it 0 truncating off the '\n'
            return;
        }
        b++;
    }
}
In the above code block we pass the char* to the function. Because C strings are terminated by a NUL byte ('\0', value 0), we simply overwrite the first '\n' we encounter with 0, which ends the string at that point. We pass nothing back, as the buffer is already held by the calling function. Our character array is really only declared once, as a line buffer back inside the main function.
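To make the in-place idea concrete, here is a small sketch. safe_copy is restated so the block compiles on its own; trim_and_measure is a hypothetical wrapper added only for illustration.

```c
#include <stddef.h>
#include <string.h>

// Truncate the string at the first '\n' by overwriting it with 0.
static void safe_copy(char* inchar)
{
    int b = 0;
    while (inchar[b] != 0) {
        if (inchar[b] == '\n') {
            inchar[b] = 0;
            return;
        }
        b++;
    }
}

// Hypothetical demo: the caller owns the buffer, the callee edits it in
// place, and the caller then observes the shortened string.
static size_t trim_and_measure(char* line)
{
    safe_copy(line);     // modifies the caller's buffer directly
    return strlen(line); // length no longer includes the '\n'
}
```

Called on a writable array such as char line[] = "a;b;c\n", trim_and_measure leaves line holding "a;b;c". Note that the argument must be a writable buffer: passing a string literal to a function that writes through the pointer is undefined behaviour in C.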
In this next code block we have a separator counter. It is a basic version for our purposes (it does not handle quoted "" fields). Again we simply accept the char pointer, scan the data, and return the count of separators, which tells us whether the line has the expected number of fields, i.e. that the data is not mangled.
int separator_count(char* inchar, char sep)
{
    // This is not quote handling.
    int b = 0;
    int mcount = 0;
    while (inchar[b] != 0) {
        if (inchar[b] == sep) mcount++;
        b++;
    }
    return mcount;
}
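One way the count might be used as a sanity check is sketched below. separator_count is restated (taking a const pointer, since it only reads) so the block stands alone; line_has_fields and the field count of five are assumptions made for illustration, not part of the original code.

```c
#include <stdbool.h>

// Count occurrences of sep in a NUL-terminated string (no quote handling).
static int separator_count(const char* inchar, char sep)
{
    int b = 0;
    int mcount = 0;
    while (inchar[b] != 0) {
        if (inchar[b] == sep) mcount++;
        b++;
    }
    return mcount;
}

// Hypothetical check: a row with N fields should contain N - 1 separators.
static bool line_has_fields(const char* line, char sep, int expected_fields)
{
    return separator_count(line, sep) == expected_fields - 1;
}
```

In the read loop, a row that fails line_has_fields(line, ';', 5) could be skipped or logged before any field is parsed, so truncated or mangled rows never reach the conversion step.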