Tuesday, June 10, 2008

Write your own tail -n implementation

This is one of the question asked in Adobe to my friend. You need to implement your own tail function. read the man page for more information about tail command.

I could able to think of two ways to solve the problem. [tail -n 10 filename ]

Solution 1:

Keep 2 pointers, 1st points to the start of the file, the 2nd points to the n+1 th line :

if the file is not big enough [ less than n ] then dump the content of the file from the beginning to the end of the file and return
else move the second pointer by one line and move the 1st pointer by one until the second reaches EOF. One the 2nd pointer reaches the end of the file, dump the content from where the 1st pointer currently pointing till the end of the file.

This is one of the solution me and Arun discussed when I was there in Hyderabad. But there are lot of improvements that we can do in this and the companies expect such kind of answers (just for the interview sake)

Problem: One of the biggest problem with the above solution is the performance. Both the pointers are reading the same content (2 times reading the same content). Since disk I/O is costly this is a real performance issue.


Solution 2 :

The next solution is instead of reading from the file twice, create a buffer to hold the content, read the file from the end of the file and dump it into the buffer, also keep track of the no of lines read so far from the end. Once the required no of lines read, print the content of the buffer from the end.


My Implementation :


#include <stdio.h&gt;
#include <stdlib.h&gt;
#include <string.h&gt;
#include <sys types.h>
#include <sys stat.h>
#include <fcntl.h&gt;

void tail(int lines, char *file_name)
{
struct stat file;
int size = 0;
char *ch = NULL;
char *buf = NULL;

FILE *fd = fopen(file_name, "r");

if(fd == NULL)
{
printf("Not able to open the file : %s\n", file_name);
return;
}

// Set the pointer to the end
fseek(fd, 0, SEEK_END);

// Find the size of the file
size = ftell(fd);
--size;

// Allocate the buffer to hold the content
ch = (char *)malloc(2);
buf = (char *)malloc(80 * lines);
memset(buf, 0, 80 * lines);

char * start = buf;
*buf++ = '\0';

// Copy the file content from the end of the file in the reverse order into the buffer
// Use fseek --> SEEK_SET to navigate the file from the back

while(size && lines)
{
--size;
fseek(fd, size , SEEK_SET);
fread(ch, 1, 1, fd);
*buf++ = *ch;
if(*ch == '\n')
lines--;


}
close(fd);

// Print the content of the buffer
while( start != buf)
printf("%c", *buf--);

printf("\n");
return;
}
int main(int argc, char **argv)
{
if(argc != 3)
{
printf("Run exe file_name no_of_lines\n");
return (-1);
}

char *file = argv[1];
tail(atoi(argv[2]), file);

return 0;
}


input file :cat ganesh
1 Ganesh
2 Vishnu
3 Ramak
4 Arun
5 Jerome

Output:

# ./a ganesh 2

4 Arun
5 Jerome

# ./a ganesh 4

2 Vishnu
3 Ramak
4 Arun
5 Jerome

./a ganesh 10
1 Ganesh
2 Vishnu
3 Ramak
4 Arun
5 Jerome

PS : Guys don't get envy that I wrote a big program for myself. I struggled a lot to make this program to work :-)

3 comments:

  1. tail should dynamically refresh ?

    ReplyDelete
  2. thats correct ramak .. tail -f [filename] refresh dynamically .. but here the problem is to implement tail -n 10 [filename]

    ReplyDelete
  3. /`*This example implements the option n of tail command.*/`

    #define _FILE_OFFSET_BITS 64
    #include
    #include
    #include
    #include
    #include
    #include

    #define BUFF_SIZE 4096

    FILE *openFile(const char *filePath)
    {
    FILE *file;
    file= fopen(filePath, "r");
    if(file == NULL)
    {
    fprintf(stderr,"Error opening file: %s\n",filePath);
    exit(errno);
    }
    return(file);
    }

    void printLine(FILE *file, off_t startline)
    {
    int fd;
    fd= fileno(file);
    int nread;
    char buffer[BUFF_SIZE];
    lseek(fd,(startline + 1),SEEK_SET);
    while((nread= read(fd,buffer,BUFF_SIZE)) > 0)
    {
    write(STDOUT_FILENO, buffer, nread);
    }
    }

    void walkFile(FILE *file, long nlines)
    {
    off_t fposition;
    fseek(file,0,SEEK_END);
    fposition= ftell(file);
    off_t index= fposition;
    off_t end= fposition;
    long countlines= 0;
    char cbyte;

    for(index; index >= 0; index --)
    {
    cbyte= fgetc(file);
    if (cbyte == '\n' && (end - index) > 1)
    {
    countlines ++;
    if(countlines == nlines)
    {
    break;
    }
    }
    fposition--;
    fseek(file,fposition,SEEK_SET);
    }
    printLine(file, fposition);
    fclose(file);
    }

    int main(int argc, char *argv[])
    {
    FILE *file;
    file= openFile(argv[2]);
    walkFile(file, atol(argv[1]));
    return 0;
    }

    /*Note: take in mind that i not wrote code to parse input options and arguments, neither code to check if the lines number argument is really a number.*/

    ReplyDelete