Mar 11 2008 Programming

Here is my frontend browser interface and backend for access logs.
Didn't see any like this around so I coded my own.  

Web statistics aren't very interesting. The most interesting thing about them is to see what people search for. On the other hand they are useful to find spammers, see who's just read through nmap examples from my netblock, log people sending IIS exploits, etc.
Manually viewing the access log file is just wrong. Trying to parse it with regexes, I kept running into corner cases that failed. So, I set out to make a web interface where just a few clicks would bring me all the information I wanted.

Here's the database layout. One table with a field for each of the possible available information.

drop database if exists apache_log;
create database apache_log;

use apache_log;

drop table if exists `access_log`;

create table `access_log` (
    `remote_ip` char(15) not null,
    `remote_host` varchar(50) not null,
    `request_method` char(8) not null,
    `server_name` varchar(25) not null,
    `url_path_requested` varchar(75) not null,
    `query_string` varchar(75),
    `protocol_used` char(8) not null,
    `header_accept` varchar(300),
    `header_accept_encoding` varchar(150),
    `header_accept_charset` varchar(50),
    `header_accept_language` varchar(20),
    `header_date` varchar(15),
    `header_referer` varchar(250),
    `header_user_agent` varchar(250),
    `final_result_code` char(3) not null,
    `served_in` smallint(6) unsigned not null,
    `served_filename` varchar(300) not null,
    `in_bytes` smallint(4) unsigned not null,
    `out_bytes` int(15) unsigned not null,
    `date_invoked` char(18) not null,
    `id` mediumint(9) not null auto_increment,
    primary key(`id`)
);


Download

Referrer should have a bigger field I keep meaning to modify that.
Indexes are needed when the dataset grows large, I am such a procrastinator.

Here is the apache.conf custom logging entry. uniquestring is a placeholder for an actual unique string which is used later on in a regex.

LogFormat "UNIQUESTRING1%aUNIQUESTRING2%hUNIQUESTRING3%mUNIQUESTRING4%VUNIQUESTRING5%U
UNIQUESTRING6%qUNIQUESTRING7%UNIQUESTRING8%{Accept}iUNIQUESTRING9%{Accept-Encoding}i
UNIQUESTRING10%{Accept-Charset}iUNIQUESTRING11%{Accept-Language}iUNIQUESTRING12%{Date}i
UNIQUESTRING13%{Referer}iUNIQUESTRING14%{User-Agent}iUNIQUESTRING15%>sUNIQUESTRING16%D
UNIQUESTRING17%fUNIQUESTRING18%IUNIQUESTRING19%OUNIQUESTRING20%{%Y-%m-%d %T}t" unique_all
CustomLog "|/absolute/path/to/webserver_logger.pl" unique_all


Download

Line breaks have been added for display. The actual unique string is one long line.

Here's the program that does the actual logging. Something is nagging me that I should have it fork().

#!/usr/bin/perl

# E Dziewa jan 2008

use warnings;
use strict;
use DBI;

my $database = "apache_log";
my $host = "localhost";
my $username = "PUT A USERNAME HERE";
my $password = "PUT A PASSWORD HERE";

my $dbh = DBI->connect("DBI:mysql:database=$database;host=$host;", "$username", "$password", {'RaiseError' => 1});

while ( $_ = <> ) {
chomp;

my ( $ip, $host, $method, $servername, $url, $query, $protocol, $h_accept, $h_encoding, $h_charset, $h_language, $h_date, $h_referer, $h_useragent, $result, $servedin, $filename, $inbytes, $outbytes, $date ) =
    /UNIQUESTRING1(.*)UNIQUESTRING2(.*)UNIQUESTRING3(.*)UNIQUESTRING4(.*)UNIQUESTRING5(.*)
    UNIQUESTRING6(.*)UNIQUESTRING7(.*)UNIQUESTRING8(.*)UNIQUESTRING9(.*)
    UNIQUESTRING10(.*)UNIQUESTRING11(.*)UNIQUESTRING12(.*)
    UNIQUESTRING13(.*)UNIQUESTRING14(.*)UNIQUESTRING15(.*)UNIQUESTRING16(.*)
    UNIQUESTRING17(.*)UNIQUESTRING18(.*)UNIQUESTRING19(.*)UNIQUESTRING20(.*)/;

if ( $h_referer =~ /, / ) { ### there's no spaces in referrers, must be spam
    my $file = '/absolute/path/to/SPAM.log';
    open( FH, ">>", $file );
    print FH "$ip $date $h_referer\n";
    close FH;
    $h_referer = 'SPAM';
}

if ( $url =~ /\\x+|^-$/ ) { ### allow for bad guys trying to crash the server
    my $file2 = '/absolute/path/to/CRASHER.log';
    open( CFH, ">>", $file2 );
    print CFH "$ip $date length: ", length($url), " bytes\n\t$url\n";
    close CFH;
    $url = "Attempted Crash String";
}

    my $statement = $dbh->prepare("insert into access_log (
        remote_ip,
        remote_host,
        request_method,
        server_name,
        url_path_requested,
        query_string,
        protocol_used,
        header_accept,
        header_accept_encoding,
        header_accept_charset,
        header_accept_language,
        header_date,
        header_referer,
        header_user_agent,
        final_result_code,
        served_in,
        served_filename,
        in_bytes,
        out_bytes,
        date_invoked
        )
        values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
        ");
        $statement->execute($ip, $host, $method, $servername, $url, $query, $protocol, $h_accept, $h_encoding,
            $h_charset, $h_language, $h_date, $h_referer, $h_useragent, $result, $servedin, $filename, $inbytes,
            $outbytes, $date);
        $statement->finish();
}
$dbh->disconnect();


Download

And the interface itself.

#!/usr/bin/perl

# eric dziewa february 2008

use warnings;
use strict;

use CGI qw(:standard -no_xhtml);

my $doc = new CGI;

my @checkboxes = (
'ip', 'accept', 'host', 'encoding', 'method', 'charset', 'server', 'language', 'request', 'date', 'query',
 'referrer', 'protocol', 'user agent', 'result', 'time taken', 'filename', 'bytes recieved', 'bytes out', 'server date');

my %columns = (
'ip'                            =>        'remote_ip',
'host'                          =>        'remote_host',
'method'                        =>        'request_method',
'server'                        =>        'server_name',
'request'                       =>        'url_path_requested',
'query'                         =>        'query_string',
'protocol'                      =>        'protocol_used',
'accept'                        =>        'header_accept',
'encoding'                      =>        'header_accept_encoding',
'charset'                       =>        'header_accept_charset',
'language'                      =>        'header_accept_language',
'date'                          =>        'header_date',
'referrer'                      =>        'header_referer',
'user agent'                    =>        'header_user_agent',
'result'                        =>        'final_result_code',
'time taken'                    =>        'served_in',
'filename'                      =>        'served_filename',
'bytes recieved'                =>        'in_bytes',
'bytes out'                     =>        'out_bytes',
'server date'                   =>        'date_invoked',
);

my $dtime = `date '+%Y-%m-%d'`;
chomp $dtime;

print $doc->header,
      $doc->start_html( -title=>'webstats',
                        -meta=>{'author'=>'eric dziewa','keywords'=>'some \'KEY\' words','description'=>'descriptive text'},
                        -style=>{'src'=>'validcss.css'}
        ),

'<table>',
'<tr><td colspan="3">',
'<FIELDSET> <LEGEND> What would you like to see? </LEGEND>',
start_form,
    checkbox_group(-name=>'groupname',
                                -values=> \@checkboxes ,
                                -rows=>2,-columns=>10
    ),
'</fieldset>',

'<tr><td colspan="2">',

    popup_menu( -name => 'searchfield',
                -values => \@checkboxes,
                -default => 'server date'

    ),

    textfield('searchstring',"$dtime",50,80),
'<td align="left">',
    checkbox_group( -name => 'not',
                    -values => [ 'negate' ],
    ),
    checkbox_group( -name => 'unique',
                    -values => [ 'unique' ],
    ),
    checkbox_group( -name => 'count',
                    -values => ['count' ],
    ),

### row two ###
'<tr><td colspan="2">',

    popup_menu( -name => 'searchfield2',
                -values => \@checkboxes,
    ),

    textfield('searchstring2','',50,80),
'<td align="left">',
    checkbox_group( -name => 'not2',
                    -values => [ 'negate' ],
                    -rows => 1, columns => 1
    ),
### row three ###
'<tr><td colspan="2">',

    popup_menu( -name => 'searchfield3',
                -values => \@checkboxes,
    ),

    textfield('searchstring3','',50,80),
'<td align="left">',
    checkbox_group( -name => 'not3',
                    -values => [ 'negate' ],
                    -rows => 1, columns => 1
    ),

'<tr><td colspan="3">',
    '<br><br>',
    submit('get_it','Submit'),
    defaults('Reset'),
    hidden('hidden_name','submitted');
end_form;
print '</table>';

    use DBI;
    my $database = "apache_log";
    my $host = "localhost";
    my $user = "PUT USERNAME HERE";
    my $pass = "PUT PASSWORD HERE";
    my $dbh = DBI->connect("DBI:mysql:database=$database;host=$host;", "$user", "$pass", {'RaiseError' => 1});

    my ( $searchfield, $searchstring, $neg, $hidden, $searchfield2, $searchstring2, $neg2, $searchfield3, $searchstring3, $neg3, $unique, $count, $both );
    my ( @checkouts, @new_improved_array );


if ( param() ) {
    @checkouts = param('groupname');
    $searchfield = param('searchfield');
    $searchstring = param('searchstring');
    ( $neg, $unique, $count ) = '';
    $hidden = param('hidden_name');

    ( defined(param('not')) ) ? $neg = "not" : '';
    ( defined(param('unique')) ) ? $unique = "distinct($columns{$checkouts[0]})" : '';
    ( defined(param('count')) ) ? $count = "count($columns{$checkouts[0]})": '';
    ( ($count) && ($unique) ) ?    $both = "count(distinct($columns{$checkouts[0]}))" : '';
}

if (defined(param('searchstring2')) ) {
    $searchfield2 = param('searchfield2');
    $searchstring2 = param('searchstring2');
        defined(param('not2')) ? $neg2 = "not" : '';

        if (defined(param('searchstring3')) ) {
            $searchfield3 = param('searchfield3');
            $searchstring3 = param('searchstring3');
            defined(param('not3')) ? $neg3 = "not" : '';
        }
}

if ( scalar(@checkouts) >= 1 ) {
    @new_improved_array = '';
    @new_improved_array = map { $columns{$_} . ',' } @checkouts;
    chop $new_improved_array[$#new_improved_array];
    if ( param('searchstring2') !~ /\w|\d/ )  { one_row( $searchfield, $neg, $searchstring, @new_improved_array) }
    elsif ( param('searchstring3') !~ /\w|\d/ ) { two_row( $searchfield, $searchfield2, $neg, $neg2, $searchstring, $searchstring2, @new_improved_array ) }
    else { three_row( $searchfield, $searchfield2, $searchfield3, $neg, $neg2, $neg3, $searchstring, $searchstring2, $searchstring3, @new_improved_array ) }
}

else { nothing_selected() }

print '</body></html>';

sub three_row {
    my ( $searchfield, $searchfield2, $searchfield3, $neg, $neg2, $neg3, $searchstring, $searchstring2, $searchstring3, @array) = @_;
    my $shade;
    my $i = 1;
    my $statement = $dbh->prepare("select @array from access_log where $columns{$searchfield} $neg regexp '$searchstring' and $columns{$searchfield2} $neg2 regexp '$searchstring2' and $columns{$searchfield3} $neg3 regexp '$searchstring3' and remote_ip != '^192.168' ;");

    $statement->execute();

    print "<table>";
    foreach ( @checkouts ) {
        print "<td align=\"center\"class=\"gray\">$_</td>";
    }
    while ( my @row = $statement->fetchrow_array ) {
        print "<tr>";
        foreach ( @row ) {
            $shade = row_color($i);
            print "<td class=$shade>$_";
        }
        print "</tr>";
        if ( $i % 30 == 0) { foreach ( @checkouts ) { print "<td align=\"center\"class=\"gray\">$_</td>"; } }
        $i++;
    }
    print "</table>";
    $statement->finish();
    $dbh->disconnect();
}

sub two_row {
    my ( $searchfield, $searchfield2, $neg, $neg2, $searchstring, $searchstring2, @array) = @_;
    my $shade;
    my $i = 1;
    my $statement = $dbh->prepare("select @array from access_log where $columns{$searchfield} $neg regexp '$searchstring' and $columns{$searchfield2} $neg2 regexp '$searchstring2' ;");
    $statement->execute();

    print "<table>";
    foreach ( @checkouts ) {
        print "<td align=\"center\"class=\"gray\">$_</td>";
    }
    while ( my @row = $statement->fetchrow_array ) {
        print "<tr>";
        foreach ( @row ) {
            $shade = row_color($i);
            print "<td class=$shade>$_";
        }
        print "</tr>";
        if ( $i % 30 == 0) { foreach ( @checkouts ) { print "<td align=\"center\"class=\"gray\">$_</td>"; } }
        $i++;
    }
    print "</table>";
    $statement->finish();
    $dbh->disconnect();
}

sub one_row {
    my ($searchfield, $neg, $searchstring, @array) = @_;
    my $statement;
        if ( ($count) && ($unique) ) {
            $statement = $dbh->prepare("select $both from access_log where $columns{$searchfield} $neg regexp '$searchstring';");
        }
        elsif ( defined( $count ) ) {
            $statement = $dbh->prepare("select $count from access_log where $columns{$searchfield} $neg regexp '$searchstring';")
        }
        elsif ( defined( $unique ) ) {
            $statement = $dbh->prepare("select $unique @array from access_log where $columns{$searchfield} $neg regexp '$searchstring';")
        }
        else {
            $statement = $dbh->prepare("select @array from access_log where $columns{$searchfield} $neg regexp '$searchstring';")
        }
    my $shade;
    my $i = 1;

    $statement->execute();

    print "<table>";
    foreach ( @checkouts ) {
        print "<td align=\"center\"class=\"gray\">$_</td>";
    }
    while ( my @row = $statement->fetchrow_array ) {
        print "<tr>";
        foreach ( @row ) {
            $shade = row_color($i);
            print "<td class=$shade>$_";
        }
        print "</tr>";
        if ( $i % 30 == 0) { foreach ( @checkouts ) { print "<td align=\"center\"class=\"gray\">$_</td>"; } }
        $i++;
    }
    print "</table>";
    $statement->finish();
    $dbh->disconnect();
}

sub row_color {
    my $i = shift;
    (($i % 3 == 0) && (! $i % 2 == 0)) ? ( return '"blue"' ):
    ($i % 2 == 0) ? ( return '"lblue"' ):
    ( return '"white"');
}


sub nothing_selected {
    my $statement;

    $statement = $dbh->prepare("select sum(out_bytes/1073741824) from access_log;");
    $statement->execute();
     my $num = $statement->fetchrow_array;
    $statement->finish();

    $statement = $dbh->prepare("select sum(out_bytes/1048576) from access_log where date_invoked regexp '$dtime';");
    $statement->execute();
     my $todaysnum = $statement->fetchrow_array;
    $statement->finish();

    $statement = $dbh->prepare("select count(*) from access_log;");
    $statement->execute();
     my $count = $statement->fetchrow_array;
    $statement->finish();

    $statement = $dbh->prepare("select count(*) from access_log where date_invoked regexp '$dtime';");
    $statement->execute();
     my $todayscount = $statement->fetchrow_array;
    $statement->finish();

    $statement = $dbh->prepare("select count(distinct(remote_ip)) from access_log;");
    $statement->execute();
     my $distinct = $statement->fetchrow_array;
    $statement->finish();

    $statement = $dbh->prepare("select count(distinct(remote_ip)) from access_log where date_invoked regexp '$dtime';");
    $statement->execute();
     my $todaysdistinct = $statement->fetchrow_array;
    $statement->finish();

    $dbh->disconnect();

    printf("<br><br>Served %.3f GB, and $count requests, to $distinct distinct visitors since log beginning", $num);
    printf("<br><br>Served %.2f MB, and $todayscount requests, to $todaysdistinct distinct visitors today", $todaysnum);  

}

### notes ###

# 1 Byte = 8 Bit
# 1 Kilobyte = 1024 Bytes
# 1 Megabyte = 1048576 Bytes
# 1 Gigabyte = 1073741824 Bytes

Download

Some screenshots:
Initial State
Random Sample
Negated User Agent
Referrer Spam
Unique Languages
Unique Languages Count
German Speakers

   
Comments
No comments.
Comments for this entry available via RSS.
Comment Area
Your Name
Your Email (will not be published)
Your Website
Your Comment
Profanity is Prohibited
eric.dziewa.com is running WordPress.
WhiteSpace theme designed by E. Dziewa.
All content © E. Dziewa.
Thanks for stopping by.