OPS445 Assignment 1 Part 2

Assignment 1 - Parsing a log file

Part 2 - Parser

Weight: 5% of the overall grade.

Due Date: Ask your professor for exact date.

Late penalty: 10% per day. Note that the assignment must be completed satisfactorily in order to pass the course, no matter what grade you get.

Overview

In part 1 of this assignment you created a program with command-line menu-based navigation that didn't do anything.

In part 2 you will add functionality to all your menu items.

Requirements

  • You may use any coding style you choose - but make sure it is consistent throughout the program. Coding style is important - not least because I have to read it. "Style" includes adding a comment at the top of every function explaining what it does. You won't lose marks for too many comments, but you will lose marks for too few.
  • If your assignment doesn't work or has major problems - you will be asked to resubmit it, which will likely cost you more than a late penalty.

Header

Your program must be a single source file named check_apache_log_yourmysenecaid.py, and at the top of that file it will contain the following comment:

# OPS435 Assignment 1
# check_apache_log_yourmysenecaid.py
# Author: Your Name

Loading the logs

When your program starts, don't just print "Loading filename...": actually read those files, and store every line from all of them in a global list named all_log_lines.

Look into how to access a global variable from inside a function; you will need it for every function you implement for this part of the assignment.

You can use these logs for testing (the examples are based on these log files).
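The loading step described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names load_log_files and count_loaded_lines are hypothetical, and a throwaway temporary file stands in for the real log files. It also shows when the global keyword is actually required.

```python
import os
import tempfile

# Global list that every function in the program can read.
all_log_lines = []


def load_log_files(filenames):
    """Replace all_log_lines with every line from the given files."""
    # 'global' is required here because this function assigns to the name.
    global all_log_lines
    all_log_lines = []
    for filename in filenames:
        # errors="replace" keeps unusual bytes in a log from raising an exception
        with open(filename, errors="replace") as log_file:
            for line in log_file:
                all_log_lines.append(line.rstrip("\n"))


def count_loaded_lines():
    """Return how many lines are loaded (only reading, so no 'global' needed)."""
    return len(all_log_lines)


# Demonstration with a made-up file instead of the real course logs.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as demo:
    demo.write("first made-up line\nsecond made-up line\n")
load_log_files([demo.name])
os.remove(demo.name)
print(count_loaded_lines())  # prints 2
```

Declaring the list as global inside a function that only reads it is harmless, so being consistent about it is a reasonable style choice.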

Regular expression

All of the functions for this part of the assignment will be easier to complete if you use a regular expression. You can use this code on each line you're evaluating:

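import re

# Groups: 1 = client IP, 3 = request string, 4 = HTTP status code
match = re.match('([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*', line)
if match is not None:
    ip = match.group(1)
    code = match.group(4)  # a string such as '200', not an integer
    request = match.group(3)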
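To see what the groups contain, the same regular expression can be tried on a single line. The sample line below is made up for illustration (it is not taken from the provided logs), and the name LOG_PATTERN is just a convenience introduced here.

```python
import re

# The regular expression from the assignment, stored once for reuse.
LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up line in Apache "combined" log format, for illustration only.
sample = ('142.204.1.2 - - [10/Mar/2020:10:00:00 -0400] '
          '"GET /wiki/OPS435_Lab_1 HTTP/1.1" 200 1234 "-" "Mozilla/5.0"')

match = re.match(LOG_PATTERN, sample)
if match is not None:
    ip = match.group(1)       # '142.204.1.2'
    request = match.group(3)  # 'GET /wiki/OPS435_Lab_1 HTTP/1.1'
    code = match.group(4)     # '200' (a string, so compare it with '200')
    print(ip, code, request)

# A line that does not fit the pattern yields None rather than an error.
no_match = re.match(LOG_PATTERN, "not a log line")
print(no_match)  # prints None
```

Because the groups are strings, a comparison such as code == 200 would always be false; compare against '200' or convert with int() first.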

How many total requests (Code 200)

This function will:

  • Iterate through all the log lines, and using the regular expression I gave you:
    • Check whether the code is 200.
    • Count how many lines with code 200 you found.
  • At the end: print the total count.
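The steps above might look something like this. It is a sketch, not the required solution: the function name count_code and the four sample lines are assumptions made for illustration. Passing the code as a parameter means the same loop could also serve the 404 count.

```python
import re

LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up lines standing in for the global list filled at startup.
all_log_lines = [
    '142.204.1.2 - - [10/Mar/2020:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.7 - - [10/Mar/2020:10:00:05 -0400] "GET /missing HTTP/1.1" 404 -',
    '10.0.0.8 - - [10/Mar/2020:10:00:09 -0400] "GET /about.html HTTP/1.1" 200 2048',
    'garbage that does not match the pattern',
]


def count_code(wanted_code):
    """Return how many log lines have the given status code (a string)."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        # Skip lines the pattern cannot parse instead of crashing on None.
        if match is not None and match.group(4) == wanted_code:
            count += 1
    return count


print(count_code('200'))  # prints 2
```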

How many requests from Seneca (IPs starting with 142.204)

This function will:

  • Iterate through all the log lines, and using the regular expression I gave you:
    • Check whether the IP address begins with 142.204. You can use the Python string method startswith() for this.
    • Count how many such lines you found.
  • At the end: print the total count.
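A possible shape for this check, using str.startswith() on the captured IP. The function name count_requests_from_prefix and the sample lines are hypothetical; the second and third lines are included to show that only the start of the address is compared.

```python
import re

LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up lines for illustration only.
all_log_lines = [
    '142.204.1.2 - - [10/Mar/2020:10:00:00 -0400] "GET / HTTP/1.1" 200 512',
    '142.20.4.9 - - [10/Mar/2020:10:00:01 -0400] "GET / HTTP/1.1" 200 512',
    '10.142.204.3 - - [10/Mar/2020:10:00:02 -0400] "GET / HTTP/1.1" 200 512',
]


def count_requests_from_prefix(prefix):
    """Return how many log lines come from an IP starting with prefix."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and match.group(1).startswith(prefix):
            count += 1
    return count


print(count_requests_from_prefix('142.204'))  # prints 1
```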

How many requests for OPS435_Lab

This function will:

  • Iterate through all the log lines, and using the regular expression I gave you:
    • Check whether the request string contains "OPS435_Lab" (not including the quotes). You will need a second (much simpler) regular expression for this check.
    • Count how many such lines you found.
  • At the end: print the total count.
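One way the second regular expression could be applied is sketched below. The function name count_requests_containing and the sample lines are assumptions. Note that re.match() only matches at the start of a string, so re.search() (or the search() method of a compiled pattern) is the call that finds text anywhere in the request.

```python
import re

LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up lines for illustration only.
all_log_lines = [
    '10.0.0.1 - - [10/Mar/2020:10:00:00 -0400] "GET /wiki/OPS435_Lab_1 HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Mar/2020:10:00:01 -0400] "GET /wiki/OPS435_Lab_2 HTTP/1.1" 404 -',
    '10.0.0.3 - - [10/Mar/2020:10:00:02 -0400] "GET /wiki/OPS335 HTTP/1.1" 200 77',
]


def count_requests_containing(text):
    """Return how many request strings contain text anywhere inside them."""
    # re.escape() makes any special characters in text match literally.
    needle = re.compile(re.escape(text))
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and needle.search(match.group(3)):
            count += 1
    return count


print(count_requests_containing('OPS435_Lab'))  # prints 2
```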

How many total "Not Found" requests (Code 404)

This function is the same as the "How many total requests (Code 200)" function, but looks for code 404 instead.

How many 404 requests contained "hidebots" in the URL

This function will:

  • Iterate through all the log lines, and using the regular expression I gave you:
    • Check whether the code is 404. If it is:
      • Check whether the request string contains "hidebots" (not including the quotes). You will need a second (much simpler) regular expression for this check.
      • Count how many such lines you found.
  • At the end: print the total count.
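The nested check above could be sketched like this. The function name count_404_containing and the sample lines are made up for illustration; only the first sample line satisfies both conditions.

```python
import re

LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up lines for illustration only.
all_log_lines = [
    '10.0.0.1 - - [10/Mar/2020:10:00:00 -0400] "GET /hidebots/page HTTP/1.1" 404 -',
    '10.0.0.2 - - [10/Mar/2020:10:00:01 -0400] "GET /hidebots/page HTTP/1.1" 200 10',
    '10.0.0.3 - - [10/Mar/2020:10:00:02 -0400] "GET /other HTTP/1.1" 404 -',
]


def count_404_containing(text):
    """Return how many 404 requests contain text in the request string."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is None:
            continue  # unparseable line, nothing to check
        if match.group(4) == '404':
            # Second, simpler regular expression applied to the request only.
            if re.search(re.escape(text), match.group(3)):
                count += 1
    return count


print(count_404_containing('hidebots'))  # prints 1
```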

Print all IP addresses that caused a 404 response

  • This function is almost the same as the "How many total "Not Found" requests (Code 404)" function, but instead of counting the number of 404 requests, it will record the IP for each such request as a key in a Python dictionary. The dictionary will ensure that you don't have any duplicate IP addresses.
  • At the end: print the list of keys in that dictionary.
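A rough sketch of the dictionary approach follows. The function name ips_with_404, the value True stored under each key, and the sample lines are all assumptions; the point is that assigning to the same key twice leaves a single entry.

```python
import re

LOG_PATTERN = '([(\\d\\.)]+) - - \\[(.*?)\\] "(.*?)" (\\d+) ((\\d+)|-).*'

# Made-up lines for illustration only; 10.0.0.7 appears twice with a 404.
all_log_lines = [
    '10.0.0.7 - - [10/Mar/2020:10:00:00 -0400] "GET /a HTTP/1.1" 404 -',
    '10.0.0.7 - - [10/Mar/2020:10:00:01 -0400] "GET /b HTTP/1.1" 404 -',
    '10.0.0.9 - - [10/Mar/2020:10:00:02 -0400] "GET /c HTTP/1.1" 404 -',
    '142.204.1.2 - - [10/Mar/2020:10:00:03 -0400] "GET / HTTP/1.1" 200 512',
]


def ips_with_404():
    """Return a dictionary whose keys are the IPs that caused a 404."""
    found = {}
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and match.group(4) == '404':
            # The value is unused; re-assigning an existing key adds nothing.
            found[match.group(1)] = True
    return found


print(list(ips_with_404().keys()))  # prints ['10.0.0.7', '10.0.0.9']
```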

The log files I gave you have a much larger number of 404 requests than I expected, so this list will be very long.

Submission

After testing your program - submit the check_apache_log_yourmysenecaid.py file via Blackboard. Don't submit another document, nor a screenshot, nor an archive.

Rubric

Item                    Marks
Submitted correctly     /2
Loading log files       /2
6 parsing functions     /6