OPS445 Assignment 1 Part 2
Assignment 1 - Parsing a log file
Part 2 - Parser
Weight: 5% of the overall grade.
Due Date: Ask your professor for exact date.
Late penalty: 10% per day. Note that the assignment must be completed satisfactorily in order to pass the course, no matter what grade you get.
Overview
In part 1 of this assignment you created a program with command-line, menu-based navigation whose menu items didn't do anything yet.
In part 2 you will add functionality to every menu item.
Requirements
- You may use any coding style you choose, but make sure it is consistent throughout the program. Coding style is important, not least because I have to read it. "Style" includes adding a comment at the top of every function explaining what it does. You won't lose marks for too many comments; you will lose marks for too few.
- If your assignment doesn't work or has major problems, you will be asked to resubmit it, which will likely cost you more than a late penalty.
Header
Your program must be a single source file named check_apache_log_yourmysenecaid.py, and at the top of that file it will contain the following comment:
# OPS435 Assignment 1
# check_apache_log_yourmysenecaid.py
# Author: Your Name
Loading the logs
When your program starts, don't just print "Loading filename...": actually read those files and store every line from all of them in a global list named all_log_lines.
Look into how to access a global variable from inside a function; you will need it for every function you implement in this part of the assignment.
You can use these logs for testing (the examples are based on these log files).
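A minimal sketch of the loading step follows. It assumes the file names arrive as a list (for example, collected by your part 1 code); the function name load_logs and the UTF-8 decoding with errors="replace" are illustrative choices, not requirements.

```python
# Global list holding every line from every log file (name from the assignment).
all_log_lines = []


def load_logs(filenames):
    """Print a loading message for each file and append its lines to all_log_lines."""
    # 'global' is only strictly required when a function rebinds the name;
    # extend() modifies the existing list, so it is here to make intent explicit.
    global all_log_lines
    for filename in filenames:
        print("Loading " + filename + "...")
        # errors="replace" keeps one undecodable byte from aborting the whole load.
        with open(filename, encoding="utf-8", errors="replace") as log_file:
            all_log_lines.extend(log_file.readlines())
```

Call it once at startup, for example load_logs(["access_log"]), where "access_log" is only a placeholder file name.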
Regular expression
All of the functions for this part of the assignment will be easier to complete if you use a regular expression. You can use this code on each line you're evaluating:
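import re
match = re.match(r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*', line)
if match is not None:
    ip = match.group(1)       # client IP address
    code = match.group(4)     # status code, as a string such as '200'
    request = match.group(3)  # request line, e.g. GET /index.html HTTP/1.1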
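To see what each group captures, the sketch below applies the same pattern to one made-up log line (it is not taken from the provided log files). Note that group(4) is a string, so compare it with '200' rather than the integer 200.

```python
import re

# A made-up line in Apache log format, for illustration only.
line = '142.204.1.10 - - [10/Oct/2023:13:55:36 -0400] "GET /OPS435_Lab/lab1.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'

match = re.match(r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*', line)
if match is not None:
    print(match.group(1))  # 142.204.1.10
    print(match.group(2))  # 10/Oct/2023:13:55:36 -0400
    print(match.group(3))  # GET /OPS435_Lab/lab1.html HTTP/1.1
    print(match.group(4))  # 200 (a str, not an int)
```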
How many total requests (Code 200)
This function will:
- Iterate through all the log lines. For each line, using the regular expression I gave you:
  - Check whether the code is 200.
  - Count how many lines with code 200 you found.
- At the end: print the total count.
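Here is one way this function could look. The function name count_code_200, the sample lines, and returning the count in addition to printing it (which makes the function easy to test) are assumptions for illustration; in your program all_log_lines is filled from the real files.

```python
import re

# Hypothetical sample data; the real program fills all_log_lines at startup.
all_log_lines = [
    '142.204.1.10 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.5 - - [10/Oct/2023:14:02:10 -0400] "GET /about.html HTTP/1.1" 200 512',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def count_code_200():
    """Print (and return) how many log lines have HTTP status code 200."""
    # Reading a global list needs no 'global' statement.
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        # group(4) is a string, so compare against '200', not the integer 200.
        if match is not None and match.group(4) == '200':
            count += 1
    print(count)
    return count
```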
How many requests from Seneca (IPs starting with 142.204)
This function will:
- Iterate through all the log lines. For each line, using the regular expression I gave you:
  - Check whether the IP address begins with 142.204. You can use Python's str.startswith() method for this.
  - Count how many such lines you found.
- At the end: print the total count.
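A sketch of the Seneca check, under the same assumptions as before (hypothetical function name and sample lines, count returned for testing). The only change from the code 200 version is the condition, which tests group(1) with startswith().

```python
import re

# Hypothetical sample data for illustration.
all_log_lines = [
    '142.204.1.10 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '142.204.77.3 - - [10/Oct/2023:13:56:01 -0400] "GET /OPS435_Lab/ HTTP/1.1" 404 -',
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /about.html HTTP/1.1" 200 512',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def count_seneca_requests():
    """Print (and return) how many requests came from IPs starting with 142.204."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        # group(1) is the client IP address.
        if match is not None and match.group(1).startswith('142.204'):
            count += 1
    print(count)
    return count
```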
How many requests for OPS435_Lab
This function will:
- Iterate through all the log lines. For each line, using the regular expression I gave you:
  - Check whether the request string contains "OPS435_Lab" (not including the quotes). You will need a second (much simpler) regular expression for this check.
  - Count how many such lines you found.
- At the end: print the total count.
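A sketch of the OPS435_Lab count, again with a hypothetical function name and sample lines. The second regular expression is just the literal text; re.search() is used because it looks anywhere in the string, while re.match() only matches at the start.

```python
import re

# Hypothetical sample data for illustration.
all_log_lines = [
    '142.204.1.10 - - [10/Oct/2023:13:55:36 -0400] "GET /OPS435_Lab/lab1.html HTTP/1.1" 200 2326',
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.5 - - [10/Oct/2023:14:02:10 -0400] "GET /ops435_lab/ HTTP/1.1" 404 -',
    '142.204.2.2 - - [10/Oct/2023:14:03:44 -0400] "POST /OPS435_Lab/submit HTTP/1.1" 200 88',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def count_ops435_lab_requests():
    """Print (and return) how many request strings contain OPS435_Lab."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        # The check is case-sensitive, so "ops435_lab" is not counted.
        if match is not None and re.search('OPS435_Lab', match.group(3)):
            count += 1
    print(count)
    return count
```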
How many total "Not Found" requests (Code 404)
This function is the same as the "How many total requests (Code 200)" function, but looks for code 404 instead.
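Because this function differs from the code 200 one only in the status code it looks for, one possible design (a suggestion, not a requirement) is a shared helper that takes the code as a parameter. The names count_code and count_code_404 are hypothetical, as are the sample lines.

```python
import re

# Hypothetical sample data for illustration.
all_log_lines = [
    '142.204.1.10 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.5 - - [10/Oct/2023:14:02:10 -0400] "GET /gone.html HTTP/1.1" 404 -',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def count_code(target_code):
    """Print (and return) how many log lines have the status code target_code (a str)."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and match.group(4) == target_code:
            count += 1
    print(count)
    return count


def count_code_404():
    """Menu function for "Not Found" requests: a thin wrapper around count_code."""
    return count_code('404')
```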
How many 404 requests contained "hidebots" in the URL
This function will:
- Iterate through all the log lines. For each line, using the regular expression I gave you:
  - Check whether the code is 404. If it is:
    - Check whether the request string contains "hidebots" (not including the quotes). You will need a second (much simpler) regular expression for this check.
    - Count how many such lines you found.
- At the end: print the total count.
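A sketch of the nested check, with a hypothetical function name and sample lines. The hidebots search runs only after the 404 test succeeds, so a 200 response for a hidebots URL is not counted.

```python
import re

# Hypothetical sample data for illustration.
all_log_lines = [
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /hidebots/trap.html HTTP/1.1" 404 209',
    '66.249.66.2 - - [10/Oct/2023:14:01:30 -0400] "GET /hidebots/ok.html HTTP/1.1" 200 100',
    '10.0.0.5 - - [10/Oct/2023:14:02:10 -0400] "GET /missing.html HTTP/1.1" 404 -',
    '157.55.39.1 - - [10/Oct/2023:14:05:00 -0400] "GET /a/hidebots/b HTTP/1.1" 404 150',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def count_404_hidebots():
    """Print (and return) how many 404 requests contain hidebots in the request string."""
    count = 0
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and match.group(4) == '404':
            # Only 404 lines reach this second, simpler regular expression.
            if re.search('hidebots', match.group(3)):
                count += 1
    print(count)
    return count
```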
Print all IP addresses that caused a 404 response
- This function is almost the same as the "How many total 'Not Found' requests (Code 404)" function, but instead of counting the 404 requests, it records the IP address of each one as a key in a Python dictionary. Because dictionary keys are unique, you won't end up with any duplicate IP addresses.
- At the end: print the list of keys in that dictionary.
The log files I gave you have a much larger number of 404 requests than I expected, so this list will be very long.
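A sketch of the dictionary approach, with a hypothetical function name and sample lines. Storing True as the value is an arbitrary choice; only the keys matter. Since Python 3.7, dictionaries keep insertion order, so IPs are listed in the order they first appear.

```python
import re

# Hypothetical sample data for illustration.
all_log_lines = [
    '66.249.66.1 - - [10/Oct/2023:14:01:02 -0400] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.5 - - [10/Oct/2023:14:02:10 -0400] "GET /gone.html HTTP/1.1" 404 -',
    '66.249.66.1 - - [10/Oct/2023:14:03:15 -0400] "GET /missing.html HTTP/1.1" 404 209',
    '142.204.1.10 - - [10/Oct/2023:14:04:00 -0400] "GET /index.html HTTP/1.1" 200 2326',
]

LOG_PATTERN = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*'


def list_404_ips():
    """Print (and return) the unique IP addresses that received a 404 response."""
    ips = {}
    for line in all_log_lines:
        match = re.match(LOG_PATTERN, line)
        if match is not None and match.group(4) == '404':
            # Assigning to an existing key overwrites it; no duplicate is added.
            ips[match.group(1)] = True
    ip_list = list(ips.keys())
    print(ip_list)
    return ip_list
```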
Submission
After testing your program, submit the check_apache_log_yourmysenecaid.py file via Blackboard. Do not submit any other document, screenshot, or archive.
Rubric
Item | Marks |
---|---|
Submitted correctly | /2 |
Loading log files | /2 |
6 parsing functions | /6 |