Extremist Developer

Extracting info from an Apache access log

While writing a “log watcher” to plug into Live Graph, I needed to extract all the info from an access log. Assume we have a line from the standard Apache access log:

172.16.10.23 - - [29/Oct/2010:10:39:51 -0400] "GET / HTTP/1.1" 200 10890 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7"

Say I want to extract the IP, date, method, HTTP code, bytes sent and user-agent. Here is the regex that does the trick for me:

(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\s-\s-\s(.*)\s"([A-Z]*)\s\/\s[A-Z]+\/[0-9]\.[0-9]"\s(\d{1,3})\s([0-9]+)\s"-"\s"(.*)"

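It might not be the best regex for the job, but it does the trick pretty well. Keep in mind that, as written, it only matches requests for / that have an empty referer ("-"), and the date group keeps the surrounding square brackets.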
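To see the regex at work, here is a minimal NodeJS sketch that runs it against the sample line. It assumes the log uses plain straight quotes (as a real access log does) and that the dots in the IP and HTTP version are escaped. The names LOG_PATTERN and parseLine are made up for this example and are not part of any library.

```javascript
// Sketch only: LOG_PATTERN and parseLine are hypothetical names for illustration.
// The pattern is the one from this post, with straight quotes and escaped dots.
const LOG_PATTERN = /(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\s-\s-\s(.*)\s"([A-Z]*)\s\/\s[A-Z]+\/[0-9]\.[0-9]"\s(\d{1,3})\s([0-9]+)\s"-"\s"(.*)"/;

// Returns an object with the six captured fields, or null when the line does not match.
function parseLine(line) {
  const match = LOG_PATTERN.exec(line);
  if (match === null) {
    return null; // e.g. a path other than "/" or a non-empty referer
  }
  const [, ip, date, method, status, bytes, userAgent] = match;
  return {
    ip,
    date, // still wrapped in square brackets, exactly as captured
    method,
    status: Number(status),
    bytes: Number(bytes),
    userAgent,
  };
}

const sample = '172.16.10.23 - - [29/Oct/2010:10:39:51 -0400] "GET / HTTP/1.1" 200 10890 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7"';

console.log(parseLine(sample));
```

Returning null for lines that do not match lets the caller decide whether to skip them or log them, which seems handy for a watcher that reads a file it does not control.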

As a side note, I'm using NodeJS to write the log watcher, and I've got to admit, it's pretty amazing. I'll post about it soon.