Some of our log files are huge - I have a 10 GB file on my HDD right now. These files are unwieldy to say the least, and you usually have a good idea of the time at which a particular problem occurred, so it's often handy to be able to chop out a specified time range. I keep forgetting how to do this, and re-inventing the process. So, here for my own benefit are a couple of methods.
The hard way
From the bash shell:
wc -l your.log
This will count the lines in your file.
grep -n 12:00: your.log | head -n 1
This will give you the line number of the 1st line containing "12:00:" - in this example, I want log entries starting at midday, so this is the first line that I want.
grep -n 12:10: your.log | tail -n 1
This will give you the line number of the last line containing "12:10:" - in this example, I want log entries up to ten past twelve, so this is the last line that I want.
tail -n $((total - first + 1)) your.log | head -n $((last - first + 1)) > your_focused.log
Replace first, last and total with the values you got above, and you'll end up with a file containing only the time range that you wanted. (If you only want to look at the file once, you can just pipe into less or whatever rather than piping into an output file.)
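The same two-pass idea can be sketched in Python, which avoids doing the line arithmetic by hand. This is only a sketch: find_bounds and copy_range are names made up for illustration, and it assumes plain substring matching on the markers is good enough for your log format.

```python
from itertools import islice


def find_bounds(path, start_marker, end_marker):
    """Pass 1: return 1-based (first, last) line numbers.

    first is the first line containing start_marker, last is the last
    line containing end_marker; either is None if never seen.
    """
    first = last = None
    with open(path, errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if first is None and start_marker in line:
                first = number
            if end_marker in line:
                last = number
    return first, last


def copy_range(path, out_path, first, last):
    """Pass 2: copy lines first..last (inclusive, 1-based) to out_path."""
    with open(path, errors="replace") as log, open(out_path, "w") as out:
        # islice uses 0-based, end-exclusive indices.
        out.writelines(islice(log, first - 1, last))
```

Call find_bounds("your.log", "12:00:", "12:10:") first, check that neither value came back as None, then pass the two numbers to copy_range.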
The easy way
python -c "import itertools, sys; sys.stdout.writelines(itertools.takewhile(lambda item: not '12:10:' in item, itertools.dropwhile(lambda item: not '12:00:' in item, open('your.log'))))" > your_focused.log
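Much the same thing, only this will read through the file just once. (It also stops at the first line containing "12:10:" rather than the last, so the 12:10 entries themselves are left out.)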
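Unpacked into a small script, the one-liner might look something like this. The function names time_window and extract are invented here for readability; the logic is meant to be the same dropwhile/takewhile pairing as above.

```python
import itertools


def time_window(lines, start_marker, end_marker):
    """Return an iterator over the lines from the first one containing
    start_marker up to, but not including, the next one containing
    end_marker."""
    # Skip everything before the first line with the start marker...
    from_start = itertools.dropwhile(
        lambda line: start_marker not in line, lines)
    # ...then keep going until a line with the end marker turns up.
    return itertools.takewhile(
        lambda line: end_marker not in line, from_start)


def extract(path, out_path, start_marker, end_marker):
    """Write the selected window of path to out_path in a single pass."""
    with open(path, errors="replace") as log, open(out_path, "w") as out:
        out.writelines(time_window(log, start_marker, end_marker))
```

For example, extract("your.log", "your_focused.log", "12:00:", "12:10:"). Lines without a timestamp, such as tracebacks, are kept as long as they fall inside the window.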
Now, I'm fully expecting someone to come and tell me the real easy way. ;-)
Posted to Linux by Simon Brunning at February 18, 2008 12:25 PM
If all the lines had a timestamp, this would do:
perl -n -e "print if /^(12:0[0-9]:.*)$/" your.log > your_focused.log
Posted by: Jonas on February 18, 2008 01:24 PM
Jonas beat me to the spirit of it, but I was going to say:
grep '^12:0[0-9]:' your.log > your_focused.log
Add a "-An" flag to grep if you have (up to "n") intervening lines that don't match the regexp.
Posted by: James on February 18, 2008 01:33 PM
Cool - but *not* all the lines in the time range contain a timestamp - we have error tracebacks, too, and I don't want to lose them! ;-)
Posted by: Simon on February 18, 2008 01:35 PM
Or using sed (if not all lines are timestamped):
sed -n -e '/12:0[0-9]/,/12:1[0-9]:/ p' your.log > your_focused.log
(But if no line matches 12:1[0-9]:, the range never closes and sed prints everything through to the end of the file.)
(And using grep and -An like James proposes is quite convenient as well)
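For anyone who hasn't met sed's start,end addressing, here is a rough Python emulation of what the range does, including the case where the end pattern never shows up. It is only an approximation: sed_range is a made-up name, and Python's re syntax is not identical to sed's regular expressions, although simple patterns like these behave the same.

```python
import re


def sed_range(lines, start_pattern, end_pattern):
    """Yield lines the way sed -n '/start/,/end/p' would print them."""
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            # The end pattern is only tested on lines after the start line.
            if end.search(line):
                in_range = False
        elif start.search(line):
            # Range opens here; it stays open until an end match or EOF.
            in_range = True
            yield line
```

Note that a range can open again if the start pattern matches later in the file, which is one more reason to make the patterns as specific as you can.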
Posted by: Jonas on February 18, 2008 01:40 PM
Oooooh - I like the sed approach.
Posted by: Simon on February 18, 2008 01:54 PM
Well, sed is one way, but if your log file is anything like mine, awk is a viable alternative. Because awk treats the elements of each log line as fields, you can set your selection criteria on those fields and then format the selected data in the output however you want.
AWK is the poor man's report generator. If you can scarf a copy of NAWK it will even do the reporting live off the stream.
Posted by: JohnMc on February 18, 2008 03:07 PM
@see http://www.vanheusden.com/multitail/
FTW!!!
Posted by: Örjan Lundberg on February 18, 2008 03:28 PM
Jonas, the sed trick is sweet! I've never used the starting and ending address technique before.
I thought of awk, too:
awk '/12:0[0-9]:/ {on=1} /12:1[0-9]:/ {exit} {if (on) {print $0}}' your.log > your_focused.log
...but it's clearly not as elegant as Jonas' approach unless you want to start picking out individual records, like JohnMc mentions, which appears to be beyond the problem statement.
Posted by: James on February 18, 2008 03:46 PM
I have two methods for dealing with this problem.
The first is that I'll just open the file in Vim (which is great with huge files, as long as there are no particularly huge *lines*). Then, starting from the beginning of the file, I'll do "d/12:00", then use "/12:11" to find the first line I want to delete, then "dG" to delete the rest. What's left is what I want.
Of course, that doesn't work if you aren't using vim on the same computer as the log file; in that case I tend to just write a one-off python script.
# Copy the lines from the first 12:0x entry up to the first 12:1x entry.
out = open("bitofblah.log", "w")
write = False
for line in open("blah.log"):
    if line.startswith("12:0"): write = True
    if line.startswith("12:1"): write = False
    if write:
        out.write(line)
out.close()
I think the sed solution can be slightly simplified to this:
sed -n -e "/12:00/,/12:10/p" your.log >wanted.log
Just tried this (caveat: with only one test file/case, though), input and output below:
Contents of your.log:
11:00 Unwanted line
11:10 Unwanted line
12:00 Wanted line 1
12:05 Wanted line 2
Exception traceback - Wanted line 3
12:08 Wanted line 4
Exception traceback - Wanted line 5
12:09 Wanted line 6
12:10 Wanted line 7
12:11 Unwanted line
12:13 Unwanted line
Contents of wanted.log:
12:00 Wanted line 1
12:05 Wanted line 2
Exception traceback - Wanted line 3
12:08 Wanted line 4
Exception traceback - Wanted line 5
12:09 Wanted line 6
12:10 Wanted line 7
Seems to get all the right lines including the traceback.
Note: I'm not at a Linux/UNIX machine right now, did this on Windows. Downloaded GNUWin32 sed (Google for "sed for Windows") to try it out.
Have to replace the single quotes used on *nix with double quotes because of Windows restrictions.
Also, Jonas, I think your regex will get lines through 12:19.
- Vasudev Ram
www.dancingbison.com
Turns out I don't need the regexes in this case, since log entries are frequent enough that I can guarantee that there won't be a hole anything like as long as a minute. How do you think the log files got so big? ;-)
But I do need to keep the trailing colons; "12:00" would match "00:12:00", whereas "12:00:" won't match until midday. So, I can use:
sed -n -e "/12:00:/,/12:10:/p" your.log > wanted.log
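A quick way to see why the trailing colon matters is to try both markers against a couple of sample lines. The lines below are invented, and the sketch assumes timestamps of the form HH:MM:SS followed by a space; first_match is just a helper made up for the demonstration.

```python
def first_match(lines, marker):
    """Return the first line containing marker, or None if there is none."""
    return next((line for line in lines if marker in line), None)


# Hypothetical log entries: one just after midnight, one at midday.
log = [
    "00:12:00 nightly job started",
    "12:00:00 midday entry",
]

print(first_match(log, "12:00"))   # 00:12:00 nightly job started
print(first_match(log, "12:00:"))  # 12:00:00 midday entry
```

Without the colon, the marker latches onto the minutes and seconds of the twelve-past-midnight entry; with it, only an hours-and-minutes pair can match.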
Slightly simpler is:
sed -e "/12:00/,/12:10/!d" your.log >wanted2.log
Here the "-n" is dropped (-n option means don't use the sed default of printing all input lines, so dropping it means DO print all lines) and the "!" means inversion, i.e., perform the following action (the d, for delete), on all lines NOT matching the regex :)
The simplicity is not about saving characters typed (we only save two or so); rather, it's a little simpler conceptually ...
And simpler still:
sed "/12:00/,/12:10/!d" your.log >wanted3.log
Because the -e option (e for expression, which is followed by sed regexes and commands in quotes) is really only needed, IIRC, if you want to pass multiple expressions on the sed command line.
- Vasudev
Check this: getting filtered logs from remote machines.
ssh user@server "sed -n -e '/13:00:/,/13:10:/p' /path/to/wanted.log" > out.log
Posted by: Simon on April 14, 2008 03:22 PM