trec_eval, Index Compression




CS267

Chris Pollett

Oct 21, 2019

Outline

trec_eval

trec_eval command-line arguments

trec_rel_file and trec_top_file

Example trec_rel_file

351 0 FR940104-0-00001 0
351 0 FR940104-0-00002 1
351 0 FR940104-0-00003 1
351 0 FR940104-0-00004 1
351 0 FR940104-0-00005 1
351 0 FR940104-0-00006 1
351 0 FR940104-0-00007 1
351 0 FR940104-0-00008 1
351 0 FR940104-0-00009 1
351 0 FR940104-0-00010 1
351 0 FR940104-0-00011 0
351 0 FR940104-0-00012 1

Example trec_top_file

351   0  FR940104-0-00001  1   102.38   run-nam2
351   0  FR940104-0-00002  1   101.38   run-nam2
351   0  FR940104-0-00003  1    91.38   run-nam2
351   0  FR940104-0-00004  1    81.38   run-nam2
351   0  FR940104-0-00005  1    71.38   run-nam2
351   0  FR940104-0-00006  1    61.38   run-nam2
351   0  FR940104-0-00007  1    51.38   run-nam2
351   0  FR940104-0-00008  1    41.38   run-nam2
351   0  FR940104-0-00009  1    31.38   run-nam2
351   0  FR940104-0-00010  1    22.38   run-nam2
351   0  FR940104-0-00011  1    21.38   run-nam2
351   0  FR940104-0-00012  1    11.38   run-nam2

Example trec_eval Output

runid                 	all	run-nam2
num_q                 	all	1
num_ret               	all	12
num_rel               	all	10
num_rel_ret           	all	10
map                   	all	0.7904
gm_map                	all	0.7904
Rprec                 	all	0.9000
bpref                 	all	0.4500
recip_rank            	all	0.5000
iprec_at_recall_0.00  	all	0.9000
iprec_at_recall_0.10  	all	0.9000
iprec_at_recall_0.20  	all	0.9000
iprec_at_recall_0.30  	all	0.9000
iprec_at_recall_0.40  	all	0.9000
iprec_at_recall_0.50  	all	0.9000
iprec_at_recall_0.60  	all	0.9000
iprec_at_recall_0.70  	all	0.9000
iprec_at_recall_0.80  	all	0.9000
iprec_at_recall_0.90  	all	0.9000
iprec_at_recall_1.00  	all	0.8333
P_5                   	all	0.8000
P_10                  	all	0.9000
P_15                  	all	0.6667
P_20                  	all	0.5000
P_30                  	all	0.3333
P_100                 	all	0.1000
P_200                 	all	0.0500
P_500                 	all	0.0200
P_1000                	all	0.0100
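
Several of the numbers above can be checked by hand from the two example files. The following is a minimal Python sketch, not trec_eval's own code. It assumes the trec_rel_file columns are query id, iteration, document number, and relevance, that the trec_top_file columns are query id, iteration, document number, rank, similarity score, and run id, and that results are ranked by descending similarity score rather than by the rank column. The helper name `evaluate` and the dictionary layout are hypothetical.

```python
# Relevance judgments from the example trec_rel_file (docno -> rel), topic 351.
qrels = {f"FR940104-0-{i:05d}": 1 for i in range(1, 13)}
qrels["FR940104-0-00001"] = 0
qrels["FR940104-0-00011"] = 0

# Similarity scores from the example trec_top_file (docno -> sim).
sims = [102.38, 101.38, 91.38, 81.38, 71.38, 61.38,
        51.38, 41.38, 31.38, 22.38, 21.38, 11.38]
run = {f"FR940104-0-{i:05d}": s for i, s in enumerate(sims, start=1)}


def evaluate(qrels, run, cutoffs=(5, 10, 15, 20)):
    """Compute a few trec_eval-style measures for a single topic."""
    # Assumption: rank by descending sim; the rank column is not used.
    ranked = sorted(run, key=lambda d: -run[d])
    rels = [qrels.get(d, 0) > 0 for d in ranked]
    num_rel = sum(1 for r in qrels.values() if r > 0)

    # Average precision: mean of precision at each relevant document's rank,
    # divided by the total number of relevant documents.
    hits, total = 0, 0.0
    for k, is_rel in enumerate(rels, start=1):
        if is_rel:
            hits += 1
            total += hits / k
    results = {
        "num_ret": len(ranked),
        "num_rel": num_rel,
        "num_rel_ret": hits,
        "map": total / num_rel,
        # R-precision: precision after num_rel documents have been retrieved.
        "Rprec": sum(rels[:num_rel]) / num_rel,
        # Reciprocal rank of the first relevant document (0 if none).
        "recip_rank": next((1 / k for k, r in enumerate(rels, 1) if r), 0.0),
    }
    for k in cutoffs:
        # Precision at k counts missing ranks beyond num_ret as non-relevant.
        results[f"P_{k}"] = sum(rels[:k]) / k
    return results


metrics = evaluate(qrels, run)
for name, value in metrics.items():
    print(f"{name:<12} {value:.4f}" if isinstance(value, float) else f"{name:<12} {value}")
```

With a single topic, map and gm_map coincide, which is consistent with both being reported as 0.7904 above.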

Quiz

Which of the following is true?

  1. The MaxScore heuristic is used to speed up the computation of disjunctive queries when BM25 is being used.
  2. Accumulator pruning might be used when computing disjunctive query results in the document-at-a-time setting.
  3. A GC-list is another name for a cover.

Index Compression

General-Purpose Data Compression

Symbolwise Data Compression

Modeling and Coding

Compression Models and Codes

γ-codes
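
As a rough illustration, here is a small Python sketch of Elias γ coding for positive integers. It assumes one common convention: the length of the binary representation is written in unary as zeros terminated by a 1, followed by the binary digits after the leading 1. Other presentations swap the roles of 0 and 1 in the unary part. The function names `gamma_encode` and `gamma_decode` are hypothetical, and bit strings are used instead of packed bits for readability.

```python
def gamma_encode(k):
    """Return the gamma code of a positive integer k as a string of bits."""
    if k < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    body = bin(k)[3:]  # binary digits of k without the leading 1
    # Unary selector: one 0 per body bit, then a terminating 1.
    return "0" * len(body) + "1" + body


def gamma_decode(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    out, i = [], 0
    while i < len(bits):
        n = 0
        while bits[i] == "0":  # count the zeros of the unary selector
            n += 1
            i += 1
        i += 1  # skip the terminating 1
        out.append(int("1" + bits[i:i + n], 2))  # restore the leading 1
        i += n
    return out


gaps = [1, 5, 13, 2]
encoded = "".join(gamma_encode(k) for k in gaps)
print(encoded)
print(gamma_decode(encoded))
```

Because no codeword is a prefix of another, the concatenated bit string can be decoded without separators.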

More on Prefix Codes

Making an optimal code tree

Making an optimal code tree
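
One standard way to build an optimal prefix code tree is Huffman's algorithm: repeatedly merge the two lowest-weight subtrees until a single tree remains. The Python sketch below assumes this is the construction intended here. The function name `huffman_codes` and the example symbol frequencies are hypothetical, and symbols are assumed to be strings.

```python
import heapq


def huffman_codes(freqs):
    """Map each symbol to a bit string using Huffman's algorithm."""
    if not freqs:
        return {}
    # Heap entries are (weight, tiebreak, tree); a tree is either a symbol
    # or a (left, right) pair. The unique tiebreak keeps trees from being compared.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    if counter == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lowest weight
        w2, _, t2 = heapq.heappop(heap)  # second lowest weight
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
        counter += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: 0 goes left, 1 goes right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:  # leaf: record the path taken from the root
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


example = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_codes(example)
for sym in sorted(codes):
    print(sym, codes[sym])
```

For these frequencies the expected codeword length equals the entropy of the distribution (1.75 bits per symbol), since every probability is a power of 1/2.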