Finish GC-Lists, trec_eval, Start Index Compression




CS267

Chris Pollett

Apr 12, 2021

Outline

Properties of GC-lists
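As usually defined, a GC-list (generalized concordance list) is a list of intervals in which no interval is nested inside another, so sorting the intervals by start position also sorts them by end position. The Python sketch below is illustrative rather than the book's code. It reduces an arbitrary set of intervals to a GC-list by discarding every interval that contains another one. It assumes intervals are (start, end) pairs of token positions, and the names `to_gc_list` and `is_gc_list` are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end) token positions, start <= end


def to_gc_list(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop every interval that contains another input interval.

    The survivors are not nested, so they are returned sorted by start,
    which also sorts them by end.
    """
    # Visit intervals right to left by start; among equal starts, shortest first.
    ordered = sorted(set(intervals), key=lambda iv: (-iv[0], iv[1]))
    kept: List[Interval] = []
    min_end = float("inf")  # smallest end among the intervals visited so far
    for start, end in ordered:
        # Every visited interval starts at or after `start`. If one of them also
        # ends at or before `end`, it lies inside [start, end]: drop this one.
        if end < min_end:
            kept.append((start, end))
            min_end = end
    kept.reverse()
    return kept


def is_gc_list(intervals: List[Interval]) -> bool:
    """For a list in order, no nesting means starts and ends both strictly increase."""
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(intervals, intervals[1:]))


gc = to_gc_list([(1, 5), (2, 3), (4, 8), (6, 7), (2, 3)])
print(gc)              # [(2, 3), (6, 7)]
print(is_gc_list(gc))  # True
```

Because the right-to-left scan only needs the smallest end seen so far, the reduction costs one sort plus a linear pass.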

Operators

Examples

Implementation

More Implementation

The book's definition of the four binary operators

trec_eval

trec_eval command-line arguments

trec_rel_file and trec_top_file

Example trec_rel_file

351 0 FR940104-0-00001 0
351 0 FR940104-0-00002 1
351 0 FR940104-0-00003 1
351 0 FR940104-0-00004 1
351 0 FR940104-0-00005 1
351 0 FR940104-0-00006 1
351 0 FR940104-0-00007 1
351 0 FR940104-0-00008 1
351 0 FR940104-0-00009 1
351 0 FR940104-0-00010 1
351 0 FR940104-0-00011 0
351 0 FR940104-0-00012 1
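Each trec_rel_file line has four whitespace-separated fields: topic number, an iteration field (0 here, and not used by trec_eval), document id, and relevance judgment. The following is a minimal Python sketch for loading such lines. `read_qrels` is a hypothetical name. Judgments of 1 or more are counted as relevant, which is assumed to match trec_eval's default relevance threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Judgments at or above this value count as relevant (assumed trec_eval default).
RELEVANCE_THRESHOLD = 1


def read_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map topic -> {document id: judgment} from trec_rel_file lines."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        topic, _iteration, docno, judgment = fields
        qrels[topic][docno] = int(judgment)
    return dict(qrels)


EXAMPLE_QRELS = """\
351 0 FR940104-0-00001 0
351 0 FR940104-0-00002 1
351 0 FR940104-0-00003 1
351 0 FR940104-0-00004 1
351 0 FR940104-0-00005 1
351 0 FR940104-0-00006 1
351 0 FR940104-0-00007 1
351 0 FR940104-0-00008 1
351 0 FR940104-0-00009 1
351 0 FR940104-0-00010 1
351 0 FR940104-0-00011 0
351 0 FR940104-0-00012 1
"""

judged = read_qrels(EXAMPLE_QRELS.splitlines())["351"]
relevant = sorted(d for d, j in judged.items() if j >= RELEVANCE_THRESHOLD)
print(len(judged), len(relevant))  # 12 10
```

On the example file this finds 12 judged documents, 10 of them relevant. That count matches num_rel = 10 in the trec_eval output.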

Example trec_top_file

351   0  FR940104-0-00001  1   102.38   run-nam2
351   0  FR940104-0-00002  1   101.38   run-nam2
351   0  FR940104-0-00003  1   91.38   run-nam2
351   0  FR940104-0-00004  1   81.38   run-nam2
351   0  FR940104-0-00005  1   71.38   run-nam2
351   0  FR940104-0-00006  1   61.38   run-nam2
351   0  FR940104-0-00007  1   51.38   run-nam2
351   0  FR940104-0-00008  1   41.38   run-nam2
351   0  FR940104-0-00009  1   31.38   run-nam2
351   0  FR940104-0-00010  1   22.38   run-nam2
351   0  FR940104-0-00011  1   21.38   run-nam2
351   0  FR940104-0-00012  1   11.38   run-nam2
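A trec_top_file line has six fields: topic, an iteration field, document id, rank, score, and run tag. Every rank in this example is 1, yet the evaluation still sees a definite ranking. This is because trec_eval orders each topic's documents by decreasing score rather than by the rank column. The Python sketch below shows that ordering step. `read_run` is a hypothetical name. The sketch does not reproduce trec_eval's tie-breaking rule, which does not matter here because the example has no tied scores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_run(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map topic -> document ids in decreasing score order."""
    scored: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        topic, _iteration, docno, _rank, score, _run_tag = fields
        scored[topic].append((float(score), docno))
    # Order by score alone; the rank column is never consulted.
    return {
        topic: [docno for _, docno in sorted(pairs, key=lambda pair: -pair[0])]
        for topic, pairs in scored.items()
    }


EXAMPLE_RUN = """\
351 0 FR940104-0-00001 1 102.38 run-nam2
351 0 FR940104-0-00002 1 101.38 run-nam2
351 0 FR940104-0-00003 1 91.38 run-nam2
351 0 FR940104-0-00004 1 81.38 run-nam2
351 0 FR940104-0-00005 1 71.38 run-nam2
351 0 FR940104-0-00006 1 61.38 run-nam2
351 0 FR940104-0-00007 1 51.38 run-nam2
351 0 FR940104-0-00008 1 41.38 run-nam2
351 0 FR940104-0-00009 1 31.38 run-nam2
351 0 FR940104-0-00010 1 22.38 run-nam2
351 0 FR940104-0-00011 1 21.38 run-nam2
351 0 FR940104-0-00012 1 11.38 run-nam2
"""

ranking = read_run(EXAMPLE_RUN.splitlines())["351"]
print(ranking[0], ranking[-1])  # FR940104-0-00001 FR940104-0-00012
```

Since only the score matters, feeding the lines in reverse order yields the same ranking.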

Example trec_eval Output

runid                 	all	run-nam2
num_q                 	all	1
num_ret               	all	12
num_rel               	all	10
num_rel_ret           	all	10
map                   	all	0.7904
gm_map                	all	0.7904
Rprec                 	all	0.9000
bpref                 	all	0.4500
recip_rank            	all	0.5000
iprec_at_recall_0.00  	all	0.9000
iprec_at_recall_0.10  	all	0.9000
iprec_at_recall_0.20  	all	0.9000
iprec_at_recall_0.30  	all	0.9000
iprec_at_recall_0.40  	all	0.9000
iprec_at_recall_0.50  	all	0.9000
iprec_at_recall_0.60  	all	0.9000
iprec_at_recall_0.70  	all	0.9000
iprec_at_recall_0.80  	all	0.9000
iprec_at_recall_0.90  	all	0.9000
iprec_at_recall_1.00  	all	0.8333
P_5                   	all	0.8000
P_10                  	all	0.9000
P_15                  	all	0.6667
P_20                  	all	0.5000
P_30                  	all	0.3333
P_100                 	all	0.1000
P_200                 	all	0.0500
P_500                 	all	0.0200
P_1000                	all	0.0100
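Joining the two example files gives the judgments 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1 in score order. That is, a non-relevant document comes first, then nine relevant ones, a second non-relevant one, and the last relevant document at rank 12. Average precision is therefore (1/2 + 2/3 + 3/4 + ... + 9/10 + 10/12) / 10 ≈ 0.7904. The Python sketch below recomputes several reported values from this list using the standard textbook definitions. It is a hand check, not trec_eval's source. Its bpref is simplified by assuming every retrieved document has a judgment, which holds for this example, and all function names are hypothetical.

```python
from typing import List


def precision_at(rels: List[int], k: int) -> float:
    """P_k: relevant documents among the top k, divided by k."""
    return sum(rels[:k]) / k


def average_precision(rels: List[int], num_rel: int) -> float:
    """Sum of precision at each relevant rank, over all relevant docs (unretrieved add 0)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_rel


def reciprocal_rank(rels: List[int]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def bpref(rels: List[int], num_rel: int, num_nonrel: int) -> float:
    """Each retrieved relevant doc scores 1 - (non-relevant docs above it, capped) / min(R, N).

    Assumes every retrieved document is judged (0 = judged non-relevant).
    """
    denom = min(num_rel, num_nonrel)
    nonrel_above, total = 0, 0.0
    for rel in rels:
        if not rel:
            nonrel_above += 1
        elif denom == 0:
            total += 1.0
        else:
            total += 1.0 - min(nonrel_above, denom) / denom
    return total / num_rel


def interpolated_precision(rels: List[int], num_rel: int, level: float) -> float:
    """Highest precision at any rank whose recall is at least `level`."""
    best, hits = 0.0, 0
    for rank, rel in enumerate(rels, start=1):
        hits += rel
        if hits / num_rel >= level:
            best = max(best, hits / rank)
    return best


# Example judgments in score order; 10 relevant and 2 non-relevant docs are judged.
rels = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
num_rel, num_nonrel = 10, 2
print("map", f"{average_precision(rels, num_rel):.4f}")    # map 0.7904
print("Rprec", f"{precision_at(rels, num_rel):.4f}")       # Rprec 0.9000
print("bpref", f"{bpref(rels, num_rel, num_nonrel):.4f}")  # bpref 0.4500
print("recip_rank", f"{reciprocal_rank(rels):.4f}")        # recip_rank 0.5000
print("iprec_at_recall_1.00", f"{interpolated_precision(rels, num_rel, 1.0):.4f}")  # iprec_at_recall_1.00 0.8333
print("P_15", f"{precision_at(rels, 15):.4f}")             # P_15 0.6667
```

R-precision is simply P_R with R = num_rel = 10. This is why Rprec and P_10 agree at 0.9000.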

Quiz

Which of the following is true?

  1. A GC-list is another name for a cover.
  2. Accumulator pruning might be used when computing disjunctive query results in the document-at-a-time setting.
  3. The MaxScore heuristic is used to speed up the computation of disjunctive queries when BM25 is being used.

Index Compression

General-Purpose Data Compression

Symbolwise Data Compression

Modeling and Coding

Compression Models and Codes
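In the modeling-and-coding view, a model assigns each symbol a probability, and a coder turns those probabilities into codewords. A symbol of probability p ideally costs -log2(p) bits, and the model's entropy is that cost averaged over the symbols. The Python sketch below assumes a zero-order model whose probabilities are the symbol frequencies of the input text. `zero_order_model` and `entropy_bits` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict


def zero_order_model(text: str) -> Dict[str, float]:
    """Estimate Pr[symbol] from symbol frequencies, ignoring context."""
    counts = Counter(text)
    return {sym: n / len(text) for sym, n in counts.items()}


def entropy_bits(model: Dict[str, float]) -> float:
    """Expected ideal code length under the model, in bits per symbol."""
    return -sum(p * math.log2(p) for p in model.values() if p > 0)


model = zero_order_model("aaaabbcd")
for sym in sorted(model):
    # Ideal codeword length for a symbol of probability p is -log2(p) bits.
    print(sym, model[sym], -math.log2(model[sym]))
print(entropy_bits(model))  # 1.75
```

Here the probabilities 1/2, 1/4, 1/8 and 1/8 give ideal lengths of 1, 2, 3 and 3 bits. Whole-bit codewords can meet these lengths exactly, so 1.75 bits per symbol is achievable. When probabilities are not powers of 1/2, a code built from whole-bit codewords generally averages more than the entropy.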

γ-codes

More on Prefix Codes
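A prefix code is one where no codeword is a prefix of another. This lets a decoder emit a symbol as soon as the bits read so far form a codeword. A related fact is Kraft's inequality: a prefix code with codeword lengths l1, ..., ln exists exactly when the sum of 2^(-li) is at most 1. The Python sketch below checks both properties and decodes greedily. The function names and the four-symbol codebook are hypothetical, chosen for illustration.

```python
from typing import Dict, List


def is_prefix_free(codewords: List[str]) -> bool:
    """True if no codeword is a prefix of another (duplicates also fail)."""
    words = sorted(codewords)
    # In sorted order, a word that prefixes any other word also prefixes its successor.
    return not any(nxt.startswith(w) for w, nxt in zip(words, words[1:]))


def kraft_sum(codewords: List[str]) -> float:
    """Sum of 2^-len(w) over the codewords."""
    return sum(2.0 ** -len(w) for w in codewords)


def decode(bits: str, code: Dict[str, str]) -> List[str]:
    """Emit a symbol as soon as the buffered bits form a codeword (needs a prefix code)."""
    inverse = {word: sym for sym, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("input ends in the middle of a codeword")
    return symbols


code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(is_prefix_free(list(code.values())), kraft_sum(list(code.values())))  # True 1.0
print(decode("0101100111", code))  # ['a', 'b', 'c', 'a', 'd']
print(is_prefix_free(["0", "01", "11"]), kraft_sum(["0", "01", "11"]))  # False 1.0
```

The last line shows that Kraft's inequality constrains only codeword lengths. The set {0, 01, 11} satisfies it but is not a prefix code. The inequality guarantees only that some prefix code with lengths 1, 2, 2 exists, for instance {0, 10, 11}.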