Archives of the TeradataForum
Message Posted: Wed, 04 Aug 2004 @ 07:22:26 GMT
Subj: Re: Sampling of Pages
From: Stephan Ewen
Hi,
I am not quite sure about the terminology; maybe "pages" is the wrong term. I am quite new to Teradata and come from a DB2 background. By
"page" I mean the unit that is read from disk in one I/O operation, which typically maps to a small, fixed number of contiguous disk sectors.
Large tables naturally fill many pages. Because data might be clustered on the pages, a statistically valid sample has to draw rows from more
or less all pages, which is why obtaining samples that are statistically valid (i.e. robust to chi-square and spectral tests) does not perform
very well. Sampling whole pages instead greatly reduces the number of I/Os, but the resulting sample is not always representative. Does this
correspond to proportional sampling? As for the AMPs, that sounds like query parallelism to me, rather than something at the system or
physical level.
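
To make the row-versus-page trade-off concrete, here is a toy Python simulation. It is only a sketch under stated assumptions, not Teradata or DB2 code: the table is modelled as a list of fixed-size pages holding sorted (fully clustered) values, and the function names `build_clustered_table`, `row_level_sample` and `page_level_sample` are made up for the illustration.

```python
import random


def build_clustered_table(num_pages=1000, rows_per_page=100):
    """Toy table: sorted values, so neighbouring values share a page."""
    return [
        [p * rows_per_page + r for r in range(rows_per_page)]
        for p in range(num_pages)
    ]


def row_level_sample(pages, n, rng):
    """Pick n rows uniformly at random; count the pages that must be read."""
    rows_per_page = len(pages[0])
    total_rows = len(pages) * rows_per_page
    picks = rng.sample(range(total_rows), n)
    touched = {i // rows_per_page for i in picks}
    values = [pages[i // rows_per_page][i % rows_per_page] for i in picks]
    return values, len(touched)


def page_level_sample(pages, n, rng):
    """Pick whole pages at random until n rows are collected."""
    rows_per_page = len(pages[0])
    pages_needed = -(-n // rows_per_page)  # ceiling division
    chosen = rng.sample(range(len(pages)), pages_needed)
    values = [v for p in chosen for v in pages[p]][:n]
    return values, pages_needed


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the run repeatable
    table = build_clustered_table()
    true_mean = sum(sum(page) for page in table) / (len(table) * len(table[0]))
    for name, sampler in (("row", row_level_sample), ("page", page_level_sample)):
        values, pages_read = sampler(table, 1000, rng)
        print(name, "pages read:", pages_read,
              "sample mean:", sum(values) / len(values),
              "true mean:", true_mean)
```

With 100 rows per page, the page-level sampler reads exactly 10 pages for a 1000-row sample, while the row-level sampler has to touch a large share of the table. In return, the page-level sample contains only values from 10 clusters, so its mean can drift well away from the true mean when the data is clustered as assumed here.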
Please correct me if I am wrong; as I mentioned, I am new ;)
Thanks,
Stephan