Stats for fileset BSBM_large

Summary

Triple Count: 105316107
URI Count: 15521808
Average URI length: 88.60, Standard Deviation: 2.39
Average URI reuse: 16.84
Appeared as (ignoring literals):
S only: 9000131
P only: 40
S and P: 0
O only: 6009244
O and S: 512393
P and O: 0
S, P and O: 0
O including literals: 15176146
Literal Count: 9166902
Average literal length: 480.92, Standard Deviation: 602.00
Average literal reuse: 5.94
Blank Node Count: 0
Average Blank Node reuse: 0.00
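
The "Appeared as" breakdown above can in principle be reproduced by recording which triple positions each URI occupies. The following is a minimal Python sketch, not the tool that produced this report. It assumes the triples are available as (s, p, o) tuples of URI strings with literal objects already filtered out; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def position_breakdown(triples):
    """Count URIs by the set of triple positions (S, P, O) they occupy."""
    roles = defaultdict(set)
    for s, p, o in triples:
        roles[s].add("S")
        roles[p].add("P")
        roles[o].add("O")
    # Key each URI by its sorted position letters, e.g. "OS" for "O and S".
    return Counter("".join(sorted(r)) for r in roles.values())

# Hypothetical sample data: ex:b is both a subject and an object.
sample = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:a", "ex:q", "ex:c"),
]
breakdown = position_breakdown(sample)
print(dict(breakdown))
```

In the figures above, "S only" plus "O and S" (9000131 + 512393 = 9512524) gives the number of distinct subjects, which is the kind of cross-check such a breakdown allows.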


Detail Navigation

Node appearances as S, P, O, SP, PO, OS
Aggregate node reuse
Node lengths


Node appearances as S, P, O, SP, PO, OS

Graph 1 shows, for each cardinality, how many nodes (or node pairs) appear that many times. So, if there are 200,000 nodes that each appear as a Subject on three occasions, then 200,000 will be plotted at an x-position of 3 on the graph.

Graph 2 is more complex: it plots a running total of entries to give a more readable graph. In this graph, if we have 100,000 nodes that appear as a Subject only once, and 100,000 nodes that appear as a Subject twice, then we plot points at (x=1,y=100,000) and (x=2,y=300,000), since the second group adds 2 × 100,000 entries to the first. Thus, if a given Subject appears many times relative to the size of the dataset, it will cause a pronounced upward tick in the graph. This second graph is useful for showing the proportion of an index over S (or P, or SP, etc.) that will be made up of small entries, versus large ones with repeating elements.
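
The cumulative calculation behind Graph 2 can be sketched in a few lines of Python. This is an illustration under the assumption that the histogram is held as a mapping from cardinality to node count; the function name is hypothetical and the report's own plotting code is not shown in this document.

```python
def cumulative_points(histogram):
    """Turn {cardinality: node_count} into Graph 2 style cumulative points."""
    points, running = [], 0
    for cardinality in sorted(histogram):
        # Each node of this cardinality contributes `cardinality` entries.
        running += cardinality * histogram[cardinality]
        points.append((cardinality, running))
    return points

# The example from the text: 100,000 nodes seen once, 100,000 seen twice.
points = cumulative_points({1: 100_000, 2: 100_000})
print(points)  # [(1, 100000), (2, 300000)]
```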

Data Files: S P O SP PO OS

Cardinality        S        P   O         SP        PO        OS
Total              9512524  40  15688539  98287072  16005635  98171864
1-1                0        0   14108025  97687072  14117438  91067436
2-2                0        0   89932     1         107166    7065345
3-3                0        0   113309    1         135169    38351
4-4                0        0   144720    9         168667    732
5-5                47885    0   167569    28        191924    0
6-6                155713   0   167048    300099    190313    0
7-7                8926     0   144931    242       166330    0
8-8                24298    0   112347    595       132968    0
9-9                226774   0   79156     1252      98468     0
10-19              8748928  0   218680    153740    371300    0
20-29              4917     0   123602    139441    175720    0
30-39              163094   0   69843     4586      75128     0
40-49              127587   0   64981     6         15445     0
50-59              4397     0   34901     0         7487      0
60-69              5        0   8460      0         6232      0
70-79              0        0   4568      0         5447      0
80-89              0        0   5051      0         5346      0
90-99              0        0   4995      0         4900      0
100-199            0        0   16474     0         16758     0
200-299            0        0   1669      0         3429      0
300-399            0        0   805       0         630       0
400-499            0        0   430       0         275       0
500-599            0        0   329       0         222       0
600-699            0        0   303       0         201       0
700-799            0        0   399       0         296       0
800-899            0        0   287       0         181       0
900-999            0        0   338       0         201       0
1000-1099          0        0   6         0         1         0
1000-1999          0        0   1334      0         3478      0
2000-2999          0        1   531       0         2638      0
3000-3999          0        0   913       0         448       0
4000-4999          0        0   851       0         64        0
5000-5999          0        0   555       0         68        0
6000-6999          0        0   215       0         66        0
7000-7999          0        0   78        0         50        0
8000-8999          0        1   232       0         419       0
9000-9999          0        0   62        0         64        0
10000-19999        0        0   254       0         283       0
20000-29999        0        0   31        0         59        0
30000-39999        0        0   30        0         58        0
40000-49999        0        0   33        0         62        0
50000-59999        0        1   29        0         57        0
60000-69999        0        1   32        0         30        0
70000-79999        0        0   145       0         56        0
80000-89999        0        0   29        0         29        0
90000-99999        0        0   8         0         8         0
100000-199999      0        7   5         0         6         0
200000-299999      0        0   0         0         40        0
300000-399999      0        9   2         0         3         0
400000-499999      0        0   0         0         1         0
800000-899999      0        0   3         0         0         0
900000-999999      0        0   1         0         1         0
1000000-1999999    0        0   4         0         3         0
2000000-2999999    0        4   2         0         0         0
3000000-3999999    0        5   1         0         1         0
5000000-5999999    0        1   0         0         0         0
6000000-6999999    0        7   1         0         1         0
9000000-9999999    0        2   0         0         0         0
10000000-19999999  0        1   0         0         0         0
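
The row labels in this table follow a recognisable pattern: single values up to 9, then ranges whose width is the largest power of ten not exceeding the value. A hedged Python sketch of such a labelling rule is shown below. The function name is hypothetical and the rule is inferred from the labels alone, so it does not reproduce every row (the table also lists a separate 1000-1099 row, which this sketch folds into 1000-1999).

```python
def bucket_label(n):
    """Return a range label such as '3-3', '10-19' or '200-299' for a count."""
    if n < 1:
        raise ValueError("cardinality must be at least 1")
    if n < 10:
        return f"{n}-{n}"
    # Bucket width is the largest power of ten not exceeding n.
    width = 10 ** (len(str(n)) - 1)
    low = (n // width) * width
    return f"{low}-{low + width - 1}"

for value in (3, 15, 250, 12_345_678):
    print(value, bucket_label(value))
```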


Aggregate Node Reuse

These graphs illustrate the number of times nodes are reused across all elements of a triple. Graph 1 shows the number of nodes that have been reused a given number of times: if 10 nodes appear 100 times, a point will be plotted at (x=100,y=10). Graph 2 is again more complex: if 10 nodes appear 100 times, and 2 nodes appear 101 times, points will be plotted at (x=100,y=1000), and (x=101,y=1202). This aids in visualising what proportion of the dataset is made up of heavily reused nodes vs rarely reused nodes.
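
As an illustration of how the Graph 1 data could be derived, the Python sketch below counts every occurrence of a node in any triple position and then counts how many nodes share each reuse figure. It assumes triples are (s, p, o) tuples of hashable node identifiers; the function name and sample data are hypothetical, and this is not necessarily how the report itself was generated.

```python
from collections import Counter

def reuse_histogram(triples):
    """Map each reuse count to the number of nodes that occur that often."""
    occurrences = Counter()
    for triple in triples:
        occurrences.update(triple)  # counts S, P and O positions alike
    return Counter(occurrences.values())

# Hypothetical sample data: four nodes occur twice, one occurs once.
sample = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:a", "ex:q", "ex:c"),
]
histogram = reuse_histogram(sample)
print(sorted(histogram.items()))  # [(1, 1), (2, 4)]
```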

Data Files: URI Literal B-Node

#Times reused      URI       Literal  Blank Node
Total              15521808  9166902  0
1-1                6008926   8098811  0
2-2                0         89279    0
3-3                0         111953   0
4-4                0         142267   0
5-5                0         163621   0
6-6                117       161367   0
7-7                209       137660   0
8-8                24588     103696   0
9-9                227227    69397    0
10-19              8777383   83613    0
20-29              87206     5        0
30-39              47212     1        0
40-49              43207     0        0
50-59              64238     1        0
60-69              57553     0        0
70-79              61151     0        0
80-89              59648     1        0
90-99              33196     3        0
100-199            22280     2460     0
200-299            1847      241      0
300-399            603       194      0
400-499            281       179      0
500-599            169       157      0
600-699            144       155      0
700-799            239       164      0
800-899            102       189      0
900-999            91        248      0
1000-1099          1         5        0
1000-1999          851       487      0
2000-2999          495       33       0
3000-3999          871       31       0
4000-4999          832       27       0
5000-5999          534       24       0
6000-6999          196       21       0
7000-7999          58        22       0
8000-8999          41        192      0
9000-9999          44        18       0
10000-19999        214       40       0
20000-29999        3         28       0
30000-39999        2         28       0
40000-49999        5         28       0
50000-59999        1         29       0
60000-69999        2         31       0
70000-79999        0         145      0
80000-89999        0         29       0
90000-99999        0         8        0
100000-199999      8         4        0
300000-399999      11        0        0
800000-899999      0         3        0
900000-999999      0         1        0
1000000-1999999    0         4        0
2000000-2999999    4         2        0
3000000-3999999    6         0        0
5000000-5999999    1         0        0
6000000-6999999    8         0        0
9000000-9999999    2         0        0
10000000-19999999  1         0        0


Node Lengths

These graphs illustrate the length in bytes of nodes. In both cases, even if a Node is reused many times, it is only considered once in these graphs. Graph 1 shows the number of nodes that have a given length: if 10 nodes have a length of 100 bytes, a point will be plotted at (x=100,y=10). Graph 2 is again more complex, plotting the cumulative space used: if there are 10 nodes of length 100 bytes, and 2 nodes of length 110 bytes, points will be plotted at (x=100,y=1000), and (x=110,y=1220). This aids in visualising what proportion of space is taken up by nodes of a given size.
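
The two node-length graphs can be sketched in Python as follows. This is an assumption-laden illustration: it takes node values as strings, measures length as the UTF-8 encoded size (the report says "bytes" but does not name an encoding), and uses a hypothetical function name.

```python
from collections import Counter

def length_points(nodes):
    """Return (Graph 1, Graph 2) points for node lengths in bytes."""
    distinct = set(nodes)  # a reused node is only considered once
    histogram = Counter(len(node.encode("utf-8")) for node in distinct)
    graph1 = sorted(histogram.items())
    graph2, running = [], 0
    for length, count in graph1:
        running += length * count  # cumulative space used so far
        graph2.append((length, running))
    return graph1, graph2

# The example from the text: 10 nodes of 100 bytes and 2 nodes of 110 bytes.
nodes = [f"{i:0100d}" for i in range(10)] + [f"{i:0110d}" for i in range(2)]
nodes += nodes[:3]  # reused nodes do not change the result
graph1, graph2 = length_points(nodes)
print(graph1)  # [(100, 10), (110, 2)]
print(graph2)  # [(100, 1000), (110, 1220)]
```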

Data Files: URI Literal

Node Length  URI       Literal
Total        15521808  9166902
1-1          0         9
2-2          0         158
3-3          0         1313
4-4          0         3454
5-5          0         14456
6-6          0         98540
7-7          0         910216
8-8          0         14058
9-9          0         12487
10-19        0         162114
20-29        8926      186390
30-39        9         414070
40-49        15        394886
50-59        0         354259
60-69        27        354039
70-79        49899     353692
80-89        12157805  353760
90-99        3305127   353651
100-199      0         1828516
200-299      0         19083
300-399      0         19185
400-499      0         54859
500-599      0         227806
600-699      0         232180
700-799      0         232177
800-899      0         232519
900-999      0         233272
1000-1099    0         2329
1000-1999    0         2095357
2000-2999    0         8067