Stats for fileset dbpedia

Summary

Triple Count: 231661194
URI Count: 30218224
Average URI length: 52.93, Standard Deviation: 20.45
Average URI reuse: 20.97
Appeared as (ignoring literals):
S only: 1735317
P only: 1101
S and P: 38559
O only: 11794631
O and S: 16648616
P and O: 0
S, P and O: 0
O only, including literals: 48039661
Literal Count: 36245030
Average literal length: 76.72, Standard Deviation: 282.03
Average literal reuse: 1.69
Blank Node Count: 0
Average Blank Node reuse: 0.00
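The summary figures can be approximated from a list of triples roughly as follows. This is a minimal Python sketch, not the tool that produced this page. It assumes each term is an N-Triples-style string in which literals start with a double quote, and it treats every other term as a URI (this fileset has no blank nodes). It also assumes that "average reuse" means total appearances divided by distinct nodes, and that "O only, including literals" is O only plus the number of distinct literals. The second assumption matches the figures above: 11794631 + 36245030 = 48039661.

```python
from collections import Counter

# Labels follow the "Appeared as" list in the summary above.
CATEGORIES = {
    frozenset("S"): "S only",
    frozenset("P"): "P only",
    frozenset("SP"): "S and P",
    frozenset("O"): "O only",
    frozenset("OS"): "O and S",
    frozenset("PO"): "P and O",
    frozenset("SPO"): "S, P and O",
}


def is_literal(term):
    # Assumption: N-Triples-style terms, where literals start with a double quote.
    return term.startswith('"')


def summarize(triples):
    roles = {}                # URI -> set of positions ("S", "P", "O") it occupies
    uri_uses = Counter()      # URI -> appearances in any position
    literal_uses = Counter()  # literal -> number of appearances
    for s, p, o in triples:
        for term, role in ((s, "S"), (p, "P"), (o, "O")):
            if is_literal(term):
                literal_uses[term] += 1
            else:  # blank nodes, if any, are lumped in with URIs here
                roles.setdefault(term, set()).add(role)
                uri_uses[term] += 1

    appeared = Counter(CATEGORIES[frozenset(r)] for r in roles.values())
    stats = {label: appeared[label] for label in CATEGORIES.values()}
    stats["URI Count"] = len(uri_uses)
    stats["Literal Count"] = len(literal_uses)
    stats["O including literals"] = stats["O only"] + len(literal_uses)
    # Assumed definition: average reuse = total appearances / distinct nodes.
    stats["Average URI reuse"] = sum(uri_uses.values()) / max(len(uri_uses), 1)
    stats["Average literal reuse"] = (
        sum(literal_uses.values()) / max(len(literal_uses), 1))
    return stats


example = [
    ("<a>", "<p>", "<b>"),
    ("<b>", "<p>", '"x"'),
    ("<a>", "<q>", '"x"'),
]
print(summarize(example))
```

As in the real summary, the category counts add up to the URI count, because each URI falls into exactly one category.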


Detail Navigation

Node appearances as S, P, O, SP, PO, OS
Aggregate node reuse
Node lengths


Node appearances as S, P, O, SP, PO, OS

Graph 1 shows how many nodes (or node pairs) appear with a given cardinality. For example, if 200,000 nodes each appear as a Subject exactly three times, then 200,000 will be plotted at an x-position of 3 on the graph.

Graph 2 is more complex: it plots cumulative entries, which gives a more readable graph. Each y-value is the running total of cardinality × number of nodes, up to and including that x-position. For example, if 100,000 nodes appear as a Subject only once and 100,000 nodes appear as a Subject twice, points are plotted at (x=1,y=100,000) and (x=2,y=300,000), since 100,000 × 1 + 100,000 × 2 = 300,000. Thus, a Subject that occurs many times relative to the size of the dataset causes a pronounced upward tick in the graph. This second graph shows what proportion of an index over S (or P, or SP, etc.) is made up of small entries versus large ones with many repeated elements.
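Both graphs can be derived from per-node counts. The following is a minimal Python sketch. It assumes the input is simply the number of times each node (or node pair) appears in one role, such as Subject. The function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cardinality_histogram(counts):
    """Graph 1: sorted (cardinality, number of nodes with that cardinality) pairs."""
    return sorted(Counter(counts).items())


def cumulative_entries(histogram):
    """Graph 2: running total of cardinality * node count, in order of cardinality."""
    xs = [cardinality for cardinality, _ in histogram]
    ys = accumulate(cardinality * nodes for cardinality, nodes in histogram)
    return list(zip(xs, ys))


# The worked example from the text: 100,000 nodes seen once, 100,000 seen twice.
subject_counts = [1] * 100_000 + [2] * 100_000
hist = cardinality_histogram(subject_counts)
print(hist)                      # [(1, 100000), (2, 100000)]
print(cumulative_entries(hist))  # [(1, 100000), (2, 300000)]
```

The last y-value always equals the total number of entries. Dividing each point by it therefore gives the share of the index covered by nodes up to that cardinality.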

Data Files: S P O SP PO OS

Cardinality | S | P | O | SP | PO | OS
Total18422492396606468827711275322884943262192062986
1-173419812899497414989995831174386735160322033
2-22326425351577733914071109439028925863273
3-38791018522119364212513415139714433129
4-4114337321220101106316131268810301130500
5-5229739901615796862354597213179579
6-631628073546260142814543018394019
7-75924162634141433931932874821026
8-87095255527446428717326123312164
9-9394324802253952333852122852584
10-194909193340107302913349709892974512
20-294770961695372922584914333453105
30-39433028112418472031255516483717
40-49468415862109811175099984827
50-5932610865572882108358657726
60-692235784835124472750468113
70-791597994513780349486347215
80-891158753502815135472265183
90-99895773512203627210205790
100-199272325183195504977319006213
200-299408219492911120774276227
300-39912997593132277053126451
400-49956643927787315373220
500-59930223035033193346020
600-69915983063483106831840
700-799930245242763022560
800-899688156191644118370
900-999469136149932514180
1000-10994381100
1000-199914639096001108157950
2000-2999147383172711317680
3000-399938211875348120
4000-49991012648694600
5000-59991089320102980
6000-699918122811850
7000-799914915411190
8000-899906413701130
9000-9999047810690
10000-19999027335303030
20000-2999901111400940
30000-39999080610450
40000-49999035410200
50000-59999029130140
60000-69999017150180
70000-7999901711050
80000-899990178010
90000-999990710030
100000-199999061240140
200000-2999990132000
300000-399999033040
400000-499999013030
500000-599999073020
600000-699999010000
700000-799999010000
800000-899999020000
1000000-1999999041010
2000000-2999999010000
3000000-3999999020000
4000000-4999999010000
5000000-5999999010000
6000000-6999999010000
7000000-7999999041010
9000000-9999999020000
10000000-19999999010000
90000000-99999999010000


Aggregate Node Reuse

These graphs illustrate how many times nodes are reused across all three positions of a triple (Subject, Predicate and Object). Graph 1 shows the number of nodes that have been reused a given number of times: if 10 nodes appear 100 times, a point will be plotted at (x=100,y=10). Graph 2 is again cumulative: if 10 nodes appear 100 times and 2 nodes appear 101 times, points will be plotted at (x=100,y=1000) and (x=101,y=1202), since 10 × 100 = 1000 and 1000 + 2 × 101 = 1202. This helps in visualising what proportion of the dataset is made up of heavily reused nodes versus rarely reused ones.
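A minimal Python sketch of both reuse graphs follows. It assumes triples are given as 3-tuples of term strings and that nodes are compared by exact string equality. Unlike the table, it does not split URIs, literals and blank nodes into separate columns.

```python
from collections import Counter


def reuse_histogram(triples):
    """Count each node's appearances in any position, then group nodes by that count."""
    uses = Counter(term for triple in triples for term in triple)
    return Counter(uses.values())  # times reused -> number of nodes


def reuse_points(histogram):
    """Return (times reused, nodes [Graph 1], cumulative appearances [Graph 2])."""
    points, running = [], 0
    for times, nodes in sorted(histogram.items()):
        running += times * nodes
        points.append((times, nodes, running))
    return points


# The example from the text: 10 nodes appear 100 times, 2 nodes appear 101 times.
print(reuse_points({100: 10, 101: 2}))  # [(100, 10, 1000), (101, 2, 1202)]

triples = [("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")]
print(reuse_points(reuse_histogram(triples)))  # [(1, 2, 2), (2, 2, 6)]
```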

Data Files: URI Literal B-Node

#Times reused | URI | Literal | Blank Node
Total | 30218224 | 36245030 | 0
1-18454355305147320
2-2373819839155830
3-319204719341560
4-44127063229320
5-598104251457470
6-6739064957830
7-7518101507100
8-8253417368050
9-9203877259310
10-198874491045450
20-29553293345780
30-39473485183800
40-49451994106730
50-5939931770450
60-6927684046960
70-7920744133700
80-8915733322500
90-9912077818290
100-19944653679280
200-2999285623610
300-3993632411090
400-499187057680
500-599108315810
600-69970754250
700-79948262830
800-89934972160
900-99925911860
1000-10992600
1000-199999106920
2000-299923242440
3000-399911081230
4000-4999642650
5000-5999384400
6000-6999302390
7000-7999194260
8000-8999192230
9000-9999113170
10000-19999582620
20000-29999213370
30000-39999124210
40000-4999956170
50000-599993950
60000-699992760
70000-799992440
80000-899992500
90000-999991520
100000-1999998140
200000-2999991410
300000-399999600
400000-499999400
500000-5999991000
600000-699999100
700000-799999100
800000-899999200
1000000-1999999500
2000000-2999999100
3000000-3999999200
4000000-4999999100
5000000-5999999100
6000000-6999999100
7000000-7999999500
9000000-9999999200
10000000-19999999100
90000000-99999999100


Node Lengths

These graphs illustrate the length of nodes in bytes. In both graphs, a node is counted only once, however many times it is reused. Graph 1 shows the number of nodes that have a given length: if 10 nodes have a length of 100 bytes, a point will be plotted at (x=100,y=10). Graph 2 plots the cumulative space used: if there are 10 nodes of length 100 bytes and 2 nodes of length 110 bytes, points will be plotted at (x=100,y=1000) and (x=110,y=1220). This helps in visualising what proportion of space is taken up by nodes of a given size.
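The length graphs can be sketched in Python in the same way. This is a minimal sketch under one stated assumption: a node's length is taken to be the UTF-8 byte length of its string form. The page does not say whether quotes, language tags, datatypes or angle brackets are included in the count.

```python
from collections import Counter


def length_points(nodes):
    """Return (length in bytes, nodes [Graph 1], cumulative bytes [Graph 2])."""
    distinct = set(nodes)  # a node reused many times is counted once
    per_length = Counter(len(node.encode("utf-8")) for node in distinct)
    points, space = [], 0
    for length, count in sorted(per_length.items()):
        space += length * count
        points.append((length, count, space))
    return points


# The example from the text: 10 nodes of 100 bytes and 2 nodes of 110 bytes.
nodes = [f"{i:0100d}" for i in range(10)] + [f"{i:0110d}" for i in range(2)]
print(length_points(nodes))  # [(100, 10, 1000), (110, 2, 1220)]

# A repeated node is not counted twice, and "é" takes 2 bytes in UTF-8.
print(length_points(["ab", "ab", "é", "xyz"]))  # [(2, 2, 4), (3, 1, 7)]
```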

Data Files: URI Literal

Node length (bytes) | URI | Literal
Total | 30218224 | 36245029
1-17223
2-218552841
3-3713238834
4-49682125933
5-510573268445
6-69322889580
7-791033650586
8-877366631679
9-968538503520
10-19700466049375
20-296674383127136
30-3944699751493856
40-4911058969533083
50-596582427272552
60-693431096203984
70-791678006128919
80-89775803115052
90-9946743495416
100-199910083688230
200-29938465541664
300-3994073679607
400-49910111019534
500-599362201570
600-699276157567
700-799179128169
800-89980103969
900-9995385415
1000-10991778
1000-1999118379846
2000-2999187226
3000-3999023927
4000-499907901
5000-599903145
6000-699902407
7000-799901471
8000-89990533
9000-99990326
10000-199990749
20000-299990115
30000-39999034
40000-49999017
50000-5999903
60000-6999906
70000-7999903
90000-9999902
100000-19999901