BinRank: Scaling Dynamic Authority-Based Search Using Materialized Subgraphs


A greedy bin algorithm unit 20, using the above-discussed bin construction process packTermsIntoBins, partitions the workload W into a set of bins composed of frequently co-occurring terms.
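The greedy packing step can be illustrated with a minimal sketch. This is not the packTermsIntoBins procedure itself, whose details are not given here. The function name `pack_terms_into_bins`, the `postings` dictionary, the `max_bin_size` cap, and the choice to seed each bin with the largest remaining posting list are all assumptions made for illustration.

```python
def pack_terms_into_bins(postings, max_bin_size):
    """Greedily group terms whose posting lists overlap (illustrative only).

    postings: dict mapping term -> set of document ids (assumed input shape).
    max_bin_size: assumed cap on the number of distinct documents per bin.
    Returns a list of (terms, docs) tuples, one per bin.
    """
    remaining = dict(postings)
    bins = []
    while remaining:
        # Assumption: seed each bin with the largest remaining posting list.
        seed = max(remaining, key=lambda t: (len(remaining[t]), t))
        terms, docs = [seed], set(remaining.pop(seed))
        while True:
            best, best_overlap = None, 0
            for term in sorted(remaining):
                plist = remaining[term]
                # Skip terms that would push the bin over the size cap.
                if len(docs | plist) > max_bin_size:
                    continue
                overlap = len(docs & plist)
                # Only terms that share at least one document are candidates.
                if overlap > best_overlap:
                    best, best_overlap = term, overlap
            if best is None:
                break
            terms.append(best)
            docs |= remaining.pop(best)
        bins.append((terms, docs))
    return bins


postings = {
    "xml": {1, 2, 3},
    "schema": {2, 3, 4},
    "join": {7, 8},
    "index": {8, 9},
}
bins = pack_terms_into_bins(postings, max_bin_size=4)
```

In this toy input, "xml" and "schema" share two documents and end up in one bin, while "join" and "index" share one document and end up in another. A term that overlaps with nothing gets a bin of its own.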

In this format, the entire Wikipedia graph consumes MB of storage, and can be loaded into main memory for MSG generation. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The PageRank algorithm utilizes the Web graph link structure to assign global importance to Web pages. Communications interface allows software and data to be transferred between the computer system and external devices.


Furthermore, it can be observed that a single sub-graph can serve as an approximate RSG for a number of terms, and also that it is quite feasible to construct a relatively small number of such sub-graphs that collectively cover, i. A particular one of the pre-computed materialized sub-graphs is accessed and a dynamic authority-based keyword search is executed on the particular one of the pre-computed materialized sub-graphs.

This relationship gives us the following important result. In the off-line mode, ObjectRank precomputes top-k results for a query workload in advance. In addition, embodiments of the invention use a greedy algorithm that minimizes the number of bins by clustering terms with similar posting lists.

However, notice that edges of different edge types may transfer different amounts of authority. In general, deserialization speed can be greatly improved by increasing the transfer rate of the disk subsystem.
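The point about edge types can be made concrete with a small sketch. It assumes the usual ObjectRank convention that an edge of type t leaving node u carries the type's transfer rate divided by the number of type-t edges leaving u. The function name `edge_weights`, the edge-type names, and the rates below are hypothetical.

```python
from collections import Counter


def edge_weights(typed_edges, transfer_rate):
    """Compute per-edge authority transfer weights.

    typed_edges: list of (source, target, edge_type) triples (assumed shape).
    transfer_rate: dict mapping edge_type -> authority transfer rate.
    """
    # Count the outgoing edges of each type for every source node.
    out_degree = Counter((src, etype) for src, _, etype in typed_edges)
    return {
        (src, dst, etype): transfer_rate[etype] / out_degree[(src, etype)]
        for src, dst, etype in typed_edges
    }


# Hypothetical bibliographic graph: one paper cites two papers and has one author.
edges = [
    ("paper1", "paper2", "cites"),
    ("paper1", "paper3", "cites"),
    ("paper1", "author1", "written_by"),
]
weights = edge_weights(edges, {"cites": 0.7, "written_by": 0.2})
```

Here each citation edge carries 0.35 of the paper's authority, while the single authorship edge carries 0.2.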

Any combination of one or more computer usable or computer readable medium(s) may be utilized. Thus, the upper bound on the number of intersections is tight.


For example, on the full Wikipedia dataset, BinRank can answer any query in less than one second, by precomputing about a thousand sub-graphs, which takes only about 12 hours on a single CPU.

Ideally, every object that receives a non-zero score during the ObjectRank computation over the full graph should be present in the sub-graph and should receive the same score. However, to get to that situation, the bin computation process will have to check intersections for every pair of terms.

A method according to claim 8 wherein said dynamic random walk is the ObjectRank algorithm. The problem of minimizing the number of bins is NP-hard. It can be demonstrated that it is feasible to use the entire dataset dictionary as the workload, in order to be able to answer any query.

This computation is too expensive for large graphs and not feasible at query time. A pair is an e-tie if R_E does not order the nodes of the pair, and an a-tie if R_A does not order them.

The above-discussed Personalized PageRank and ObjectRank algorithms both suffer from scalability issues. Papers about XML tend to cite papers that talk about schemas and vice versa. Once the MSG is constructed and stored in MSG storage 26, it is serialized to a binary file on disk in the same row-compressed adjacency matrix format to facilitate fast deserialization.

This way, more MSGs can be kept in RAM, thus decreasing the average query execution time. The PageRank score is independent of a keyword query. A method according to claim 2 wherein said authority-based keyword search is an ObjectRank algorithm.


In block 42, materialized sub-graphs are pre-computed. A method according to claim 8 wherein said generating pre-computed materialized sub-graphs comprises: Thus, scores below the threshold are effectively indistinguishable from zero, and objects that have such scores are not at all relevant to the query term.

In particular, the computer programs, when executed, enable the processor to perform the features of the computer system. The mapping of terms to bins is remembered, and at query time, the corresponding bin for each term can be uniquely identified, and the term can be executed on the MSG of this bin.
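The term-to-bin lookup described above can be sketched as follows. The function names, the list-of-lists representation of bins, and the MSG file names are hypothetical. The only behavior taken from the text is that each term maps to exactly one bin and is run against that bin's MSG.

```python
def build_term_to_bin(bins):
    """Record, for every workload term, the id of the bin it was packed into."""
    term_to_bin = {}
    for bin_id, terms in enumerate(bins):
        for term in terms:
            term_to_bin[term] = bin_id
    return term_to_bin


def lookup_msg(term, term_to_bin, msg_paths):
    """Return the MSG file for a query term, or None if the term is not in the workload."""
    bin_id = term_to_bin.get(term)
    if bin_id is None:
        return None
    return msg_paths[bin_id]


# Hypothetical bins and MSG file names, for illustration only.
bins = [["xml", "schema"], ["join", "index"]]
msg_paths = ["msg_0.bin", "msg_1.bin"]
term_to_bin = build_term_to_bin(bins)
path = lookup_msg("schema", term_to_bin, msg_paths)
```

A plain dictionary is enough here because the mapping is built once off-line and only read at query time.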

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product.



The tight upper bound on the number of set intersections that the algorithm needs to perform is the number of pairs of terms that co-occur in at least one document. In fact, the inventors have discovered that terms with strong semantic connections can generate good RSGs for each other. According to a further embodiment of the present invention, a system comprises: A partition identifier is stored for each term. Empirical results support this.

For example, on the same Wikipedia dataset, the full dictionary precomputation would take about a CPU-year.

To speed up the execution of set intersections for larger posting lists, KMV synopses may be used to estimate the size of set intersections. For example, consider N terms with posting lists of size X each, that all co-occur in one document d_0 with no other co-occurrences.
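A KMV (k minimum values) synopsis keeps only the k smallest hash values of a set, so intersection sizes can be estimated without touching the full posting lists. The sketch below uses the standard KMV estimators and is not the implementation referred to above. The function names, the use of SHA-1 as the hash, and the requirement that both synopses share the same k are assumptions.

```python
import hashlib


def _hash64(item):
    """Map an item to a 64-bit integer that behaves like a uniform random value."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def kmv_synopsis(items, k):
    """Keep only the k smallest distinct hash values of a posting list."""
    return sorted({_hash64(x) for x in items})[:k]


def estimate_intersection(syn_a, syn_b, k):
    """Estimate |A intersect B| from two synopses built with the same k."""
    set_a, set_b = set(syn_a), set(syn_b)
    merged = sorted(set_a | set_b)[:k]
    if len(merged) < k:
        # Both synopses hold their complete sets, so the count is exact.
        return float(len(set_a & set_b))
    # The k-th smallest hash, scaled to (0, 1), yields the union-size estimate.
    kth = merged[-1] / 2**64
    union_estimate = (k - 1) / kth
    # Fraction of the merged sample that appears in both synopses.
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / k * union_estimate


syn_a = kmv_synopsis(range(0, 10000), 1024)
syn_b = kmv_synopsis(range(5000, 15000), 1024)
estimate = estimate_intersection(syn_a, syn_b, 1024)
```

The true intersection in this example has 5000 elements. The estimate varies with the hash values but tightens as k grows, at the cost of larger synopses.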

For (2), we execute ObjectRank for each bin using the terms in the bins as random walk starting points and keep only those nodes that receive non-negligible scores. From the above description, it can be seen that the present invention provides a system, computer program product, and method for implementing the embodiments of the invention. However, there is definitely a strong semantic connection between these terms, since XML is a data format famous for its flexible schema.
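The per-bin step can be sketched as a random walk with restarts followed by a threshold cut. This is a simplified stand-in, not the ObjectRank implementation itself. The function names, the nested-dictionary graph representation with pre-normalized edge weights, the damping factor of 0.85, and the fixed iteration count are all assumptions.

```python
def object_rank(edges, base_set, damping=0.85, iterations=50):
    """Random walk that restarts at the nodes in base_set.

    edges: dict mapping node -> {neighbor: weight}; weights are assumed to be
    already-normalized authority transfer weights.
    base_set: nodes containing at least one term of the bin.
    """
    nodes = set(edges) | {v for nbrs in edges.values() for v in nbrs} | set(base_set)
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    scores = dict(jump)
    for _ in range(iterations):
        # Every node first receives its share of the restart probability.
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        # Then authority flows along the weighted edges.
        for u, nbrs in edges.items():
            for v, weight in nbrs.items():
                nxt[v] += damping * scores[u] * weight
        scores = nxt
    return scores


def materialize_subgraph(edges, base_set, threshold):
    """Keep only nodes whose score reaches the threshold, plus edges among them."""
    scores = object_rank(edges, base_set)
    keep = {n for n, s in scores.items() if s >= threshold}
    return {
        u: {v: w for v, w in nbrs.items() if v in keep}
        for u, nbrs in edges.items()
        if u in keep
    }


# Toy graph: "d" and "e" are unreachable from the base set and get dropped.
edges = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}, "d": {"e": 1.0}}
msg = materialize_subgraph(edges, base_set={"a"}, threshold=0.01)
```

In the toy graph only the nodes reachable from the base set receive non-zero scores, so the materialized sub-graph contains "a", "b", and "c".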

Dynamic authority-based search algorithms leverage semantic link information to provide high-quality, high-recall search results.

The computer system can include a display interface that forwards graphics, text, and other data from the communication infrastructure or from a frame buffer (not shown) for display on a display unit. The method includes generating a set of pre-computed materialized sub-graphs from a dataset and receiving a search query having one or more search query terms.

In fact, if all posting lists are disjoint, this problem reduces to a classical NP-hard bin packing problem. In order to save pre-processing cost and storage, each MSG is designed to answer multiple term queries.