Distributed Large-Scale Information Filtering


C. Tryfonopoulos, S. Idreos, M. Koubarakis, and P. Raftopoulou, “Distributed Large-Scale Information Filtering,” Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII, Lecture Notes in Computer Science, vol. 13, pp. 91-122, 2014.
tlsdkcs_2014.pdf (896 KB)


We study the problem of distributed resource sharing in peer-to-peer networks and focus on the problem of information filtering. In our setting, subscriptions and publications are specified using an expressive attribute-value representation that supports both the Boolean and Vector Space models. We use an extension of the distributed hash table Chord to organise the nodes and store user subscriptions, and utilise efficient publication protocols that keep the network traffic and latency low at filtering time. To verify our approach, we evaluate the proposed protocols experimentally using thousands of nodes, millions of user subscriptions, and two different real-life corpora. We also study three important facets of the load-balancing problem in such a scenario and present a novel algorithm that manages to distribute the load evenly among the nodes. Our results show that the designed protocols are scalable and efficient: they achieve expressive information filtering functionality with low message traffic and latency.
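
The general idea of storing subscriptions on a Chord-style identifier ring and routing publications to the responsible nodes can be illustrated with a minimal sketch. It is not the protocol from the paper, whose details are not given in this abstract. It assumes a simplified ring with successor lookup only (no finger tables), equality-only Boolean conditions, and a hypothetical rule that indexes each subscription under its alphabetically first attribute; the names Ring, ident, subscribe, and publish are all illustrative.

```python
import hashlib
from bisect import bisect_left

M = 16  # number of identifier bits (an assumption for this sketch)


def ident(key: str) -> int:
    """Map a string to a position on the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Toy identifier ring: each key is owned by its successor node."""

    def __init__(self, node_names):
        self.node_ids = sorted({ident(name) for name in node_names})
        self.store = {node_id: [] for node_id in self.node_ids}

    def successor(self, key_id: int) -> int:
        # First node whose identifier is >= key_id, wrapping around the ring.
        i = bisect_left(self.node_ids, key_id)
        return self.node_ids[i % len(self.node_ids)]

    def subscribe(self, subscriber: str, conditions: dict) -> int:
        # Hypothetical indexing rule: store the subscription at the node
        # responsible for its alphabetically first attribute name.
        index_attr = sorted(conditions)[0]
        node = self.successor(ident(index_attr))
        self.store[node].append((subscriber, dict(conditions)))
        return node

    def publish(self, publication: dict) -> set:
        # Visit only the nodes responsible for the publication's attributes
        # and match the subscriptions stored there (equality conditions only).
        matched = set()
        for node in {self.successor(ident(attr)) for attr in publication}:
            for subscriber, conditions in self.store[node]:
                if all(publication.get(a) == v for a, v in conditions.items()):
                    matched.add(subscriber)
        return matched


ring = Ring([f"node{i}" for i in range(8)])
ring.subscribe("alice", {"author": "Koubarakis", "topic": "p2p"})
ring.subscribe("bob", {"topic": "databases"})
notified = ring.publish({"author": "Koubarakis", "topic": "p2p", "year": "2014"})
```

In this sketch a publication contacts at most one node per attribute it carries, which is the kind of property that keeps message traffic low at filtering time. A subscription can only match a publication that contains all of its attributes, so the node holding the subscription is always among those visited.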

Last updated on 05/11/2014