DCSP project - Central First Peripheral After ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CFPA - Report document ********************* Students: Reuven Yagel 028642148 Daniel lishha 029375284 Dept. of Mathematics and Computer Science Ben-Gurion University of the Negev yagel,lishha@cs.bgu.ac.il supervisor: gadi solotorvsky 1. Introduction 2. CFPA algorithm 3. Implementation 4. Protocol Appendix A. FC-CBJ algorithm Appendix B. Running the project Appendix C. Documentation 1. Introduction CSP are constrains satisfaction problems that concern assignment of values according to some constraints. CSP are used in many fields, and of course algorithms are already exists. Our project is concern with DCSP, i.e. Distributed CSP. The idea behind DCSP is that CSP are sometimes used to solve problems that are naturally partitioned into several components, with constrains between the parts. For example resource allocation of some industry divided into several units, or a university divided into departments. In those case we like to preserve the natural structure of the problem, and there might be also a requirement for local autonomy of each component, or hiding information one from another. Thus the distributed structure of this project occupy not dealing only with basic algorithms development, but rather in problems of networking and communication. This project gathered a few groups of students. Some groups dealt with graphical user interface and generating problems, those problems can be explicit or implicit, mainly for testing, and are composed by the users with probabilities. At the far end of the project, came the statistic part, which supposed to test and print the results, and statistics about the performance of the search. The main part is in the middle, that is the search part, this part gets the problem, solve it and passes statistics and results. This part is dealing with implementing one of several algorithms proposed for DCSP's. The communication between the agents of a solution process is also implemented by one of the groups. Our part in the whole project is CFPA: CFPA is an algorithm which one of the agents, called central, is performing. In general, its task is to work on the external problem, which is the constraints, between the agents who suppose together to solve the general problem, by each solving his problem and the external constraint solved by the CFPA algorithm. We use it when the external problem is supposed to be more difficult then the inner problems of each agent. The problem is given as a canonical problem, to be explained later, which we will call the external problem. The central agent tries to solve the external problem with a CSP algorithm. Then the other agents are asked to check if the current assignment fit with their own problems, and feed back to the central agent, the process will continue until a solution will be found or a no-solution will be reported. Our work had two phases: The first: implementing our project as a separate unit. This means generating temporary problems and working on it, in addition working only with our data structures, and implementing the different agents as parallel threads of one program, the communication is also temporary since, in the gathered project, each agent can be execute on a separate machine. Second: combining the whole units together. So, Our part contained: 1. Each agent receives an internal problem. 2. Execute the CFPA algorithm, using FC-CBJ algorithm for each problem. 3. Return a solution to the problem, and statistics about the solution process. 4.the whole thing is written in Java, so it can be used across the Internet. Goals of the project: 1. Learn about CSP and DCSP. 2. Learn about Distributed systems and Java. 3. Learn to work in a team on a big task. 2. CFPA algorithm CFPA algorithm is working on a canonical representation of a problem, it means that the external constrains are gathered to form a new problem, in which every variable represents an external variable in a local problem. This variables are connected with identity constrains to the local problems. We want to use CFPA when the external problem is relatively difficult than the inner problems of each agent. The central agent tries to solve the external problem with a CSP algorithm, then the other agents are asked to check if the current assignments, due to the found solution, fit with their own problem. CFPA is tractable, since it enables distributed work on the whole problem. This is with a cost of relatively small amount of messages, since messages are sent only at the stage that an external solution was found, and only if one of the agents failed in the previous stage to find an assignment. But, there can be waste of time, since agents may be looking for solution that don't much other agent. Performance: In measurements that were done, it was found that parallel algorithms like CFPA, are better than sequential ones, but it is hard to find problems that good for CFPA, i.e. problems in which the external problem is difficult. In the implementation part we will examine heuristics that we'll improve the implementation. We provide statistics by means of number of consistency checks and message sent. First we will present now a general description of the algorithm. It uses a few functions that are declared and explained in the article (DCSP A Model & Application) p13-15. This algorithm is working as follows: 1. Find a solution for external problem Gm(solve_internal). 2. Send the assignments to agents who should know about it(propagate_external). 3. All agents are trying to find a proper solution according to the assignments. 4. If a solution was found, then the whole DCSP has a solution. Otherwise: recursively find a new solution for Gm and propagate it until no solution to Gm is found, return no solution to the DCSP. Pseudo-code of this: cfpa() begin solve_internal(Gm) propagate_external(Gm) if (||solve_internal(G1) and solve_internal(G2) and solve_internal(Gm-1)) return true else return backtrack_cfpa() end backtrack_cfpa() begin if (external_conflict_backtrack(Gm)) begin propagate_external(Gm) if (||solve_internal(G1) and solve_internal(G2) and solve_internal(Gm-1)) return true else return backtrack_cfpa() end else return false end In order to implement the algorithm in a way that it will be efficient, we should try to reduce the number of consistency checks and message passing. This leads us to the next section. 3. Implementation The implementation includes: a. Implement in Java the algorithms FC-CBJ and CFPA. (FC-CBJ is explained in App. A) b. Connection between agents, as the algorithm require. c. Input and output, or in other word the interface between our task and other groups. d. Using a few heuristics, to make the algorithm efficient and realistic for solving real problems. a. Implementation in Java: This part concluded: a) Representation of a problem, i.e. data structures to hold problems. It happened that other groups worked with different data structures, so there are convertors between them. b) Writing the algorithm methods in Java. The main data structures deal with representing the problem, thus we have the following classes: Variable - representing a variable. Domain Constraint Network - raps the whole problem together. Solution - a list of variables and assignments. The main classes represent the agents and solving algorithms. There is a class named 'Solver', from which all the other inherit the main members and methods involved in the solving process. Mainly it has an abstract class called 'Solve', which is the main of this class, and its job is generating the solution. Class CFPA extends Solver, and represents an agent, it's tasks are: getting the problem, decoding it into the data structures, start the solving algorithm, connect with other agents i.e. getting requests for solution and statistics and answer them. Classes FC_CBJSolver and FC_CBJ_CFPASolver also extends Solver, and are finding solutions to problems. They have a member class called SolutionManager which is responsible for keeping solution and statistics. More details will follow in the documentation. b. Connection between agents Message passing (between agents) is made for each agent through its member class MailBox. The communication group provides this class, but during development of the project, We used a semi-class with the same name and identical methods. The difference that our communication is working between threads, while in the real life it works between computers. This class has three main methods, used by us: SendMessage(dest,msg) - which sends a message to a destination agent, msg is a StringBuffer. For internal use there is a class 'Message' which includes an integer action code and a String for data if necessary. Operation codes are represented as final integers, and are the follow: PROPOGATE_EXTERNAL - central agent asks for external solutions. EXTERNAL_FOUND - external agent says it has such a solution. EXTERNAL_NOT_FOUND - it hasn't a solution. TERMINATE - stop the solving process. GetMessage() - returnees a message from mailbox, the process get idle until a message will arrive. Around those method there are more specific methods, dealing with sending messages to all the agents, or sending statistics. The communication part is implemented once by us for the developing phase o our part, and then it uses an identical part which is made by other group. c. Input and output In general, our part is a function that is activated by the communication part, and gets it's input as parameters, specific details in the protocol part. d. Heuristics: We used a few heuristics, to make the algorithm efficient and realistic for solving real problems. For testing purposes, the heuristic of variable ordering can be omitted. a). Variable ordering according to domain size. Immediately after agent gets it's network of the problem it rearranges it's variable's according to their size of domain, the ones with small domain will be first, since they are placed higher in the search tree, and a small domain means less branches of the tree. Also a variable with small domain size is more 'difficult', so there is a greater chance it won't get the right assignment. Thus it is better to start with him, and in a case of backjump we are steel near the root of the search tree. This heuristic is good also for regular CSP, and in tests we've done we found it lowers a lot the number of consistency checks. b). Variable ordering, according to external property. We would like, in each external problem, that the search will be first according to external variables, since the central agent is concerned only with them. We combine this feature, with the previous heuristic, by giving each external variables an extra weight, so in the sorting process, all external variables will be placed first. We used Shell Sort, in the implementation. c). Directed backjumping. FC-CBJ after finding assignments to all external variables should not find several assignments to other variables, since we focus on the external variable solution. This is achieved by testing every backjump, if it is out of the external variables, so the backjump is changed to jump more backward to the last external solution, to find a new assignment, which is different in the external variables. d). Reducing number of messages. The central agent, saves external solution that were sent to it, in a hash table, so before asking a peripheral agent for a solution, it check if it already got this solution, thus avoiding sending request that was already satisfied. e). Skipping of external solution. The central agent also keep track of external assignments that were sent and failed, thus before sending any message, it checks(also in a hash table) if one of the agents can't do with this assignment, so the whole solution is regretted, thus saving unnecessary search of other agents, and many messages. f). Agents don't waste time. In CFPA, each peripheral agent, is waiting for request to fulfill some external variables assignments, in the meantime it searches for solutions and saves them in a hash table according to external assignments. so, when a request for solution comes from the central agent, it first checks if such a solution was found already, if yes sends it immediately, otherwise search for appropriate solution. 4. Protocol In this section we provide the protocol between other group and us. This will be divided into input and output: Input, is concern with the communication group: a. Each agent will have mailbox for communication with the interface who generates the problems, and between agents. The communication module defines mailbox. b. Each agent will get his problem, in a data structure Agent, built by the communication module. c. Each agent will be activated with heuristic code, as to which heuristics to perform, if no code is passed then activate all heuristics. The codes are final integers, and are the follow: public final int WITH_HEURISTICS = 0; public final int WITHOUT_VARIABLE_ORDERING = 1; public final int CHECKING = 99; checking means activating of one agent with a complete problem for testing the solution generated by the CFPA algorithm. d. The CFPA algorithm will be activated as a method, according to the previous paragraphs, as the following: First construct a CFPA agent, by the constructor: public CFPA(MailBox mbox, Agent agent, int hcode) Then activate it with it's method - Solver.run(). For example: Agent agent = new Agent(); // the representation of the problem. MailBox mbox = new MailBox(); CFPA cfpa = new CFPA(mbox, agent, WITH_HEURISTICS); cfpa.run(); Output is concerned with communication and statistics groups: When (if) a solution was found, each agent will send his solution and statistics to the central agent, the central agent will then prepare to strings, and then them to the communication module using the MailBox method sendStat(). The first string will contain the name of the algorithm followed by 5 numbers of statistic, denoting: a. Number of maximum sequentially constrains checks. b. Number of actually sequentially constrains checks. c. Number of maximum parallel constrains checks. d. Number of actually parallel constrains checks. e. Number of messages sent. The difference between maximum and actual is due to the fact that each agent is finding solution instead of just waiting, so actual checks are the one that happened until the chosen solution was found, maximum is all the constrains checks that were made. Sequential checks are sums of all checks made by all agents, parallel are those who were made by the agent who made the most check of all. Example of the first string: "CFPA 89 75 61 61 10" The second string is a list of pairs each variable and its assignment, by the solution. For example: "CFPA 1 0 2 1 3 0 4 1 5 1 6 2 7 1 8 1 9 0 10 0 11 1 12 2 13 1 14 0" Appendix A. FC-CBJ We used the FC-CBJ algorithm extensively, since each agent is solving its problem with this enhanced algorithm. Even though this algorithm is not the main target of the project, we don't want to regard it. Brute force algorithms can be suggested, such as repeating assignments to the whole network or simple tree searching, but this is of course not sufficient. A few improved algorithms were suggested, since these problems can be quite big and even for a medium size problem can be vary difficult. These algorithms are faster since they travel the tree less times. Well, this is with a memory cost of managing special data structures, but this cost is polynomial to the size of the problem, while the time complexity of the simple algorithms is worse. FC-CBJ, is a hybrid of two algorithms Forward Checking and Conflict-directed BackJumping. We will describe FC-CBJ through some basic algorithms, which it is based on. The description is based on an article written by Patrick Prosser. The basic algorithm which fc-cbj is based on is backtracking (BT). This algorithm is making a simple tree search, starting from the first variable, assigning it a value. Then it continues to the next variables and in each stage it tries to assign a value that is consistent to all the previous variables. In a case of success it proceed forward, in a case of failure it backtracks one stage, eliminates the value that was assigned to the variable from its domain and restore all domains of future variables. The hidden assumption is that this deleted value is responsible for the inconsistency, so we must find a new assignment for this variable. The problem is of course that the inconsistency may have been caused by a prior variable, so we might have been in the right way, and we will later come back, thus making useless steps. (This situation is called trashing). We measure the efficiency of an algorithm by the number of consistency checks that are made, so our main purpose is to reduce this number. A solution to this problem may be by jumping back, when an inconsistency happened, to the variable who caused it, this is exactly what the algorithm BJ (Backjumping) is about. This algorithm, when performing consistency checks with previous variables records which is the deepest variable that it was checked against and caused inconsistency, and then if necessary jumps back to it. A more enhanced algorithm is CBJ (Conflict-Directed Backjumping) in which after jumping back we check if this variable can still be assigned to some value, if not we then perform another backjump to prior variable that is in conflict with both variables that we already checked and so on. The knowledge for this backjump is preserved in a list for each variable, which maintains the variables, which were checked and caused conflicts. Up until now we improved in each algorithm the back movements of the algorithms, now it's the time to perfect the forward direction. Forward move can be more intelligent if we can use some knowledge when trying to initiate a new variable. Algorithm FC (Forward Checking) works as follow, when ever a new variable is initialized with some variable, we check all future variables to see if this initialization restricts some values from the domains of these variables, if so the algorithm removes those values from the future domain. When doing such an action, each variable that its domain was updated record in a conflict list, the variable that caused this change. This is for restoring the domain situation prior to this elimination, in case of backtracking. If during this process some domain gets empty, we know we can't give this value, to the current variable, so a new value is suggested, not before restoring all domains as they were before the forward checking. FC is prone to the same problems as BT, when doing just a simple(chronological) backtracking to a variable that has no connection to the conflicts that were found. So it is desirable to combine it with the smart CBJ algorithm, thus getting the hybrid FC-CBJ. FC is in charge of the forward moves, while the part taken from CBJ is making the back jumping as described before. There are other hybrids combined from the above algorithms and other, but according to tests that were done it seems that FC-CBJ is for most problems the best choice in the field of tree search algorithms. Appendix B. running the project Our part of the project is as a matter of fact a function, which the human interface (HMI or GUI) will use in order to solve explicit or random problems. At the end state of the project this will be activated through an HTML page on the WWW. At the current state it can be run manually in the following way: As an application: On the cs-bgu department machine silver, enter the right directory (at the moment this is: ~yagel/test) and type: java ComSrv 'port' 'file' cfpa 'port' is a port number and 'file' is a file that encodes a problem. This will run the server who prepare the problem for solving and waits until an appropriate number of agents will connect (depends on the problem). then open more machines, that can run java (like: silver, prune or lead) and type: java ComCln 'port' 'server' 'remote port' 'port' is a port number, 'server' is the name of the machine and 'remote port' is the port number given to the server on silver. Repeat last steps until enough agents will connect, and then the solution process starts. The process running the central problem will give the statistical output, this process will be the last one to be connected. Another new way is as an applet, by running the server from ~DCS/.html by typing: java mainApplet Clients will connect the same way. Appendix C. Documentation Selected html pages are added, for full description of the CFPA package, see the URL: http://www.cs.bgu.ac.il/~yagel/packages.html 1