To check the results obtained we provide a scri'pt to generate a .gexf file to be opened with Gephi.
Test datasets are available at the Toronto university website.
Technologies used:
- PHP
- Gephi
- XML
This method takes longer than the other one, however the result is equally right. Using a sparse matrix instead of the full one may help speed up the process.
Method description:
- Pick the NxN input matrix and make it stochastic, call it A
- Create the P" matrix as alpha*A+(1-alpha)*M with alpha a factor which describes the probability of a random jump and M an NxN matrix filled with values 1/N
- Create starting vector v_0 with length N filled with values 1/N
- Compute v_m = v_m-1*P" where the first time v_m-1 is exactly v_0
- Compute the difference v_m - v_m-1 which should converge to 0
- When the difference doesn't vary over a certain threshold or i iterations have been made, stop. v_m should contain the PageRank for every page
NOTE: Due to Apache and PHP limitations, you may have to modify the php.ini file to grant more memory to the scripts (if you don't want to store the matrix on filesystem like we did) by setting the memory_limit parameter to at least 768MB and raising the maximum script execution time to 5 minutes(300 seconds) with max_execution_time.
2. Iterative method - found on phpir
To use this method you must have the nodes and adj_list files from the chosen dataset.
Method description:
Each page is given a starting PR of 1/N where N is the number of nodes in the graph. Each page then gives to every page it links a PR of current_page_PR/number_of_outbound_links.
It's introduced a dampening factor alpha (0,15) which represents the probability of making a random jump while visiting the graph or when reaching a cul de sac.
PR_new = alpha/n + (1-alpha)*PR_old
Every PR is then normalized between 0 and 1.
The process keeps going until it has reached i iterations or the difference between old and new PR doesn't vary over a certain threshold.
Our implementation is available here.
3. Gephi parser:
To use the parser you you must have the nodes and adj_list files from the chosen dataset. The parser outputs a .gexf file to be opened with Gephi. Here is a sample .gexf file obtained by parsing the first dataset available on the site (the one about abortion).
No comments:
Post a Comment
With great power comes great responsibility