Parallel Search.

Intro.

Parallel Search which is popularly known as SMP (Symmetric Multi Processing) Search, where the search speed improves with the additional number of processors.

The below graph clearly gives an idea about how the parallel speedup is achieved.

Screen Shot 2017-04-17 at 10.07.28 AM

There are several works related to this topic, but the most popular one being Lazy SMP which mainly deals with parallelising the chess engine. (GitHub Link)

The following section about Lazy SMP is mainly referred from this document.

Lazy SMP.

  • Lazy SMP is a newly discovered parallel algorithm where threads have minimal to no communication between them except a shared hash table.
  • It was experimented with implementing parallelisation in the engine, and it gave results with about 1.6 times speedup with 4 threads on 4 cores.
  • The idea is quite simple, with use of a shared hash-table, let all threads search the full chess tree and race to the finish line. The hash table makes sure that positions fully explored previously by any threads do not have to be re-explored.
  • The algorithm is very different from the other search algorithms, by using a lockless hash-table it avoids having any communication overhead or synchronisation overhead. It trades this by having a larger search overhead.

The lazy SMP test.

The goal of this test is to improve the chess engine on multicore CPUs using the lazy SMP algorithm.

Screen Shot 2017-04-17 at 10.20.20 AM.png

The main result taken from the lazy SMP test is the time to depth. This shows the actual speedup gained from the threads searching to depth 9. With 2 threads there was a speedup of 1.14, and 4 threads doubled the speed of using 1 thread, with a speedup of 2.01.

Implementation.

The implementation of Lazy SMP can be found here. The code was forked from another source. This is just a part of that code for experimentation purpose. I made changes as in changing the number of nodes. I also deliberately removed some search params to observe the behaviour of the code. The results were vague, I couldn’t present complete explanation in this section. I believe there’s a lot can be done when more of this is explored.


void getBestMoveAtDepth(Board *b, MoveList *legalMoves, int depth, int alpha,
int beta, int *bestMoveIndex, int *bestScore, unsigned int startMove,
int threadID, SearchPV *pvLine) {
// SearchParameters *searchParams = &(searchParamsArray[threadID]);
SearchStatistics *searchStats = &(searchStatsArray[threadID]);
SearchPV line;
int color = b->getPlayerToMove();
int tempMove = -1;
int score = -MATE_SCORE;
*bestScore = -INFTY;
SearchStackInfo *ssi = &(ssInfo[threadID][1]);

// Push current position to two fold stack
twoFoldPositions[threadID].push(b->getZobristKey());

// Helps prevent stalls when using more threads than physical cores,
// presumably by preventing this function from completing before the thread
// is able to detach.
std::this_thread::yield();

for (unsigned int i = startMove; i < legalMoves->size(); i++) {
// Output current move info to the GUI. Only do so if 5 seconds of
// search have elapsed to avoid clutter
uint64_t timeSoFar = getTimeElapsed(searchParamsArray[0].startTime);
uint64_t nps = 1000 * getNodes() / timeSoFar;
if (threadID == 0 && timeSoFar > 5 * ONE_SECOND)
cout << "info depth " << depth << " currmove " << moveToString(legalMoves->get(i))
<< " currmovenumber " << i+1 << " nodes " << getNodes() << " nps " << nps << endl; Board copy = b->staticCopy();
copy.doMove(legalMoves->get(i), color);
searchStats->nodes++;

int startSq = getStartSq(legalMoves->get(i));
int endSq = getEndSq(legalMoves->get(i));
int pieceID = b->getPieceOnSquare(color, startSq);
ssi->counterMoveHistory = searchParamsArray[threadID].counterMoveHistory[pieceID][endSq];

if (i != 0) {
score = -PVS(copy, depth-1, -alpha-1, -alpha, threadID, true, ssi, &line);
if (alpha < score && score < beta) { line.pvLength = 0; score = -PVS(copy, depth-1, -beta, -alpha, threadID, false, ssi, &line); } } else { score = -PVS(copy, depth-1, -beta, -alpha, threadID, false, ssi, &line); } // Stop condition. If stopping, return search results from incomplete // search, if any. if (isStop || stopSignal) break; if (score > *bestScore) {
*bestScore = score;
if (score > alpha) {
alpha = score;
tempMove = (int) i;
changePV(legalMoves->get(i), pvLine, &line);
}
// To get a PV if failing low
else if (i == 0)
changePV(legalMoves->get(i), pvLine, &line);
}

if (score >= beta)
break;
}

twoFoldPositions[threadID].pop();

*bestMoveIndex = tempMove;

// This thread is finished running.
threadsRunningMutex.lock();
threadsRunning--;
threadsRunningMutex.unlock();
}

Though I wasn’t able to comprehend the code very clearly, it is certain that this part of the code is responsible for implementing the Lazy SMP algorithm.

With all the relevant input to the function, it does the parallel search from the start to the end of the moves and based on the score param it does a recursive depth wise search. The computation is carefully done without conflicting the threads.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s