Bug 357697

Summary: Remote indexing via RDT is about 3X slower than indexing locally-located code
Product: [Tools] PTP
Component: Remote Tools
Status: RESOLVED WORKSFORME
Severity: normal
Priority: P3
Version: 5.0.2
Target Milestone: ---
Hardware: PC
OS: Linux-GTK
Reporter: Corey Ashford <cjashfor>
Assignee: Project Inbox <ptp-inbox>
CC: mikekucera, recoskie, rsjoao, wainersm

Description Corey Ashford CLA 2011-09-14 17:47:52 EDT
Build Identifier: I20110613-1736

We noticed that remotely located code takes noticeably longer to index than the same code located locally. I don't know exactly why.

For a relatively small project that takes 5-7 seconds to index locally, indexing with RDT takes about 22 seconds, even when the code is on the same machine and there is no network latency.

As a more real-world example, I indexed the Linux kernel source. To reduce the indexing time and get a more usable index, I filtered out all of the arch/* directories except arch/powerpc, bringing the total number of source files down to 18,000.

When indexing the same code locally, it takes about 20 minutes.

I don't know whether this is really a bug, or whether something inherent in remote indexing makes it slower in a way that can't be fixed.

Reproducible: Always

Steps to Reproduce:
1. Obtain the source code for a moderate-sized open source project and place it on a remote machine (or even on the client).
2. Create a Remote C/C++ project for that code using Remote Tools.
3. Run the indexer operation and measure how long it takes.
4. Repeat the same thing, but this time use a local C/C++ project and measure the time.
Comment 1 Chris Recoskie CLA 2011-09-15 10:38:34 EDT
In general, remote indexing performance is on par with local indexing. Depending on the speed of the machines involved, the remote case is often faster, since your remote server usually has more horsepower than your desktop.

I'm not sure what's going on in your setup. Now that our product work is winding down, one of us will have more time to get together with you and debug this.
Comment 2 Renato Stoffalette Joao CLA 2011-09-15 11:18:06 EDT
I created a simple helloWorld project on a remote server and also observed that indexing took longer than for an equivalent local project.

Steps to Reproduce:

1. Create an empty Remote C/C++ project using Remote Tools.
2. Create a new empty main.c file and type in 'hello world' source code.
3. Build the project.
4. Measure time for indexing.

I also noticed that re-opening a project took longer.

Steps to Reproduce:

1. Close the current remote project
2. Re-open the remote project
3. Measure the time to index the project.

Note: My ping response time to the remote server is 59.6 ms, with 0% packet loss.
Comment 3 Chris Recoskie CLA 2011-09-15 11:31:52 EDT
Creating a project or opening a closed one will take longer; there is no way around that. Those operations do remote file access, which is going to be slower on a remote system than on your local one.

Remote indexing should have a small slowdown, but it shouldn't be very much, all things being equal. Remote indexing sends a request to the server, which does the indexing; there is no remote file access in this case. All that is sent back is progress reporting updates, which are fairly small. If there are unresolved inclusions, those errors are sent back at the end of the operation.

The first time you do an index-related operation during a given connection session, it will send a list of all the files in your project to the server. That might take some time if your project is really huge.
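
As a rough model of the flow described above, here is a sketch of which messages one remote index operation might exchange. This is not the RDT or DataStore API: the enum, the class and the method name are all hypothetical, chosen only to show that no source file contents cross the wire and that the file list is paid for once per connection session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message kinds; these are NOT RDT or DataStore classes.
enum Message {
    FILE_LIST,           // client -> server: all project files, once per connection session
    INDEX_REQUEST,       // client -> server: start indexing on the server
    PROGRESS_UPDATE,     // server -> client: small progress notifications
    UNRESOLVED_INCLUDES  // server -> client: sent once at the end, only if any exist
}

// Hypothetical model of one connection session.
class RemoteIndexSession {
    private boolean fileListSent = false;

    /**
     * Returns the messages exchanged for one index operation.
     * No message carries source file contents: the server reads the files itself.
     */
    List<Message> runIndexOperation(int progressUpdates, boolean hasUnresolvedIncludes) {
        List<Message> messages = new ArrayList<>();
        if (!fileListSent) {
            // Only the first index-related operation in a session pays this cost.
            messages.add(Message.FILE_LIST);
            fileListSent = true;
        }
        messages.add(Message.INDEX_REQUEST);
        for (int i = 0; i < progressUpdates; i++) {
            messages.add(Message.PROGRESS_UPDATE);
        }
        if (hasUnresolvedIncludes) {
            messages.add(Message.UNRESOLVED_INCLUDES);
        }
        return messages;
    }
}
```

In this model, a first call such as runIndexOperation(3, true) yields FILE_LIST, INDEX_REQUEST, three PROGRESS_UPDATE messages and UNRESOLVED_INCLUDES; a second call in the same session omits FILE_LIST.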

I assume you're using a regular remote project and not a synchronized one. A synchronized project tries to index locally using remote headers, which is probably going to be very slow.
Comment 4 Chris Recoskie CLA 2011-09-15 11:39:04 EDT
Another thing to keep in mind is where your home directories are. By default, the indexer places the PDOM files in ~/.eclipse. If your home directory is an NFS share, that can slow things down. You really want the indexing directory (configurable in the service configuration) to be a directory local to the server.
Comment 5 Corey Ashford CLA 2011-09-15 12:33:33 EDT
To be clear, I am using the standard Remote C/C++ project, not a synchronized project.

My workspace and home directory are on my laptop's hard drive. There are no NFS-mounted directories on either the remote host machine or the client machine.
Comment 6 Mike Kucera CLA 2011-09-15 15:16:55 EDT
(In reply to comment #3)

> Remote indexing should have a small delta slowdown, but shouldn't be very much,
> all things being equal.  Remote indexing sends a request to the server, which
> does the indexing;  there is no remote file access in this case.  All that is
> sent back is progress reporting updates, which are fairly small.  If there are
> unresolved inclusions those errors are sent back at the end of the operation.

If it's the same indexer task, just running on the server, then yes, theoretically it should be just as fast, especially when it's on the same physical machine.

I have a gut feeling that this has something to do with the progress reporting updates; they might be blocking the remote indexer thread. It looks like RemoteIndexProgressMonitor calls DataStore.refresh(). What does refresh() actually do? The javadocs don't say much. If it blocks until the client and server datastores are in sync, that might be the cause of the problem.
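
To make the blocking hypothesis concrete, here is a toy cost model. It assumes, purely for illustration, that each progress sync stalls the indexer thread for one full client/server round trip; what DataStore.refresh() actually does is left open in this thread, and the class and method names below are hypothetical, not the RemoteIndexProgressMonitor or DataStore API.

```java
// Toy cost model, in milliseconds. All names are hypothetical;
// this is not the RemoteIndexProgressMonitor or DataStore API.
final class ProgressCostModel {
    private ProgressCostModel() {}

    /**
     * Modeled wall-clock time for indexing {@code files} files when the
     * indexer thread blocks for one round trip every {@code reportEvery} files.
     */
    static double totalMillis(int files, double parseMillisPerFile,
                              double roundTripMillis, int reportEvery) {
        if (files < 0 || reportEvery <= 0) {
            throw new IllegalArgumentException("files >= 0 and reportEvery > 0 required");
        }
        // Number of blocking syncs: one per reportEvery files, rounded up.
        int syncs = (files + reportEvery - 1) / reportEvery;
        return files * parseMillisPerFile + syncs * roundTripMillis;
    }
}
```

With illustrative numbers (not measurements from this bug) of 1,000 files, 10 ms of parsing per file and a 20 ms round trip, blocking on every file models out to 30,000 ms, while syncing every 100 files gives 10,200 ms. With a near-zero round trip, as over localhost, the two cases converge.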
Comment 7 Chris Recoskie CLA 2011-09-15 16:01:35 EDT
(In reply to comment #6)
> I have a gut feeling that this has something to do with the progress reporting
> updates, they might be blocking the remote indexer thread. Looks like
> RemoteIndexProgressMonitor calls DataStore.refresh(). What does refresh()
> actually do? The javadocs don't say much. If it blocks until the client and
> server datastores are in sync then that might be the cause of the problem.

DataStore.refresh() is supposed to sync them, as far as I know.

I'm skeptical that's what's wrong, though, or we'd be seeing this all the time. Corey sees the same behaviour both when connecting locally (via localhost) and when connecting remotely.
Comment 8 Corey Ashford CLA 2011-09-15 21:16:25 EDT
I talked with Chris briefly, and he said that my CDT may be too far out of date and that some fixes had been made to the scanner for the GNU toolchain.

I was using CDT 8.0.0 (the official release) and the latest PTP build available from the repos, which is from around Sep 1st.

I found that I can install the latest nightly build from hudson here:
https://hudson.eclipse.org/hudson/job/cdt-nightly/?

And so I installed it. With this CDT, the indexing problems all seem to have been fixed. The indexer now successfully finds the include files, and the Photran indexer flashes up in the progress view a couple of times but then completes. Best of all, the indexing time has dropped dramatically: a remote indexing operation now takes only about 10 minutes.

I also surfed around the source code, and it does appear that the indexing is actually working, because I can look at call stacks, navigate to definitions of types, etc.

So perhaps we can call this "fixed in SR1" ?
Comment 9 Chris Recoskie CLA 2011-09-16 10:49:46 EDT
(In reply to comment #8)
> So perhaps we can call this "fixed in SR1" ?

I think it's more the case that your CDT and PTP were mismatched, so I'm not sure saying we fixed anything is accurate.  In any case, I'm going to close this.  Thanks for being patient while we figured it out.
Comment 10 Chris Recoskie CLA 2011-09-16 10:51:33 EDT
*** Bug 356652 has been marked as a duplicate of this bug. ***