Hi,
If anybody has experience of scraping data from online databases, I would appreciate some help if possible.
I have a custom application with a command-line interface, run from the XP or Vista command prompt, that extracts information from an online database. It queries and navigates multiple web pages and scrapes the required data into a text file (approx. 8 MB) on my local PC.
Previously I carried out this task on a Sony VAIO laptop (Vista, 2.0 GHz Core 2 processor) and it all went OK. I have since switched to a Dell Latitude laptop (Vista, 2.2 GHz Core 2 processor), and the process has become much slower.
All other variables are the same: the same BT Home Hub 2, the same hard-wired LAN connection and the same ISP, i.e. everything external to the PC is unchanged. The custom application is also the same; it has not been modified, as it is packaged as an exe file.
Are there any settings (BIOS, CPU, caching, network card, Command Prompt window, etc.) that would affect the speed of data scraping? It might be my imagination, but judging by the HDD activity LED, the hard disk seems to be working a lot harder on the slower PC.
In addition, I have tried the process on some XP machines, and they all run slower than the Sony VAIO. Something about the Sony speeds the process up considerably, by a factor of roughly four to five compared with any other PC I have used. To give a guide, the Sony used to extract the data in 8 hours, whereas other PCs take between 30 and 40 hours each time the process is run.
The process makes approximately 14,000 web page requests each month when I run it, so there is plenty of scope for it to slow down if something is acting as a limiting factor.
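To put those run times in per-request terms, here is a quick calculation. It is only a rough sketch and assumes that every run makes about 14,000 page requests and that the requests account for essentially all of the run time.

```python
# Rough per-request timing. Assumes ~14,000 page requests per run
# (figure from the post) and that requests dominate the total run time.
REQUESTS_PER_RUN = 14_000


def seconds_per_request(run_hours: float, requests: int = REQUESTS_PER_RUN) -> float:
    """Average wall-clock seconds spent per page request over a whole run."""
    return run_hours * 3600 / requests


if __name__ == "__main__":
    runs = [("Sony VAIO", 8), ("other PCs (best)", 30), ("other PCs (worst)", 40)]
    for label, hours in runs:
        print(f"{label:18s} {hours:>3} h -> {seconds_per_request(hours):5.2f} s per request")
    # Slowdown of the other PCs relative to the 8-hour Sony run
    print(f"slowdown factor: {30 / 8:.2f}x to {40 / 8:.2f}x")
```

On those assumptions, the Sony averages about 2 seconds per page, while the other PCs average roughly 8 to 10 seconds per page, i.e. 3.75 to 5 times slower.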
Thx & Rgds