Updates for Wikia Search unveiled

June 4, 2008

Well, I haven’t updated this in a long time, so apologies for that, but here we go again, hopefully for the future.

Search Wikia/Wikia Search has now been updated with lots more cool features, including:

  • The ability to edit any result, title and summary. The edits are then instantly available to everyone
  • The ability to add new results for any search query instantly
  • The ability to delete and/or hide any result
  • Every result item can be rated 1-5 stars, which will slowly influence the ranking position
  • The ability to add suggested and/or related searches for any query
  • The ability to add public comments to any result item
  • The opportunity to see site previews and annotate text, images, links and forms directly into the results
  • The ability to try any search on Google, Yahoo, or any other search engine with a single click
  • The ability to customize the background on the header for a more themed result for any search
  • The opportunity to view the change history showing all the social actions for any page

Jimmy also made a good video of the new features, which you can view below.

You can visit the new features at search.wikia.com


Search Wikia launches, screenshots

January 7, 2008

Screenshots of the project are below. These will be followed later by descriptions and explanations, and hopefully an interview with Jimmy.


Pre-alpha and launch date

December 24, 2007

So the private pre-alpha stage is now working, believe me!! It is for testing and breaking the system before the large-scale opening on January 7th, according to messages from Jimmy. I do want to write more but I can’t (sorry) due to the terms of the invite and beta.

More soon 🙂

Search Wikia *will* launch before 2008!!

December 23, 2007

Jimmy was again cornered today in IRC and, when asked about the Search Wikia project and a possible launch date, he responded with the following:

[15:14] <jwales> the search engine *will* launch before the end of the year, probably in private beta first, and then open to the public in early january. No specific dates are certain yet. But sooon.

Well, there we go then: maybe we will see a launch in December, as previously mentioned, just not the whole thing.

First crawl!

December 18, 2007

So Jimmy announced today on the mailing list that “we are now doing some preliminary crawls in anticipation of the upcoming launch.” A wiki page was also created at http://search.wikia.com/wiki/Whitelist and everyone has been asked to contribute towards the list of pages to be crawled by the new servers.

Jimmy continued in the email saying, “Information is going to change quickly in the next 2-3 weeks, but I’m getting excited.”

1000 servers?!?!

December 17, 2007

The screenshot below shows a picture that Jimmy Wales uploaded to Flickr (here) with the caption “Hmmm, what could nearly 1000 servers be for?”. On the Search Wikia mailing list, when asked “Any updates on the Wikia search project…?”, Jimmy replied “On track, we are hard at work right now…” with a link to the image shown. Many people have speculated that the servers are for a crawl and/or a Hadoop cluster. Jimmy Wales confirmed the crawl on IRC: when I asked whether the servers were for a web crawl, he replied “they are for the wikia search web crawl, yes”.


Loading the truck

“Socialpedia” screenshots revealed

November 18, 2007

Jimmy Wales

So Jimmy was in South Africa again this week talking to the local community and “geeks” about wiki stuff, and he also released some screenshots of what is being described as “only part of the story”. Jeremie Miller also said “it is just a part of the overall effort, there isn’t any one magical thing, just a lot of pieces moving in the same direction :)”

The screenshot below was taken by Nic Haralambous in South Africa and shows a social profile that many have said looks similar to a Facebook profile. However, Jimmy told Wired’s Terrence Russell “That [the design] will probably change before launch”.


Socialpedia Screenshot

He also told me that “There’s a ton of work to be done in the next month, but I’m really excited about how it’s all coming together, all the various moving parts and novel combinations, sure should be interesting!”.

So that’s the latest on where the overall Search Project is going, and hopefully we will hear some more in the next few months…

Jimmy Wales picture by Imamon.

Short interview with Jeremie Miller

September 26, 2007

Right, this is a long piece this time. I’ve been trying to get it out for a long time, but here it is. Firstly, I recently posted a couple of questions to Jeremie Miller, and the Qs and As are below. Further below is a mailing list email from Jer which includes many more details about Grub and its progress.


Q: What do you see happening in Grub’s near future?
A: The ability to get immediate feedback, upload URLs and download crawl snapshots, etc more usability of the functions and a better protocol to make deving clients easier.

Q: Grub is currently being used to index the net but is anything else in the pipeline for the rest of the project (i.e. Atlas)?
A: Yeah, Grub is to be the best crawler and for a social good, anyone can benefit from it’s results. As for quality indexing, I hope there’s multiple projects we can start around that some of it being natural language, some of it being good ranking, some of it being scaling across many computers, etc but they all depend on a good source of data, that’s what Grub has to do really well first.

Q: What are you currently working on?
A: Figuring out how to get grub.org running in a more production mode, and exploring it’s source code. Also, some prototype source code for Atlas stuff, some testing tools, but that’s a few weeks away.

Grub Development post from Jer

A lot more info was released in a post to the Grub Development mailing list and is included below.

I’ve been meaning to send out the low-down on all the Grubbing going
on the past month or so, and some ideas for where it’s all going,
feel free to ask if I don’t answer anything anyone might want to know
here :)

First, most everyone should have noticed the global stats are working
finally, and in having them up you can see the service still goes
down semi-frequently. We’ve got the entire thing “throttled down” as
far as it will go and it’s still crawling millions of urls daily and
filling up the 30GB partition it’s caged in for testing :)

So, some things learned about the current Grub system:
* it’s not recursive (doesn’t automatically discover/inject new urls)
* it is capable of obeying robots when injected with them
* only grabs text/html right now
* uses a simple checksum to look for changes
* doesn’t track ETag or Last-Modified (pretty major flaws IMO)
* was over-engineered for modularity
* uses a rather obtuse SOAP encoding
* stores crawl results in it’s own also obtuse encoding
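
[Blog aside, not part of Jer’s email.] To make the checksum versus ETag/Last-Modified point a bit more concrete, here is a rough Python sketch. It is not Grub’s code: the function names are made up for illustration, and SHA-256 is only an assumed stand-in, since the email does not say which checksum Grub uses. The first half shows checksum-based change detection, which needs the whole page to be downloaded first. The second half builds the standard HTTP conditional request headers that let a server answer “304 Not Modified” without sending the body at all.

```python
import hashlib


def checksum(body: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever simple checksum Grub uses.
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_checksum: str, body: bytes) -> bool:
    # Checksum approach: the page must be fetched in full before comparing.
    return previous_checksum != checksum(body)


def conditional_headers(etag=None, last_modified=None) -> dict:
    # Validator approach: send back the ETag / Last-Modified values stored
    # from the previous crawl so the server can skip sending the body.
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


old = checksum(b"<html>version 1</html>")
print(has_changed(old, b"<html>version 1</html>"))
print(has_changed(old, b"<html>version 2</html>"))
print(conditional_headers(etag='"abc123"'))
```

A crawler that stores both validators per URL could fall back to the checksum only for servers that send neither header.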

Hmm, that’s enough pain to start with… so the very first goal was to get the crawl output in a more usable format, the Internet Archive ARC format (http://www.archive.org/web/researcher/ArcFileFormat.php ). That happened this week, and now the work-unit binary blobs are being converted into much more useful ARC files automatically, yay!

The next step is to get a lot more URLs loaded, there’s about a
million total that exist right now, basically a random sample, and
we’re churning though those a few times a day. I have extracted over
16 million more urls from a wikipedia snapshot and before they get
loaded they have to go through a robots check/import, that’s the goal
for this week. Once there’s a solid base of URLs, the hope is to
then start extracting new/discovered ones from the resulting ARC
files on the output, keep building on itself.

Moving up to the big picture, the overall goal here is to focus Grub
on being a completely open both on the input and output, a shared
crawling resource for use by anyone. More specifically, to turn the
administration into an open wiki where anyone can suggest new URLs,
review existing URLs, create site policies, and view crawl stats and
samples for any set. On the output anyone will be able to grab the
latest cached copies of individual URLs, get entire snapshots/sets as
they happen, or even build custom jobs to filter through and grab
copies of just what they need. I’ll take some time to get all this
together of course :)
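
As a footnote to the email above: the “robots check/import” step Jer mentions, where new URLs are checked before being loaded, could look something like the sketch below. This is only my guess at the idea using Python’s standard urllib.robotparser, not how Grub actually does it. The user agent name, the sample robots.txt and the example URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, urls, user_agent: str = "example-crawler"):
    # Parse an already-downloaded robots.txt body (no network access here)
    # and keep only the URLs this user agent is allowed to fetch.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and candidate URLs for a single site.
robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "http://example.org/index.html",
    "http://example.org/private/notes.html",
]
print(allowed_urls(robots, candidates))
```

In a real import the URLs would first be grouped by host, with one robots.txt fetched per host.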


Search Wikia Launch date??

September 2, 2007

According to this article from the Times we could be seeing a release date as soon as December this year.

Nice to know.

Post your opinions on this by adding a comment.


First ticket

August 30, 2007

Tracs has had its first use. Ticket number one is now dedicated to a compression error.

Tracs is for reporting your errors with Grub. If you want to start a new ticket for an error, follow this link.

More soon.