Not sure if there is any good answer for this but thought I’d ask here.
I search eBay a lot for good deals. I look at newly listed items under a variety of search terms. I’ve run into new listings that have multiple views to them even though they were literally just listed.
Now from what I can tell eBay will run all listings through some sequence of checks before they go live. But it doesn’t seem to be a level playing field.
I’m sure with multiple servers across the world there are some delays that happen when listings are posted. So some locations(?) may have a slight advantage over others with auctions populating.
Reason I ask: Recently I purchased an item from a seller that I directly worked a deal out with. They sent me the auction ID number as soon as it was posted. Their selling feed didn’t show the item (CTRL+F5). Search didn’t display the item using the item# to search. Advanced search didn’t find it. I had to modify a URL and paste the auction ID# into it to find the listing. The listing was under 2 minutes old. Now generally this is fine, but I noticed that 5 people had viewed the item once I did find it. So somehow, the listing was live somewhere, just not for me.
Anyone have any insight into how eBay listings are populated and/or ways to get ahead of the curve somehow?
The way search engines work in general is through the use of a document storage system and an inverted index. Documents are searchable entities, so think webpages for Google, products for Amazon and individual messages on Discord. eBay treats each posted item as a document and stores them all on a giant distributed (multi-machine) database (HBase is the database they use, for those interested).
To be able to quickly search a document database, an index is necessary. Without an index, the database would have to check every single document to find matches to a search keyword, which is not feasible at that scale. A search engine relies on something known as an inverted index, which operates exactly like the index in a book. Basically, the inverted index is a list of words, and each word has a list of documents that contain that word. In the context of eBay, looking up the word “pokemon” in the index will give you a list of every item with “pokemon” in the title (again, similar to the index in a book).
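To make that concrete, here’s a toy sketch of an inverted index in Python. The listings, item IDs and the `build_inverted_index` name are all made up for illustration, and a real search engine does far more (stemming, ranking, sharding), so treat this as the idea only and not how eBay actually implements it.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of item IDs whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in documents.items():
        for word in title.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical listings: item ID -> title
listings = {
    101: "Pokemon Charizard holo card",
    102: "Vintage Pokemon booster pack",
    103: "Magic the Gathering booster box",
}

index = build_inverted_index(listings)

# Looking up a word is a single dictionary access, not a scan of every title.
pokemon_items = sorted(index["pokemon"])
booster_items = sorted(index["booster"])
print(pokemon_items, booster_items)
```

The point is that a keyword lookup never touches the listings themselves, only the word-to-items map.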
When you perform a search on eBay, multiple computers work together to scan through the index and rank each item, and the results are then aggregated and returned to you. So after all that background, we get to the main issue: if the item has not been indexed yet, it is not going to appear in search results. Typing in the direct URL to the item will work because you’re querying the database directly and not searching indirectly through the index.
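Here’s a small Python sketch of why the direct URL works while search comes up empty. `ToySite` and its methods are hypothetical names I made up, and I’m assuming the listing is written to the database first and indexed as a separate, later step.

```python
class ToySite:
    """Toy model: a database of listings plus a separate search index."""

    def __init__(self):
        self.database = {}  # item ID -> title (the source of truth)
        self.index = {}     # word -> set of item IDs (what search reads)

    def list_item(self, item_id, title):
        # The listing is stored right away, but it is NOT indexed yet.
        self.database[item_id] = title

    def index_item(self, item_id):
        # Runs later, as a separate step from listing the item.
        for word in self.database[item_id].lower().split():
            self.index.setdefault(word, set()).add(item_id)

    def search(self, keyword):
        # Search only ever consults the index.
        return sorted(self.index.get(keyword.lower(), set()))

    def get_by_id(self, item_id):
        # A direct URL is effectively a lookup by ID in the database.
        return self.database.get(item_id)


site = ToySite()
site.list_item(555, "Charizard base set holo")

before = site.search("charizard")  # not indexed yet
direct = site.get_by_id(555)       # direct lookup already works
site.index_item(555)
after = site.search("charizard")   # now searchable
print(before, direct, after)
```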
The harder question to answer is why there were already views on a seemingly unsearchable item. The simplest explanation is that the views were from the seller. A more complex answer involves a better understanding of the indexing process. Apparently, every 8 hours eBay reindexes every single item on the website. This involves going through every item, parsing out each word from the title and description and building an index from scratch. Between these large tasks, eBay will send updates to the index about items that have been added, removed or modified every few minutes. This is when new items become searchable.
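A rough Python sketch of that two-part process, a full rebuild plus small batched updates in between, might look like this. The function names, the update format and the sample items are my own assumptions for illustration, not anything eBay has published.

```python
def tokenize(title):
    """Split a title into lowercase words."""
    return title.lower().split()


def full_rebuild(items):
    """Build the whole index from scratch (the big periodic job)."""
    index = {}
    for item_id, title in items.items():
        for word in tokenize(title):
            index.setdefault(word, set()).add(item_id)
    return index


def apply_updates(index, updates):
    """Apply a batch of (action, item_id, title) changes to an existing index."""
    for action, item_id, title in updates:
        for word in tokenize(title):
            if action == "add":
                index.setdefault(word, set()).add(item_id)
            elif action == "remove":
                index.get(word, set()).discard(item_id)


# Hypothetical state right after a full rebuild
items = {1: "Pokemon booster box"}
index = full_rebuild(items)

# A new listing is queued but the next update batch has not run yet.
pending = [("add", 2, "Pokemon Charizard holo")]
visible_before = sorted(index.get("charizard", set()))

apply_updates(index, pending)  # the "every few minutes" batch
visible_after = sorted(index.get("charizard", set()))
pokemon_after = sorted(index["pokemon"])
print(visible_before, visible_after, pokemon_after)
```

The window between queuing the change and applying the batch is when a listing exists but can’t be found by search.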
The following is a bit more speculative but I wouldn’t be surprised if this is how it works. The index is going to be very large. eBay probably distributes the index across multiple machines for the sake of scalability. When you distribute information like this, you need to replicate the data a few times across machines to ensure data is available if a single machine in the cluster goes down or is under heavy load.
This set-up means there will be times when the replicated data is out of sync. It may be possible that the index node your query used was not as fresh as the node others used. This means others will have more up-to-date search results than you do: they will see the item in their results and you won’t. This might be based on physical location or just chance, and there’s no way to really control or take advantage of this in a consistent way.
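To illustrate the out-of-sync idea, here’s a toy Python sketch with two copies of the index where an update has reached only one of them. The `IndexReplica` class and the replica names are hypothetical, and this is just my guess at the general shape of the problem, not eBay’s actual architecture.

```python
class IndexReplica:
    """One copy of the search index living on its own machine."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # word -> set of item IDs

    def apply(self, item_id, title):
        # Apply one replicated update to this copy of the index.
        for word in title.lower().split():
            self.index.setdefault(word, set()).add(item_id)

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))


replicas = [IndexReplica("replica-a"), IndexReplica("replica-b")]

# The update for a new listing has reached replica-a but not replica-b yet.
replicas[0].apply(777, "Charizard first edition")
results = {r.name: r.search("charizard") for r in replicas}

# A moment later replica-b catches up and both copies agree again.
replicas[1].apply(777, "Charizard first edition")
results_later = {r.name: r.search("charizard") for r in replicas}
print(results, results_later)
```

Whichever replica happens to serve your query decides whether you see the item during that gap.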
I didn’t intend on this being so long but it’s relevant to what I do at my job and was interesting to look into.
It’s live as soon as the seller hits OK to list. The system picks it up and those 5 views can be search alerts and third-party websites/services checking eBay for their customers. I’ve seen this many times when I list (especially) Charizards: within seconds of the listing going up, I have 5 to 7 views.
The views could also have come from the person that listed the auction themselves. I know that I’ve had freshly listed auctions with 5+ views, all from myself because I had to repeatedly check if the auction is how I wanted it.
While the item may be live as soon as the seller hits OK, it doesn’t appear in search immediately. And as I pointed out, it doesn’t appear under the seller’s items, in search results or even in a search for the item ID# (at least not instantly). This is easy to prove and it has 100% been my experience. The custom URL with the item ID seems to be the quickest way to see an item once it’s posted live (but that’s not really helpful in general searches).
There may be tricks, bots, crawlers or some other method for identifying new listings that I’m unaware of. This is more what I’m asking, if tools or tricks like this exist.
My initial assumption is eBay likely has multiple arrays of servers. Location could be a factor as an item is added to the database and replicated across their server network. So it could be possible that someone in Florida may see an item 30 seconds before a user in Japan, depending on how traffic is routed. Could also be why once an item sells it can still randomly appear in search results/seller’s other items for a few minutes after it is sold.
If it’s been indexed, it’s searchable. If not, then the only way to see it is with a direct URL. You’re not going to find an automated way to see unindexed items unless you can predict new item IDs. You need to know the URL in order to use the automated approach, and you’re not going to see the URL until the item has been indexed and is searchable.
Maybe you could set up something that performs the same search across the world in the hopes of hitting the freshest possible index. But that only gives you something on the order of seconds or minutes to react before it becomes available to everyone. You’d also have to constantly be applying the same search from multiple locations every few seconds. eBay would likely take notice and ban your IPs.
Makes total sense and basically what I was assuming is the case. If I’m reading your reply correctly you also believe that region may impact the “freshness” of the indexed search results?
When it comes to buying up fresh listings those seconds can be very valuable. In the end it wouldn’t matter to eBay, they get their fee regardless once the item sells and this only applies in the case of buy it now listings.
And I’m afraid spoofing my location across the world is a bit over my head (even just for testing purposes).
I have no definitive evidence but like I said in my first post, eBay probably has a large enough index to justify storing it on a distributed cluster. This necessitates replication like you said. It’s been proven that when you have this set-up and the machines can’t all reach each other (a network partition), you have to choose between keeping your service available and keeping it consistent for all users; you can’t have both (the so-called CAP theorem). eBay will no doubt choose availability over consistency, which means some searches may use a fresher version of the search index than others.
So yes, I’m saying that there’s a good chance different people may get different search results for a brief period of time.