Common Questions

What type of files do you need?

We prefer PDF or HTML files, unless your website includes a programmatic way to access the XML view of your articles.

How do we know that our PDF files are secure with you?

We do not store any files on a web-accessible server. Within our research group, only system administrators and two group members have access to the original files.

If this is not sufficient for you, we can convert the PDF files to text and then delete the original files. Alternatively, you can send us raw text or XML files. You can even garble the text if you prefer, e.g. by randomizing the sentence order. We can send you programs for both tasks (conversion and garbling) that run on any standard computer.
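To illustrate the garbling step, here is a minimal Python sketch that shuffles the sentences of an article that has already been converted to plain text. It is not the program we would send you: the function name `shuffle_sentences` is hypothetical, and the regex-based sentence splitter is a simplifying assumption that will mis-split abbreviations such as "e.g." or decimal numbers.

```python
import random
import re
from typing import Optional


def shuffle_sentences(text: str, seed: Optional[int] = None) -> str:
    """Return the sentences of `text` in random order.

    Sentences are split naively after '.', '!' or '?' followed by whitespace;
    this is an illustrative assumption, not a robust sentence splitter.
    """
    # Split into sentences and drop empty pieces (e.g. from empty input).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return " ".join(sentences)


article = "Gene A is expressed in liver. Its promoter contains a TATA box. We cloned it."
print(shuffle_sentences(article, seed=1))
```

Passing a fixed `seed` makes the shuffle reproducible, which can help if you want to inspect the garbled output before sending it.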

What is a genome browser? Is this search engine useful?

We think it is useful, but we are probably biased here. You might want to ask an editor of a medical/genomics-related journal within your organisation to have a look at the current genome browser track.

Why would we let you access our full text?

We are providing a free service for your genomics-oriented readers. It drives traffic to your site and increases your visibility to researchers. This can increase impact and potentially lead to article purchases.

Do we lose money by letting you index our articles?

Google Scholar already shows full-text snippets. We think that 150 characters from a non-trivial research article are not enough to convey the important information in the whole article. If the article is relevant to readers, they will need the full article, not just 150 characters. Note that we can adapt the snippets to your needs (see below).

Snippets reveal too much of the article.

Like Google Scholar, we can cut them to any length you like or suppress them completely. They are shown only to help users judge whether a match is relevant.

There are too many snippets per article.

We will soon limit them to a maximum of 20 snippets per article. This is a known issue in the current version and will be fixed in the next version.
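The two snippet limits discussed above (length and number per article) amount to a simple post-processing step. The Python sketch below is only an illustration under stated assumptions: the function name `limit_snippets` and the trailing "…" marker are hypothetical, and the defaults merely mirror the 150 characters and 20 snippets mentioned in this FAQ rather than the search engine's actual code.

```python
from typing import List


def limit_snippets(snippets: List[str], max_chars: int = 150, max_count: int = 20) -> List[str]:
    """Keep at most `max_count` snippets, each at most `max_chars` characters long.

    A limit of 0 suppresses snippets completely. Defaults are illustrative.
    """
    if max_chars <= 0 or max_count <= 0:
        return []  # snippets suppressed entirely
    limited = []
    for snippet in snippets[:max_count]:
        if len(snippet) > max_chars:
            # Reserve one character for the ellipsis so the result stays within max_chars.
            snippet = snippet[: max_chars - 1].rstrip() + "…"
        limited.append(snippet)
    return limited


print(limit_snippets(["a short match", "x" * 300], max_chars=20))
```

Setting `max_chars=0` (or `max_count=0`) corresponds to suppressing snippets completely.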

How often do you update your index?

We find new articles to crawl via PubMed. Depending on the amount of data we receive, we may opt for weekly or monthly updates.

What is your IP address and what is your user-agent?

Our crawler runs on the IP address 128.114.50.189. It identifies itself to your web server with the user-agent "genomeBot/0.1".
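If you want to recognize our crawler in your server logs or access rules, a check could look like the Python sketch below. The function name `is_genomebot` is hypothetical, and matching any `genomeBot/` version (rather than exactly "0.1") is an assumption so that the check keeps working if the version number changes.

```python
import ipaddress

# Values taken from this FAQ; the prefix match on the version is an assumption.
CRAWLER_IP = ipaddress.ip_address("128.114.50.189")
CRAWLER_UA_PREFIX = "genomeBot/"


def is_genomebot(remote_addr: str, user_agent: str) -> bool:
    """Return True only if the request comes from the crawler's IP AND uses its user-agent."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address in the log line
    return addr == CRAWLER_IP and user_agent.startswith(CRAWLER_UA_PREFIX)


print(is_genomebot("128.114.50.189", "genomeBot/0.1"))
```

Checking both values is deliberate: a user-agent string on its own is easy to spoof.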

Can we integrate your results into our website?

Of course. We are happy to provide our results for integration into your system. You can then highlight or mark up the tags in the original paper on your website. You can either fetch the data for each request from our RESTful web service in JSON format, or download our tables via FTP or HTTP every night and import them into your system.
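As a rough idea of the highlighting step on your side, the Python sketch below wraps matched text in HTML `<mark>` elements. The JSON layout is not described in this FAQ, so the response shape (a `matches` list with `start`/`end` character offsets and a `type`) is purely hypothetical, as is the function name `highlight`; the request to the web service itself is not shown because its address is not given here.

```python
import html
import json
from typing import List, Tuple

# Hypothetical response: field names and offsets are assumptions for illustration.
SAMPLE_TEXT = "The BRCA1 gene contains ACGTACGTAC repeats."
SAMPLE_RESPONSE = """
{"matches": [
  {"start": 4, "end": 9, "type": "gene"},
  {"start": 24, "end": 34, "type": "sequence"}
]}
"""


def highlight(text: str, response_json: str) -> str:
    """Return `text` as escaped HTML with each matched range wrapped in <mark>."""
    matches = json.loads(response_json)["matches"]
    spans: List[Tuple[int, int, str]] = sorted((m["start"], m["end"], m["type"]) for m in matches)
    out = []
    pos = 0
    for start, end, kind in spans:
        if start < pos:
            continue  # skip a match that overlaps the previous one
        out.append(html.escape(text[pos:start]))
        out.append(f'<mark class="{html.escape(kind)}">{html.escape(text[start:end])}</mark>')
        pos = end
    out.append(html.escape(text[pos:]))  # remaining text after the last match
    return "".join(out)


print(highlight(SAMPLE_TEXT, SAMPLE_RESPONSE))
```

Escaping the unmatched text as well keeps the inserted markup from clashing with characters such as `<` in the article.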

Who is funding this project?

This project was originally funded by the EU and the BBSRC (UK) and is now funded by the European Molecular Biology Organisation (EMBO). The Genome Browser is funded by the US National Human Genome Research Institute.