Tuesday, 17 May 2016

Publishing Big Data Science

This is the third and final post in a series in which I share some lessons learned about how to plan, manage, analyse and deliver a ‘big biodata’ project successfully.

Now that you have the results of your carefully planned, meticulously managed and diligently analysed experiment, it’s time to decide on what to publish, and where.

1. Present your work

I love presenting, because having to explain my work to a mixed audience helps me understand and articulate the science better, and to convey the excitement of discovery. What is the work for, if not the joy of exploration? Creating figures to use in a presentation is enjoyable, and helps me get my thoughts in order.

I find writing papers less enjoyable than giving presentations, but the same core is present in both – good figures which provide a strong narrative from design through to analysis. There is, however, a particular rigour in writing a paper that brings out the best in a piece of scientific work. Present, and publish – it’s important to us all.

2. Organise your material

Most of these papers comprise both a main paper and a supplement. The main paper will feature the figures that tell the story: experimental design, discovery, main findings, interesting cases. It should be written for the interested reader who will mainly trust you on the experimental and analysis details. The supplement is for the reader (including a reviewer or two) who does not trust you. Sometimes, on other people's papers, you will be that reader. The supplement should have the same flow, but with all the supporting details that tell that reader the data and analysis are kosher.

3. Figures first

Make good figures that illustrate your point, and test them out in presentations, first to the group, then to colleagues in your institute, and then more widely. You’ll fine-tune the figures as you go. Your presentation will need quite a bit of scaffolding (why the question is interesting, your experimental design, key statistics), but don’t be afraid to show sample data from your results to show your motivation. Consider showing a boring case and an interesting case side by side. You may find this scaffolding can be condensed into your Figure 1 for the paper. You can show other figures in the supplement if they support your work.

4. Put pen to paper

Once your figures flow, you can write the results. You can also start working on the supplement, following the same general flow. All the ‘data is good’ plots will go in the supplement, as it can have extra “lemmas” about the data. Don’t skimp in the supplement – include technical details supporting things like why your normalisation is sensible, or better than other approaches. If the supplement gets big, provide an index to it for navigation. The sceptical reader will like to see this.

5. Focus on the results

Write the introduction and discussion after you are happy with your results write-up. Think about the readers and the reviewers, and make sure to cite widely. If you are coming into a new arena with this high-throughput approach, lavish praise on the importance of the field and the massive amount of individual loci work on which you are building. Basically, if you are publishing a large-scale approach in an area that hasn’t had one, avoid being seen as an interloper; read the papers, cite them – and you are likely to find a couple of new angles on your work through this process.

6. Length angst

If you are aiming for a journal with strict length limits (and I do wonder why we tolerate this in this day and age), don’t let that hold you back at the submission phase. Write as much as you need to, and acknowledge the length in your cover letter. Emphasise that you want the reviewers to have a full understanding of the science. A paper squeezed to fit a tight space limit is often really hard to review at that density – the text can be edited down after review.

7. Be open

It is pretty standard that you will eventually be publishing open access (certainly if you are funded by NIH, the Wellcome Trust or a number of other funders). It is easier to do this via journals which automatically handle the open access submission (PLOS, Genome Biology, the BMC series and many others, sometimes with open access fees). Due to the funder mandates, pretty much every journal will at least allow submission of your author manuscript to PubMed Central, but doing it yourself is quite annoying.

There are also new experiments in open publishing to look at. Two examples are F1000 and bioRxiv. In F1000 the whole process of submission, peer review and publication is done in the open - it is interesting to watch open peer review in action. bioRxiv follows the pre-print server model that is more familiar from physics, and many journals allow pre-print posting whilst a paper is under review. This is a cool way to avoid being scooped, and it provides a way to get community input ("informal peer review"). I think we're in an experimentation phase of this next stage in open science, and it's going to be interesting to see where we end up.

8. Tidy up and submit your data

Make sure you have all the raw data to submit, with the meta-data nicely tidied up (ideally, your LIMS will have this ready to go by default). Submit your structured data (DNA, proteomics, metabolomics, X-ray structure, EM) to the appropriate archive (EMBL-EBI has the full range). Keep a directory of all the intermediate datasets and files, in house if you must, but otherwise on the web. This is good for transparency – sceptical readers will be even more reassured when they know that they can (if they want) not only get the raw data (a given for molecular biology) but can also come into the analysis half-way through. About half of these readers could be future members of your own group, whom you may ask to "follow the analysis in paper A", or to confirm that "XXX did this in paper B". Do this for your own group's sanity and for extra brownie points from readers around the world.
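
One way to make a directory of intermediate files easier for a sceptical reader to trust is to publish a small manifest alongside it. The sketch below is only an illustration, not something from this project: the function names and the MANIFEST.json file name are my own assumptions, and it uses nothing beyond the Python standard library. It walks a directory and records each file's relative path, size and SHA-256 checksum, so anyone joining the analysis half-way through can check they have the same files.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """List every file under root with its relative path, size and checksum."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries


def write_manifest(root: Path, out_name: str = "MANIFEST.json") -> Path:
    """Write the manifest into root (hypothetical file name), skipping itself."""
    out_path = root / out_name
    entries = [e for e in build_manifest(root) if e["path"] != out_name]
    out_path.write_text(json.dumps(entries, indent=2) + "\n")
    return out_path
```

Calling write_manifest(Path("intermediate_data")) on a hypothetical directory of that name would leave a MANIFEST.json next to the files, which can be posted on the web with them.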


Markus Miller said...

I am studying social science and plan to use data research techniques to prove some of my hypotheses on the nature of human cooperation in different environments.
I heard that data science has awesome applications in biology and genetics. That seems to be true enough.

Daisy Dreamz said...

To practice Data Science, in the true meaning of the term, you need the analytical know-how of math and statistics, the coding skills necessary to work with data, and an area of subject-matter expertise. Without subject-matter expertise, you might as well call yourself a mathematician or a statistician. Similarly, a software programmer without subject-matter expertise and analytical know-how might better be considered a software engineer or developer, but not a data scientist.
