FAQs

1. What is VinGen Data Portal?

The VinGen Data Portal provides partners with web-based and command-line access to data from various genomics studies on the Vietnamese population carried out at VinBigData. Please visit About VinGen Data Portal for additional information.

2. What is the main goal of VinGen Data Portal?

The main goal of VinGen Data Portal is to provide a system for managing, analyzing, and sharing large datasets, in support of bioinformatics projects at VinBigData that hold large amounts of complex and diverse data.

3. How can I collaborate with VinGen Data Portal?

VinGen Data Portal welcomes collaborations with organizations conducting research in genomics or providing informatics support for genomics studies. Organizations interested in collaborating with VinGen Data Portal should contact vingen@vinbigdata.org.

4. Are there restrictions on the use of VinGen Data Portal data in publications?

All VinGen Data Portal data can be used in publications or presentations. For additional questions about the use of VinGen Data Portal data, or to explore opportunities for collaboration, please contact vingen@vinbigdata.org.

5. How do I cite the VinGen Data Portal?

Please cite our platform paper (to appear) in papers that make use of VinGen Data Portal data, and provide a link to the browser if you build online resources that include the data set.

There is no need to include us as authors on your manuscript, unless we contributed specific advice or analysis for your work.

6. How do I query and download data from VinGen Data Portal?

Partners can query data from VinGen Data Portal by clicking on the Work drop-down menu and selecting the Query menu. Currently, VinGen Data Portal does not provide automatic download of its data. Partners who wish to obtain specific data should contact vingen@vinbigdata.org.

7. How do I submit data into VinGen Data Portal?

Partners can submit data into VinGen Data Portal by clicking on the Work drop-down menu in the top right corner and then selecting the Submit menu.

8. What data types and data formats does VinGen Data Portal support?

Please refer to our user guide on Data files and formats for a list of standard data types supported by VinGen Data Portal.

9. What reference genome is VinGen Data Portal using for analysis?

VinGen Data Portal uses hg38 with population-diversity alternate contigs.

10. How do I obtain an account to log in to VinGen Data Portal?

VinGen Data Portal currently provides login via Google accounts.

11. Where do I go to report an issue or submit any inquiry about VinGen Data Portal?

Partners can report an issue or submit any inquiry about VinGen Data Portal by contacting vingen@vinbigdata.org.

12. How do I create an advanced search query?

Partners can perform advanced search queries using the VinGen Data Portal search interface. Detailed instructions on using the search interface are available in the portal's user guide.

13. When is VinGen Data Portal maintenance performed?

The MASH maintenance window occurs monthly, on the last Saturday of the month, from 10:00 am to 4:00 pm GMT.

14. What is the recommended tool or protocol for transferring large volumes of data to or from VinGen Data Portal?

This feature is currently under development.

15. When using VinGen Data Portal Data Transfer Tool, is it possible to set a bandwidth limit?

This feature is currently under development.

16. Does the VinGen Data Portal Data Transfer Tool use random or sequential read/write? Does the choice of protocol make a difference?

This feature is currently under development.

17. How is validation performed on genomic data (BAM files) submitted to VinGen Data Portal?

Different quality control steps are applied along with the analysis pipelines of VinGen Data Portal. To assess mapping quality, we collect metrics from the BAM files using standard tools such as Picard and Qualimap for further processing.

18. Where can I find information about the VinGen Data Portal data model?

Partners can find details of the VinGen Data Portal data model in the user guide.

19. How do I search for a particular variant?

To search for a variant, use the Quick Search bar at the top right of the VinGen Data Portal and enter either a dbSNP reference cluster ID (rs#) or the coordinates of the chromosomal change. For example, entering 'rs121912651' or 'chr17:g.7674221G>A' will take you to the variant entity page for that variant.
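The two accepted query forms can be told apart with simple pattern matching. The following is a minimal sketch in Python, assuming the formats shown in the examples above; the regular expressions and the function name `classify_query` are illustrative assumptions, not the portal's actual parser.

```python
import re

# Illustrative patterns only (assumptions, not the portal's real parser).
RSID_RE = re.compile(r"^rs(\d+)$")
HGVS_SNV_RE = re.compile(r"^(chr[0-9XYM]+):g\.(\d+)([ACGT])>([ACGT])$")


def classify_query(query):
    """Classify a search string as a dbSNP rs ID or a genomic single-base change."""
    text = query.strip()
    if RSID_RE.match(text):
        return {"kind": "rsid", "id": text}
    match = HGVS_SNV_RE.match(text)
    if match:
        chrom, pos, ref, alt = match.groups()
        return {"kind": "snv", "chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}
    # Anything else (indels, gene names, free text) is left unclassified here.
    return {"kind": "unknown", "raw": text}


if __name__ == "__main__":
    print(classify_query("rs121912651"))
    print(classify_query("chr17:g.7674221G>A"))
```

This sketch only recognizes single-base substitutions; other coordinate notations would need additional patterns.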

20. What web browsers are supported by VinGen Data Portal?

The following web browsers are supported for use with the GDC Data Portal, Submission Portal, Legacy Archive, Website, and Documentation site.

  • Most recent supported stable version of Microsoft Edge
  • Most recent stable version of Google Chrome
  • Most recent stable version of Mozilla Firefox

21. How do I obtain access to a specific controlled dataset?

Partners can contact vingen@vinbigdata.org to request particular controlled datasets.

22. How do I avoid timeouts and transfer interruptions when downloading large datasets from the VinGen Data Portal?

The VinGen Data Portal is a web-based application and is therefore limited by browser and network constraints. If a system timeout occurs when downloading or uploading files, please use the Gen3 Client tool or contact vingen@vinbigdata.org.

23. How can I access VinGen Data Portal sequencing data in FASTQ format?

Partners can contact vingen@vinbigdata.org to request sequencing data.

24. Are all the genotype calls in the VN1000G Project current release VCF files bi-allelic?

Yes, all variants from the VN1000G Project are normalized and split into bi-allelic records.
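To illustrate what splitting into bi-allelic records means, here is a minimal sketch in Python. It is not the project's pipeline: the record layout (a plain dictionary) and the function name `split_multiallelic` are assumptions made for the example, and normalization (left-alignment and trimming of alleles) is not shown.

```python
def split_multiallelic(record):
    """Split one site with several ALT alleles into one record per ALT allele.

    Only site-level fields are handled. A real tool would also rewrite the
    genotype and INFO fields, which this sketch deliberately ignores.
    """
    return [
        {
            "chrom": record["chrom"],
            "pos": record["pos"],
            "ref": record["ref"],
            "alt": alt,
        }
        for alt in record["alt"].split(",")
    ]


if __name__ == "__main__":
    # Made-up site with two alternate alleles, for illustration only.
    site = {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "C,T"}
    for rec in split_multiallelic(site):
        print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
```

After splitting, every record carries exactly one ALT allele, which is what makes per-variant lookups and frequency calculations straightforward.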

25. Are all the variants displayed on the VN1000G Project discovered by the project?

No, not all the variants in the browsers produced by the VN1000G Project were discovered by the project. The data from the project are based on custom versions of the Ensembl browser. These databases contain the Ensembl core features (genes and transcripts), regulatory elements from the Ensembl Regulatory Build and variation data from the Ensembl Variation database. Ensembl variation contains data from dbSNP, ClinVar, COSMIC, dbGaP, dbVAR, EGA and many other sources.

26. Are there any FASTA files containing VN1000G variants or haplotypes?

We do not provide FASTA files annotated for 1000 Genomes variants. You can create such a file with a VCFtools Perl script called vcf-consensus.

An example set of command lines would be:
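One possible set is sketched below as a small Python helper that only assembles the shell command lines and prints them; it does not run anything. The file names, sample ID, and region are hypothetical placeholders, and the sketch assumes that tabix, bgzip, samtools, and the VCFtools Perl scripts `vcf-subset` and `vcf-consensus` are installed and on the PATH.

```python
import shlex


def consensus_commands(vcf_gz, ref_fasta, region, sample):
    """Return shell command lines that build a per-sample consensus FASTA.

    Nothing is executed here. The strings are meant to be reviewed and then
    run in a shell where the required tools are available.
    """
    q = shlex.quote
    sample_vcf = f"{sample}.vcf.gz"
    out_fasta = f"{sample}.fa"
    return [
        # 1. Extract the region and sample of interest from the indexed VCF.
        f"tabix -h {q(vcf_gz)} {q(region)} | vcf-subset -c {q(sample)} "
        f"| bgzip -c > {q(sample_vcf)}",
        # 2. Index the subset so vcf-consensus can read it.
        f"tabix -p vcf {q(sample_vcf)}",
        # 3. Pipe the matching reference slice through vcf-consensus.
        f"samtools faidx {q(ref_fasta)} {q(region)} "
        f"| vcf-consensus {q(sample_vcf)} > {q(out_fasta)}",
    ]


if __name__ == "__main__":
    # All arguments below are hypothetical placeholders.
    for line in consensus_commands(
        "cohort.chr17.vcf.gz", "hg38.fa", "chr17:7670000-7680000", "SAMPLE01"
    ):
        print(line)
```

The chromosome names in the region must match those used in both the VCF and the reference FASTA.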

You can get more support for VCFtools on their help mailing list.

27. Are there any scripts or APIs for use with the VN1000G datasets?

Our data is in standard formats like SAM and VCF, which have tools associated with them. To manipulate SAM/BAM files, look at SAMtools, a C-based toolkit that also links to APIs in other languages. To interact with VCF files, look at VCFtools, which is a set of Perl and C++ code.

28. Are there any statistics about how much sequence data has been generated by the VN1000G Project?

This feature is currently under development.

29. Can I get image files for any of the VN1000G sequencing runs?

Unfortunately, you cannot.

30. Can I get phenotype, gender and family relationship information for the VN1000G samples?

All samples from VN1000G are unrelated. Phenotype and gender information is publicly available.

31. Can I map the variant coordinates between different genome assemblies?

Currently, the VN1000G database only supports GRCh38 (hg38).

32. Can I use the VN1000G data for imputation?

This feature is currently under development.

33. Do I need permission to use the data of VN1000G in my own scientific research?

Yes. Please contact vingen@vinbigdata.org to request permission.

34. How are your alignments generated?

The raw input FASTQ files are cleaned through a standard procedure to remove bad reads. We then align the cleaned FASTQ files to hg38 using standard bwa-mem, with read group information for further tracking. Duplicates are then marked in the raw BAM files and base quality scores are recalibrated, following the GATK best-practice pipelines, to produce the analysis-ready BAM file.
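As an illustration of the read group information attached at the alignment step, the sketch below builds an `@RG` header line in the form accepted by the `-R` option of `bwa mem`. The choice of fields (ID, SM, LB, PL) follows the SAM format; the ID scheme, the function name `read_group`, and the example values are assumptions and do not describe the project's actual naming convention.

```python
def read_group(sample, flowcell, lane, library, platform="ILLUMINA"):
    r"""Build an @RG header line for `bwa mem -R`.

    bwa expects the separators written as the two characters "\t",
    which it converts to real TAB characters in the output header.
    """
    fields = [
        "@RG",
        f"ID:{flowcell}.{lane}",  # hypothetical ID scheme: flowcell plus lane
        f"SM:{sample}",           # sample name, used downstream by GATK
        f"LB:{library}",          # library, used when marking duplicates
        f"PL:{platform}",         # sequencing platform
    ]
    return "\\t".join(fields)


if __name__ == "__main__":
    # Example values are placeholders, not real project identifiers.
    print(read_group("SAMPLE01", "FLOWCELL1", 2, "lib1"))
```

The resulting string can be passed in quotes to `bwa mem -R` so that every aligned read is tagged with its sample, library, and lane.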

35. How can I get the allele frequency of my variant?

The allele frequency of a variant is available through searching on the portal.
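For reference, the allele frequency of a bi-allelic variant is the number of observed alternate alleles (AC) divided by the total number of called alleles (AN). The sketch below computes it from VCF-style diploid genotype strings; the function name and the example genotypes are made up for illustration, and the portal may compute or report the value differently.

```python
def allele_frequency(genotypes):
    """Return (AC, AN, AF) for the ALT allele of a bi-allelic variant.

    Genotypes use VCF-style notation such as "0/1" or "1|1".
    Missing alleles (".") are not counted in AN.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            an += 1
            if allele == "1":
                ac += 1
    af = ac / an if an else None
    return ac, an, af


if __name__ == "__main__":
    # Four made-up samples: one hom-ref, one het, one hom-alt, one missing.
    print(allele_frequency(["0/0", "0/1", "1|1", "./."]))  # (3, 6, 0.5)
```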

36. How many individuals were sequenced?

A total of 1050 samples are expected to have been sequenced by the end of the project.

37. How much sequence data has been generated for single individuals?

Each individual is sequenced once, with a target depth of 30x.

38. Is there any functional annotation for the data of VN1000G?

Yes, the variants in the VN1000G project are functionally annotated by VEP. Annotation with other tools will be provided soon.

39. Is there any gene expression data available for the VN1000G samples?

No, that type of data is currently not available for the VN1000G samples.

40. What sequencing platforms were used for the VN1000G project?

We used the NovaSeq 6000 platform of Illumina.

41. What format are your sequence files?

The sequence files are available in FASTQ format.

42. What is a gene panel?

A gene panel is a collection of genes. Normally, each gene panel targets a particular disease or pathway. We provide gene-panel-level queries to help researchers narrow down the number of variants.
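The idea of narrowing variants with a gene panel can be sketched as a simple filter. The example below is a minimal illustration in Python, assuming each variant record carries a gene symbol; the record layout, panel contents, and variant identifiers are made up and do not reflect the portal's data model or query interface.

```python
def filter_by_panel(variants, panel_genes):
    """Keep only the variants whose gene symbol belongs to the panel."""
    panel = {gene.upper() for gene in panel_genes}
    return [v for v in variants if v["gene"].upper() in panel]


if __name__ == "__main__":
    # Hypothetical panel and placeholder variant records.
    panel = ["TP53", "BRCA1"]
    variants = [
        {"id": "v1", "gene": "TP53"},
        {"id": "v2", "gene": "EGFR"},
        {"id": "v3", "gene": "brca1"},
    ]
    for hit in filter_by_panel(variants, panel):
        print(hit["id"], hit["gene"])
```

Matching is case-insensitive here so that differently capitalized gene symbols still fall inside the panel.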

43. What library insert sizes were used in the VN1000G project?

A library insert size of 400 bp was used in the VN1000G project.

44. What percentage of the genome is assayable?

Coverage of the reference genome is over 98%.

45. What read lengths were used by the project?

The average read length of the project is 150 bp.

46. What strand are the variants in your VCF file on?

The variants in our VCF file are on both strands.

47. What structural variant data is available for the project?

Structural variant calling is in progress.

48. What was the source of the DNA for sequencing?

The DNA was obtained from frozen whole blood.

49. Where are your alignment files located?

The alignment files are located under the Files section.

50. Where are your variant files located?

The variant files are located under the Files section.

51. Where can I get consequence annotations for the variants?

Consequence annotations by VEP are available on the variant information access page.