Guide

Best Practice for Research Data Management

Data Formats

Choosing a File Format
Converting File Formats
File Format Guidelines

It is important to decide which format to use for your research data when you start planning your research project, because the format determines how the data can be used, analyzed, stored, and reused in the future.

Here are some questions you may want to consider.

  • What types of data will be produced?
  • Are you using a standard file format for your files?
  • Are you using a file format that is commonly used in your research area?
  • Is the format easy to share with colleagues or others who need access to the data?
  • Does the format facilitate future use and reuse of your data (for example, open/non-proprietary standards)?
  • Are there special requirements for reading and manipulating your research data (for example, operating system, software, or tools)?

You are advised to keep a copy of the data in its original file format when converting it to another file format. The original file can be used to repair unexpected damage introduced during conversion. For example, file conversion carries a risk of losing certain information, as listed below:

  • Loss of content (data)
  • Loss of file characteristics stored inside the file (metadata)
  • Loss of layout or formatting (for example, in text files)
  • Loss of quality (for example, in graphics or video files)

Source: Research data management - looking after file formats, University of Amsterdam (https://rdm.uva.nl/en/looking-after/file-formats/file-formats.html?cb)
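
The advice on keeping the original file can be sketched in code. The following is a minimal Python illustration, not a prescribed workflow: it assumes a UTF-8 CSV file is being converted to a tab-delimited file, and the function names (`sha256_of`, `convert_csv_to_tab`) and the archive folder are hypothetical. It stores an untouched copy of the original first, then reads the converted file back to check that no content was lost.

```python
import csv
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def convert_csv_to_tab(original: Path, archive_dir: Path, converted: Path) -> Path:
    """Archive an untouched copy of `original`, then write a tab-delimited version.

    Hypothetical helper for illustration; assumes the input is UTF-8 CSV.
    """
    # 1. Keep a byte-identical copy of the original before converting anything.
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / original.name
    shutil.copy2(original, archived)  # copy2 also tries to preserve timestamps
    if sha256_of(archived) != sha256_of(original):
        raise IOError("archived copy differs from the original")

    # 2. Convert: read the CSV rows and write them with a tab delimiter.
    with original.open(newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with converted.open("w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, delimiter="\t").writerows(rows)

    # 3. Read the converted file back and confirm the content is unchanged.
    with converted.open(newline="", encoding="utf-8") as check:
        if list(csv.reader(check, delimiter="\t")) != rows:
            raise ValueError("content changed during conversion")
    return archived
```

A check like this only covers loss of content. Loss of metadata, layout, or quality depends on the formats involved and usually needs a format-specific comparison.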

The following table contains guidance on recommended and acceptable file formats, adopted from the UK Data Archive (https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/recommended-formats/).

Type of data

Recommended formats

Other acceptable formats

Quantitative tabular data with extensive metadata.

A dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data.

Proprietary formats of statistical packages e.g. SPSS (.sav), Stata (.dta), .sas7bdat.

Delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.) containing metadata information.

Some structured text or mark-up file containing metadata information, e.g. DDI XML file.

SPSS portable format (.por).

MS Access (.mdb/.accdb).

Quantitative tabular data with minimal metadata.

A matrix of data with or without column headings or variable names, but no other metadata or labeling.

Comma-separated values (CSV) file (.csv).

Tab-delimited file (.tab).

Including delimited text of given character set with SQL data definition statements where appropriate.

Delimited text of given character set – only characters not present in the data may be used as delimiters (.txt).

Widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), OpenDocument Spreadsheet (.ods).

Geospatial data.

Vector and raster data.

ESRI Shapefile (essential – .shp, .shx, .dbf, optional – .prj, .sbx, .sbn).

Geo-referenced TIFF (.tif, .tfw).

CAD data (.dwg).

Tabular GIS attribute data.

ESRI Geodatabase format (.mdb).

MapInfo Interchange Format (.mif) for vector data.

Keyhole Mark-up Language (.kml).

Adobe Illustrator (.ai), CAD data (.dxf or .svg).

Binary formats of GIS and CAD packages.

Qualitative data.

Textual.

eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml).

Rich Text Format (.rtf).

Plain text data, ASCII (.txt).

Hypertext Mark-up Language (.html).

Widely-used formats: MS Word (.doc/.docx).

Some software-specific formats: NUD*IST, NVivo and ATLAS.ti.

Digital image data.

TIFF version 6 uncompressed (.tif).

Digital Imaging and Communications in Medicine (DICOM) (.dcm, .dcm30) – for CT/MRI data.

JPEG (.jpeg, .jpg) but only if created in this format.

TIFF (other versions) (.tif, .tiff).

Adobe Portable Document Format (PDF/A, PDF) (.pdf).

Standard applicable RAW image format (.raw).

Photoshop files (.psd).

BMP (.bmp) but only if created in this format.

PNG (.png) but only if created in this format.

Digital audio data.

Free Lossless Audio Codec (FLAC) (.flac).

MPEG-1 Audio Layer 3 (.mp3) if original created in this format.

Audio Interchange File Format (.aif).

Waveform Audio Format (.wav).

Digital video data.

MPEG-4 (.mp4).

OGG video (.ogv, .ogg).

Motion JPEG 2000 (.mj2).

MOV (.mov).

Windows Media Video (WMV) (.wmv).

WebM (.webm).

Documentation and scripts.

Rich Text Format (.rtf).

PDF/A or PDF (.pdf).

HTML (.htm).

OpenDocument Text (.odt).

R Markdown files (.rmd) (with HTML version as well).

Plain text (.txt).

Widely-used proprietary formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx).

XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0.
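
For the delimited-text rows in the table, the rule that only characters not present in the data may be used as delimiters can be checked before a file is written. The sketch below is one possible approach in Python. The candidate list and its order, and the function names `pick_delimiter` and `to_delimited_text`, are assumptions made for illustration and are not part of the UK Data Archive guidance.

```python
import csv
import io

# Hypothetical preference order; adjust to the conventions of your field.
CANDIDATE_DELIMITERS = ["\t", ",", ";", "|"]


def pick_delimiter(rows, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate character that appears in no cell of `rows`."""
    for delimiter in candidates:
        if not any(delimiter in str(cell) for row in rows for cell in row):
            return delimiter
    raise ValueError("every candidate delimiter occurs in the data")


def to_delimited_text(rows):
    """Serialize rows as delimited text using a delimiter absent from the data."""
    delimiter = pick_delimiter(rows)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return delimiter, buffer.getvalue()


if __name__ == "__main__":
    sample = [["name", "note"], ["Ani", "a,b"], ["Budi", "x\ty"]]
    chosen, text = to_delimited_text(sample)
    print(repr(chosen))
    print(text)
```

Recording the chosen delimiter and character set in the accompanying documentation lets later users read the file without guessing.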