Nintendo - Nintendo DSi (Digital) (CDN) dat notes

From No-Intro ~ Wiki
Revision as of 18:46, 20 November 2020 by Zedseven (talk | contribs) (The notes and additional information related to the dat of Nintendo CDN DSiWare files.)

Background Information

--THIS IS A DRAFT FOR THE SOON-TO-BE-RELEASED DAT--

The Nintendo - Nintendo DSi (Digital) (New) dat is the work of Hiccup and zedseven, with additional work by Larsenv and data sourced from Galaxy. It was created by a tool written specifically for this purpose, NUS Ripper.

The dat includes all known CDN files for DSiWare, including:

  • Metadata files (tmd)
  • Tickets (cetk)
  • Encrypted content
  • Decrypted content (verified against the hashes contained in the metadata)
  • DSi Shop data (FFFD0000 and up):
    • Icons
    • Screenshots
    • Zipped HTML manuals for system titles
  • All content ID 'versions' of above files:
    • FFFEFFFF and down - same as metadata
    • FFFFFFFD - same as ticket, if available
    • FFFFFFFE - 0-byte file that always exists for every title


Field Sources

This is a breakdown of how each field in the dat is sourced, with relevant information if necessary.

The format for the following section is:

tag

  • field: Explanation & source.


game

  • name: The No-Intro title of the ROM.

archive

  • name: Same as game name.
  • number: The archive number for the dat. Used primarily for parent-clone relationships.
  • region: Determined from the last character of the game code (the last two hex digits of the title ID).
  • languages: Determined based on a complicated method that is detailed in #Language Determination Method.
  • special1: Set to "System" if the first character of the game code is an 'H' ('K' is for normal games, 'H' is for system titles). Otherwise, it's left blank.
  • special2: Set to "Removed" if the title was removed from Nintendo's CDN at the time of dat creation.
  • clone: Set to 'P' if the title is considered the parent of the regional releases; otherwise it's set to the archive number of the parent title. The parent is determined as follows: the regional releases are ordered first by whether they contain En in their supported languages, then by how many languages they support, then by a general region order designed to prioritize regions with greater global/cultural 'coverage' (see Parent Determination Region Order). The first release in the ordered list is chosen as the parent. This is designed to make the parent the release with the most canonical 'coverage', spanning the most languages. The presence of English is a factor simply because No-Intro is generally English-speaking, so it makes sense for 1G1R set collectors.
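
The parent ordering described for the clone field can be sketched as a three-part sort key. This is only an illustration under stated assumptions, not the actual NUS Ripper code: the release dictionaries, their keys, and the game codes in the example are hypothetical, and the region order string is copied from the Parent Determination Region Order table.

```python
# Tie-break order copied from the "Parent Determination Region Order" table.
REGION_ORDER = "AVPOTEUXFDSIHJKC"


def pick_parent(releases):
    """Return the release that sorts first under the three described criteria.

    Each release is a hypothetical dict with 'game_code' and 'languages' keys.
    """
    def sort_key(release):
        languages = release["languages"]
        return (
            "En" not in languages,   # releases that support English sort first
            -len(languages),         # then releases with more languages
            REGION_ORDER.index(release["game_code"][-1]),  # then region order
        )

    return min(releases, key=sort_key)


# Hypothetical regional releases of one title.
releases = [
    {"game_code": "KXXJ", "languages": ["Ja"]},
    {"game_code": "KXXE", "languages": ["En", "Fr", "Es"]},
    {"game_code": "KXXV", "languages": ["En", "Fr", "De", "Es", "It"]},
]
parent = pick_parent(releases)  # the "KXXV" release: has En and most languages
```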

source

  • Each title has multiple source tags. This is because files are grouped by dump date as well as by dumper. This allows the dat to fully capture when each file was dumped and datted, and more importantly, captures the work done by each person in the project.

details

  • dumpdate: The date this group of files was downloaded and/or decrypted. It applies to every child rom tag.
  • knowndumpdate: This is set to '1' for every entry by both zedseven and Galaxy, but the dump dates for Larsen's entries aren't known for sure, and as such his are set to '0'.
  • releasedate: Set to the same value as dumpdate. The actual release date isn't known, so this value acts as a default.
  • knownreleasedate: Set to '0'.
  • dumper: Set to the username of the person that source tag is for. The entries with dumper="zedseven" contain the most data, but it was important to include the work of Galaxy and Larsen, since their work helped in the creation of the dat.
  • tool: Set to "NUS Ripper vX.X.X" in the case of dumper="zedseven", and "Custom" otherwise.
  • origin: Always set to "CDN", since the source for all of the information is Nintendo's CDN.

serials

  • digitalserial1: The title ID of the title. (16 digit hex number)
  • digitalserial2: The game code of the title, calculated from the last 8 digits of the title ID (each two digits are an ASCII character). The game code is what would be printed on the back of a cartridge - a 4-character code consisting of letters and numbers, that contains information on the type of title it is, and what region the release is for.
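
As a small sketch of the conversion described above (the function names are illustrative, not taken from NUS Ripper), the game code can be decoded from the last 8 hex digits of a title ID, and the special1 value from the archive section then follows from its first character:

```python
def game_code_from_title_id(title_id: str) -> str:
    """Decode the last 8 hex digits of a 16-digit title ID as 4 ASCII characters."""
    return bytes.fromhex(title_id[-8:]).decode("ascii")


def special1_for(game_code: str) -> str:
    """Return "System" for system titles (first character 'H'), else blank."""
    return "System" if game_code[0] == "H" else ""


# Title IDs taken from the Outliers list on this page.
code = game_code_from_title_id("000300044B474243")         # "KGBC"
system_code = game_code_from_title_id("0003000F484E4341")  # "HNCA"
```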

rom

  • forcename: The proper name for the file. This is the filename used to access the file on Nintendo’s CDN, and in the case of decrypted files, the filename of the source file that was decrypted.
  • extension: The file extension we suggest the file should have for file management and ease of use. This also indicates what type the file is, in the case of the ID files where the name is not very descriptive.
  • item: Set to "Main Content" for all main files (metadata/tmd, ticket/cetk, title content), and set to "Miscellaneous Content" for everything else. This is because the dat includes all known hidden content ID forms of files too, which are good for archival and preservation, but not very useful to people who just want to maintain ROM sets.
  • date: The date the file was created. For everything except decrypted content, this date is the date from the Last-Modified header straight from Nintendo’s CDN. This means the field contains the date the file was added to Nintendo’s servers. For decrypted files, this field is empty.
  • format: Set to "CDN" for all content downloaded straight from the CDN. For decrypted contents, it is set to "CDNdec". For each decrypted content file, there is a copy of the CDN version of the related metadata and ticket, both with their formats also set to "CDNdec". This is so one can download a purely usable dat of decrypted contents, and their relevant files to go along with them.
  • version: Set to the version the file is for, if such information is relevant. For instance, all metadata and ticket files include a version in them. This is where the version comes from, and the information is included in "raw,pretty" format - each Nintendo version can actually be broken down into a ‘pretty’ version that is of the form "vX.X.X".
  • size: File size in bytes, calculated at time of dat creation.
  • crc: CRC32 hash of file contents.
  • md5: MD5 hash of file contents.
  • sha1: SHA1 hash of file contents.
  • sha256: SHA256 hash of file contents.
  • serial: For title content files, this contains the game code and the 12-character internal name of the title (often one or two words, acting as a kind of short title), separated by a comma. The game code here is sourced from the actual ROM, not the title ID. There should never be a difference, but if there were, it would be captured this way.
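
For readers who want to check a file against the size and hash fields, the following sketch computes them with Python's standard library. The function name and the lowercase hex formatting are assumptions made for illustration; the dat itself may format these values differently.

```python
import hashlib
import zlib


def hash_fields(data: bytes) -> dict:
    """Compute the size and hash values described above for one file's contents."""
    return {
        "size": len(data),
        # zlib.crc32 returns an unsigned 32-bit value; format as 8 hex digits.
        "crc": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# The 0-byte FFFFFFFE file mentioned earlier would produce the empty-input hashes.
empty = hash_fields(b"")
```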


Language Determination Method

In the No-Intro dat, each title has a list of supported languages. These languages were automatically determined. This section will detail precisely how they were determined, as well as how to interpret the additional fields related to it.

For each ROM, the tool checks to see if it has an existing 3DS port. A lot of DSiWare titles were later ported over to the 3DS eShop, and the 3DS eShop often contains data on what languages a game supports. If the title has an existing 3DS port and the eShop has language information for it, the tool stops here (as it has a canonical list of supported languages for the title).

If it was unable to find that information, it then moves on to ‘guessing’ the supported languages from the titles (system menu names) contained in the ROM. Nintendo DS games, and by extension, DSiWare, contain Japanese, English, French, German, Italian, Spanish, Chinese, and Korean titles. Some or all of them can be populated, each with the game’s title localized to the language. For languages a game doesn’t support, the title will either be empty or a duplicate of the title from the game’s primary language (typically English, but not always).

The way it figures out what languages a ROM supports from this is as follows - for the sake of example, let's say our ROM has De and Fr titles, and the rest of the titles are duplicates of the En title:

It goes through the list of titles, and any titles that are unique are considered supported languages. In this example, both De and Fr titles will be unique, as they are in their respective languages, while the rest of the titles will be En. This allows the tool to say with relative certainty that De and Fr are supported languages, since they had their own proper localizations of titles.
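
The unique-title step for this example can be sketched as follows. This is a simplified illustration rather than the tool's code: the titles are made up, and comparing whole title strings for equality is an assumption about how "unique" is decided.

```python
from collections import Counter


def languages_with_unique_titles(titles: dict) -> list:
    """Return the languages whose non-empty title is not shared with any other language."""
    counts = Counter(title for title in titles.values() if title)
    return [lang for lang, title in titles.items() if title and counts[title] == 1]


# Hypothetical ROM: De and Fr are localized, the rest duplicate En or are empty.
titles = {
    "Ja": "Example Game",
    "En": "Example Game",
    "Fr": "Jeu d'exemple",
    "De": "Beispielspiel",
    "It": "Example Game",
    "Es": "Example Game",
    "Zh": "",
    "Ko": "",
}
supported = languages_with_unique_titles(titles)  # ["Fr", "De"]
```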

Since the remaining titles are all copies of the En title, it’s impossible for the tool to know which is the primary title the others were copied from. This is where NTextCat comes in. NTextCat is a library designed to guess what language a piece of text is in. The tool sanitizes the title (removing characters and strings that serve no purpose in identifying the language) before passing it to NTextCat, to help prevent false positives and errors. Despite this, the best NTextCat can do is guess - it’s almost always correct for languages with different character sets (Ja, Zh, Ko), but it isn’t always right for Latin-alphabet languages. Part of the problem is that, since these are game titles, many are only 2-3 words and/or include made-up words or combinations thereof (e.g. "Easter Eggztravaganza"). These certainly aren’t optimal conditions for NTextCat to do its best.

Because of this, the tool does more work with NTextCat’s guess. The identifier doesn’t give just one answer; it gives all the languages it thinks the title could be in, in order of probability (most likely first). That list is then filtered to remove languages that have already been added, ones that make no sense for the game’s region, etc. The tool then chooses the most likely remaining language that is also an expected language for the region. In the case of system titles (those whose title ID does not begin with "00030004"), languages that are not an option for the region (in the DSi settings) are removed as well.
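
A minimal sketch of that filtering step is shown below. NTextCat is not called here; the ranked list simply stands in for its output, and the function name, the example guesses, and the set of expected languages are all hypothetical.

```python
def choose_guessed_language(ranked_guesses, already_added, expected_for_region):
    """Pick the most probable guess that is new and plausible for the region.

    ranked_guesses stands in for the language identifier's output,
    ordered from most to least likely.
    """
    for lang in ranked_guesses:
        if lang in already_added:
            continue  # already known from a unique title
        if lang not in expected_for_region:
            continue  # makes no sense for this region
        return lang
    return None  # nothing plausible was left


guess = choose_guessed_language(
    ranked_guesses=["Nl", "De", "En"],                     # hypothetical ranking
    already_added={"De", "Fr"},                            # found via unique titles
    expected_for_region={"En", "Fr", "De", "Es", "It"},    # hypothetical region set
)
```

In this made-up case the top guess (Nl) is rejected for the region and De is already present, so the result is "En".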

Despite all the logic behind it, at the end of the day it remains an educated guess. For this reason, the No-Intro entry has two additional pieces of information that go alongside the language info. The first is the Language Determination Method, which simply states whether the information was gleaned from the 3DS eShop page of an official 3DS port of the game, or whether it was guessed from the ROM titles as described above. The second, in the case of the ROM title guessing, is the Nebulously-Determined Languages. This tells you the languages that were determined by NTextCat, as they have a much higher chance of being incorrect than ones that existed as unique titles. This is to hopefully make it easier for errors to be found and corrected in the future, after the dat has been added.

If this explanation wasn't enough or you want to see the code for yourself, it can be found here. The relevant section is under the // Languages header.


No-Intro Titles

The method zedseven used to create the No-Intro titles for all 1725 titles is as follows:

First, NUS Ripper takes the title from the primary language of the game. It strips off the publisher and replaces any other line breaks with a hyphen. It follows the No-Intro Convention, so all non-Latin text is romanized, and only the allowed characters are kept.
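
The first cleanup step might look roughly like the sketch below. It rests on two assumptions that the text above does not spell out: that the publisher is the last line of the title, and that line breaks are joined with " - ". Romanization and character filtering are not shown.

```python
def title_from_banner(banner_title: str) -> str:
    """Drop the publisher line and join the remaining lines with a hyphen."""
    lines = [line.strip() for line in banner_title.splitlines() if line.strip()]
    if len(lines) > 1:
        lines = lines[:-1]  # assumption: the publisher is the last line
    return " - ".join(lines)  # assumption: " - " is the separator used


# Hypothetical banner titles.
with_subtitle = title_from_banner("Example Game\nThe Subtitle\nExample Publisher")
without_subtitle = title_from_banner("Example Game\nExample Publisher")
```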

The romanization is done using a separate library I wrote for the project, Romanization.NET.

  • As per the convention, Japanese is romanized using the Modified Hepburn system.
  • Chinese is romanized using the Hànyǔ Pīnyīn system, as it is the standard for China, and there is no system specified in the convention.
  • Korean is romanized using the Revised Romanization of Korean system - this system is the standard romanization system for South Korea, and again, there is no specified system to use in the convention.

The tool followed the above steps and produced a simple CSV file.

The problem is that Japanese and Chinese often don't use spaces the way Latin-alphabet languages do, so when romanized they simply become one long string of characters. This, among a few other small things, is why I still went over everything manually afterwards.

The CSV file was then carefully pored over and corrected/cleaned as best as possible. At the time of writing I do not know Japanese, Chinese, or Korean, so while I did my best and used translation tools to understand where word boundaries were, I can make no guarantee that every title will be flawless.

Outliers

  • The Chinese Game & Watch games (000300044B474243, 000300044B474343, 000300044B474443, 000300044B474743, 000300044B474843, 000300044B474A43, 000300044B474D43, 000300044B475643) all have the same title ("GAME & WATCH"), and so they have the subtitle of the English release appended so it's possible to tell them apart.
  • The system menu titles (00030017484E41**) are all just titled "NINTENDO DS" - because that doesn't accurately describe what they are, they have been renamed "System Menu".
  • The non-executable contents (0003000F484E4341, 0003000F484E4841, 0003000F484E4C**) had no titles to use (since they literally aren't NDS ROMs), and as such they have been given the names as listed on DSiBrew.


Additional Info

Metadata files (tmd) all contain SHA1 hashes for each content file in their version. These hashes are the hashes of the decrypted contents - all decrypted contents (nds) have been verified to match the hashes in their respective tmds. This effectively verifies all decrypted and encrypted game content files to be correct.
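
The check itself is a plain SHA1 comparison, sketched here under the assumption that the expected hash has already been read out of the tmd as a hex string (parsing the tmd is not shown, and the function name is illustrative):

```python
import hashlib


def content_matches_tmd(decrypted: bytes, tmd_sha1_hex: str) -> bool:
    """Compare the SHA1 of decrypted content with the hash recorded in its tmd."""
    return hashlib.sha1(decrypted).hexdigest() == tmd_sha1_hex.lower()


# Stand-in data: the well-known SHA1 of b"abc", not a real content hash.
ok = content_matches_tmd(b"abc", "A9993E364706816ABA3E25717850C26C9CD0D89D")
```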


Additional Tables

Region Code to No-Intro Region Map

Much of this was sourced from GBATEK's description of the fourth character of NDS game codes, though with some adjustments.

Code  Region Name
E     USA
J     Japan
P     Europe
U     Australia
K     Korea
V     Europe, Australia
C     China
D     Germany
F     France
I     Italy
S     Spain
O     USA, Europe
X     Europe
T     USA, Australia
H     Netherlands
A     World
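
The table translates directly into a lookup keyed on the last character of the game code. The dictionary and function names below are illustrative only:

```python
# Region names copied from the table above, keyed by the fourth game code character.
REGION_BY_CODE = {
    "E": "USA",
    "J": "Japan",
    "P": "Europe",
    "U": "Australia",
    "K": "Korea",
    "V": "Europe, Australia",
    "C": "China",
    "D": "Germany",
    "F": "France",
    "I": "Italy",
    "S": "Spain",
    "O": "USA, Europe",
    "X": "Europe",
    "T": "USA, Australia",
    "H": "Netherlands",
    "A": "World",
}


def region_for_game_code(game_code: str) -> str:
    """Look up the No-Intro region from the last character of a game code."""
    return REGION_BY_CODE[game_code[-1]]


region = region_for_game_code("KGBC")  # "China"
```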

Parent Determination Region Order

Keep in mind this order is used only to break ties when sorting by language count is not enough.

Code  Region Name
A     World
V     Europe, Australia
P     Europe
O     USA, Europe
T     USA, Australia
E     USA
U     Australia
X     Europe
F     France
D     Germany
S     Spain
I     Italy
H     Netherlands
J     Japan
K     Korea
C     China

Extension Types

Content Type                                                            Extension
Encrypted content                                                       bin
Decrypted executable ROMs                                               nds
Decrypted content of some other type (e.g. Nintendo DS Cart Whitelist)  bin
Metadata (including special content ID "form")                          tmd
Ticket (including special content ID "form")                            tik
DSi Shop content (based on file magic)                                  gif, bmp, zip
Unknown empty special content ID, and everything else                   bin
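
Choosing an extension "based on file magic" for DSi Shop content could be sketched as below. The signatures are the standard ones for GIF, BMP, and ZIP files; whether NUS Ripper checks exactly these bytes is an assumption, and the function name is illustrative.

```python
def shop_content_extension(data: bytes) -> str:
    """Guess an extension for DSi Shop content from its leading bytes."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"BM"):
        return "bmp"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    return "bin"  # fallback, matching the "everything else" row


ext = shop_content_extension(b"GIF89a" + bytes(10))  # "gif"
```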