Monday, January 7, 2013

NTFS Triforce - A deeper look inside the artifacts

Hello Reader,
            In our last post we discussed at a high level the relationship between the $MFT, $LOGFILE and $USNJRNL. In this post we will go into detail on the structures we can recover from each of the three and how they link, allowing us to determine the historical changes made to a file or directory.

$MFT - The Master File Table is a pretty well understood artifact. MFT structures are fully documented and there are a variety of tools out there for parsing it. With that said, I'm not going to go into any depth on how the MFT works and will instead just highlight the two structures we are interested in.

(Thanks to Mike Wilkinson for making these MFT data structure diagrams I am referencing below. You can find the full version of his NTFS cheat sheet here http://www.writeblocked.org/resources/ntfs_cheat_sheets.pdf)

The first is the File Record shown below:

When a file is created, modified, or deleted this is the structure that gets added, changed, or updated. The field in the upper right at offset 0x08 labeled $Logfile Sequence Number or LSN is how the MFT refers to the most recent change recorded in the $logfile. Each $logfile record has an associated LSN, but the file record only stores the LSN of the most recent change. There is no record that I'm aware of that shows what LSNs a file record previously had. The MFT Record Number is a unique identifier for this file record, and if we have a way to link a change to it then it becomes easy to tell which MFT file record each recovered historical change refers to.

The $USNJrnl keeps the MFT Record Number to indicate which file it is operating on and the Parent Record number to reflect what directory that MFT file record resided in. If a $logfile entry records a change then that change can be easily linked back to the MFT file record through its LSN, but only if it's the last change made to that file record.

The file record however is not the only record/attribute we care about in the MFT for our triforce historical analysis powers, we also care a lot about the Standard Information record shown below:

If a time stamp, owner id or SID of a file changes then it's the standard information block/attribute that gets written to the $logfile and not the entire file record with all its attributes. This was a problem before we found the triforce linkage because, as you can see, the standard information block does not refer to the file record number. We had to determine which MFT entry a $logfile record was pointing to in one of two ways. We could match on the LSN (which is captured in the Logfile header per recorded change) and hope the file record hadn't been updated again since. Alternatively we could determine the location of the MFT entry by doing some math using the VCN (virtual cluster number) and the MFT cluster ID recorded in the $logfile. Relying on the physical location in the MFT is also problematic because a defrag can remove deleted entries and change the VCN where the entry resides, leading to false positives about which $logfile record points to which MFT record.
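The "some math" can be sketched roughly like this. Treat it as an illustration under stated assumptions rather than the exact calculation any parser uses: it assumes the VCN is relative to the start of the $MFT data, that the cluster index counts 512-byte units inside that cluster, and that you already know the cluster size and MFT record size for the volume. All parameter names are hypothetical.

```python
def mft_record_from_vcn(target_vcn, cluster_index,
                        bytes_per_cluster=4096,
                        mft_record_size=1024,
                        index_unit_bytes=512):
    """Estimate which MFT record a $logfile entry targets.

    Assumptions (not confirmed by the post): target_vcn is relative to the
    start of $MFT, and cluster_index counts index_unit_bytes-sized units
    within that cluster.
    """
    byte_offset = target_vcn * bytes_per_cluster + cluster_index * index_unit_bytes
    if byte_offset % mft_record_size:
        raise ValueError("offset %d is not on an MFT record boundary" % byte_offset)
    return byte_offset // mft_record_size
```

With 4096-byte clusters and 1024-byte records, VCN 10 with cluster index 2 lands on record 41. The result only tells you which slot in the MFT the change was written to, so it carries the same false positive risk described above if that slot has since been reused or moved.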

The good news here is that, as you can see at offset 0x40, the standard information attribute does record the update sequence number! The update sequence number in turn points to the $USNJrnl record, which holds the file record number and parent file record number as discussed above. This means that through the link between the $USNJrnl and the $MFT we can associate a change made to the standard information attribute from the $logfile to the $USNJrnl, which links back to a specific $MFT file record number. This is a reliable identifier as the file record number's value does not change based on system activity! This then leads us to the $logfile structures.



$Logfile - Every change recorded in the $logfile starts with a header as shown below:

The LSN here relates back to the file record entry inside the MFT for the change that is being recorded. The LSN for a file record in the MFT will be updated to reflect the most current $logfile entry for that file, meaning the LSN for a file will change with every change recorded. That means that any $logfile entry whose recorded change does not reference either the USN or the MFT record number can only have its corresponding MFT record determined by doing a calculation using the recorded VCN seen at offset 0x48 above.

Why does the $logfile record the VCN? The process of repairing the file system using the $logfile is to overlay the data stored in the $logfile over the areas where a transaction failed to complete successfully. This allows the file system to be rolled back (using the undo records) or have a change reapplied (using the redo records) by just overwriting what was previously located at those VCNs.

What comes after the LSN record header will vary depending on what change took place. The $logfile stores the raw MFT record/attribute that has been modified, so any MFT entry could exist in the $logfile. We focus on the File records and the Standard information attribute records as they reveal the most about changes occurring to a file. There are other MFT records/attributes that could be of interest to you and they also exist in the $logfile. Any change made to an MFT record/attribute will be recorded in the $logfile. The hard part is then tying that logged change to the actual MFT record being modified to know which file record it relates to. So you can imagine that after every LSN header you have a copy of the MFT record/attribute being changed reflecting its before (undo) and after (redo) states.

Since there are no $logfile structures other than the LSN header, RCRD header and Restart areas, we are reliant on what is being recorded by the MFT record being changed to know exactly which file is being modified. When we are lucky (like we are with file records and $standard_information records) we get a link back to a unique file reference number. When we are unlucky (resident data found in $DATA attribute records) we have to rely on some math using the VCN and MFT Cluster index stored in the LSN header to determine what location within the MFT the record is pointing to. It's this possibility for false positives that keeps these records out of the public version of our $logfile parser.

Note: We will go into even more detail of the $logfile structures when we do the big $logfile post which is coming with the tool release I promise.

$USNJrnl - The USNJrnl or Update Sequence Number Journal has a pretty simple structure compared to the rest we've talked about and is fully documented as shown below:

Sorry, no fancy hex offset data structure for this yet, just the record structure as taken from Microsoft's documentation of the USNJrnl. As you can see, for purposes of linking $standard_information structures stored in the $logfile back to the MFT we have the matching USN stored here as the sixth element down. Since each USNJrnl entry, and thus each open/close of a file, has a unique USN assigned, we have a great lasting artifact to look for when trying to match $standard_information records back to MFT records. The fourth and fifth items in the record entry link back to the MFT for not only the file record number but also the directory the file was located in, as seen in the parent record number.

Taken just on its own the USNJrnl is a fantastic source of historical information that more examiners are beginning to utilize, but you can get even more information out of it by taking it a step further. If you were to mine out all the unique USN records into a database table you could group them by file reference number to see all the changes, including the renaming of a file or its movement between directories. This is because the MFT file record number (shown in the MSDN screenshot above as a reference number) does not change no matter how many times the file record or attributes change. Renaming a file, moving a file, editing its time stamps, filling it with random data, etc... none of these actions will change the file record number. What utilizing the triforce gets us is more granular details of those attribute changes that only exist in the $logfile, extrapolated out through the $USNJrnl to an MFT file record.
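The database idea can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, column names and sample rows are hypothetical, and it keys on both the record number and the sequence number so that a reused MFT record is not mistaken for the same file.

```python
import sqlite3


def load_usn_rows(rows):
    """Load (usn, record_no, seq_no, parent_no, name) tuples into SQLite.

    The USN is the primary key, so duplicate records pulled from several
    shadow copies collapse into one row.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE usn (usn INTEGER PRIMARY KEY, record_no INTEGER, "
        "seq_no INTEGER, parent_no INTEGER, name TEXT)"
    )
    con.executemany("INSERT OR IGNORE INTO usn VALUES (?, ?, ?, ?, ?)", rows)
    return con


def renames_and_moves(con, record_no, seq_no):
    """Return (usn, (old_parent, old_name), (new_parent, new_name)) tuples."""
    cursor = con.execute(
        "SELECT usn, parent_no, name FROM usn "
        "WHERE record_no = ? AND seq_no = ? ORDER BY usn",
        (record_no, seq_no),
    )
    changes, previous = [], None
    for usn, parent_no, name in cursor:
        current = (parent_no, name)
        if previous is not None and current != previous:
            changes.append((usn, previous, current))
        previous = current
    return changes
```

Ordering by USN gives the sequence of events for one file, and any row where the parent or the name differs from the row before it marks a move or a rename.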

Putting it all together - So that was a lot of words up there, if you read the last post you got the same information at a very high level but now you can see at a much deeper level how these things sync up. I don't believe that the developers actually intended this relationship to exist, or else I would expect more syncing for more record types stored in the $logfile, we just again get a happy overlap between what a developer made and what analysis can reveal to us.
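The linkage described above can be sketched as a simple lookup chain: take the USN found in a $standard_information change recovered from the $logfile, look it up in the $USNJrnl to get a file record number, then look that record number up in the $MFT. The dictionaries and field names below are hypothetical stand-ins for whatever your parsers produce, not the format of any released tool.

```python
def link_stdinfo_changes(stdinfo_changes, usn_index, mft_index):
    """Tie $logfile $standard_information changes back to MFT records.

    stdinfo_changes: iterable of dicts with at least "lsn" and "usn".
    usn_index: dict mapping USN -> dict with "record_number",
               "parent_record_number" and "name" from the $USNJrnl.
    mft_index: dict mapping MFT record number -> current file name.
    Returns (linked, unlinked); unlinked changes had no surviving
    $USNJrnl entry for their USN.
    """
    linked, unlinked = [], []
    for change in stdinfo_changes:
        usn_record = usn_index.get(change["usn"])
        if usn_record is None:
            unlinked.append(change)
            continue
        record_number = usn_record["record_number"]
        linked.append({
            "lsn": change["lsn"],
            "usn": change["usn"],
            "record_number": record_number,
            "parent_record_number": usn_record["parent_record_number"],
            "name_at_change": usn_record["name"],
            # None when the MFT record is no longer available.
            "current_name": mft_index.get(record_number),
        })
    return linked, unlinked
```

Changes that end up in the unlinked list are the ones where you would have to fall back on the LSN or VCN approaches, with the caveats that come with them.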

If you followed everything I wrote above you will see that using the power of the NTFS triforce we can recover and identify:
1. The change of ownership of a file ($logfile)
2. The change of a file's SID (if that were to happen)  ($logfile)
3. The changing of timestamps ($logfile)
4. The movement of files between directories ($logfile and $USNJrnl)
5. The renaming of files (common during wiping) ($logfile and $USNJrnl)
6. The summary of actions taken against a file ($USNJrnl)
7. The changing of attributes to a file, important for things like tracking hard links to determine CD Burns ($logfile)

We can do all of these with little chance of error thanks to the combination of these three data sources. Additionally we can recover granular historical changes to files. Depending on your location in the DFIR spectrum (from digital forensics analyst, incident responder to malware analyst or all of the above) you will have different uses for this information. We are very excited about the triforce and we are extending our $logfile parser to include these sources, of which the $MFT integration was already on our roadmap. Getting the full use out of all of this information will require a database and we're not sure if SQLite is up to the task, but we hope to have something workable out there soon.

In the next blog post I'll talk about how to get access to the $USNJrnl, $MFT and $Logfile from volume shadow copies as not all access methods are equal. After that I'll likely move into updating some old 'what did they take' posts to reflect new artifact sources and post the results of our forensic tool tests.


Friday, January 4, 2013

Happy new year, new post The NTFS Forensic Triforce

Feliz Nuevo Ano Reader!,
                                           Thanks for sticking with me and my erratic schedule through 2012. One of my resolutions for 2013 is to get better about regularly blogging and writing about new things we are seeing/doing. The new book is in copyedit at the moment, http://www.amazon.com/Computer-Forensics-Infosec-Guide-Beginners/dp/007174245X/ref=sr_1_21?ie=UTF8&qid=1357249339&sr=8-21&keywords=computer+forensics, not quite sure why they changed the title but it's supposed to be Computer Forensics, A beginner's guide. It's meant for those people who are already in IT and moving into a DFIR role either within their company or on their own. I'm actually putting together a series of Youtube videos to go along with it, they'll be found here http://www.youtube.com/learnforensics and I'll be uploading some sample cases to work through that match the book. More on all of that when the book is released though!

Now for why you are (most likely) here, NTFS internal forensics. Over the past year or so if you've been reading or watching me speak you'll know that we've been focused on the $logfile. Since then we've expanded our research into other file systems (we have a working ext3/4 journal parser now that can recover deleted file names and re-associate them with their inodes/metadata) but we are always keeping an eye on NTFS to see what else we can do to expand our knowledge and capabilities. After re-evaluating the USN Journal thanks to Corey Harrell's blog (http://journeyintoir.blogspot.com/2013/01/re-introducing-usnjrnl.html) we've come to recognize that, to get a bigger picture, our views of previous file system activity can link up to form the NTFS Forensics TRIFORCE! It's dumb I know but it works, see the illustration below.


The $MFT, Master File Table, is always the primary indicator of what the current state of the file system is. If a defrag hasn't run, which Windows 7 is very aggressive about and defaults to once a week now for auto defragging (mine is set for every Wednesday but I don't know if that's a default setting), then you can see the deleted/inactive NTFS file/directory entries prior to the last defrag run. That's not enough for us as forensic investigators though, we need to know more about the prior states of the file system in order to perform our work. So how can we roll back time and see what happened before? A good answer on Windows Vista/7 systems for many has been shadow copies, and shadow copies are amazing for the forensic investigator. What the MFT and the contents of the file system for a shadow copy show you though is the state of the file system at the time it was captured, a snapshot of the file system. To know what actions took place between snapshots or before snapshots you have to look deeper. The two file system journals that exist are the $logfile and the $USNJrnl. (In this post we are just focusing on file system journals, not forensic artifacts that log actions related to a specific user activity.)

The $logfile, the primary focus of our initial research, contains the before and afters or undo/redo for each change to the MFT. It contains the full entry of what is being changed (File records, $STDINFO blocks, etc...) and the metadata contained within them (with the exception of resident files on Windows 7 journals whose contents appear to be nulled out). The $logfile is great for seeing at a very granular level exactly what changes have occurred to a file system.

The $USNJrnl creates a summary of actions taken against a document from the time it's opened to the time it's closed. The $USNJrnl, like the $logfile, is circular, meaning that over time it will be overwritten, but like the $logfile if you access the volume shadow copy using libvshadow you can get access to prior backups of it. The $USNJrnl keeps the name of the file being changed, the file id of the file being changed in the MFT, the Parent ID of the directory that contains the file in the MFT and the date the change occurred, as well as the USN, which is the offset into the $USNJrnl where the data regarding the file begins.

Each of these data sources by themselves provide a wealth of information, but all are incomplete without each other. The MFT does not reflect past states, the $USNJrnl does not contain metadata and the $logfile does not always reference a file by name and file id. However, taken together they link up as seen in more detail below to create a view of historical actions like we've never been able to see before:
Ok so what does all that mean?
For any individual file we can determine the following:
     a. What changes have occurred to the file and when
     b. What metadata did the file have before and after each change (modification, creation, access, size, location)
     c. What was a file renamed to?
     d. What files previously existed in a directory?

Who is this useful to?
    1. Malware Analysts - you can now see every file system change happening including anti-forensic attempts like time stamp alteration, deletion, renaming and overwriting
    2. Incident Responders - you can do the same against attackers who are attempting to hide their activities and track what files they are accessing even if access dates are disabled
    3. Forensic Investigators - so much more data regarding a suspect's activities on the file system including the detection of spoliation
    4. Everyone!

What are the limitations?
   1. Both the $logfile and the $usnjrnl are circular with a max size, so there is a finite amount of data each will keep before it begins to overwrite itself
   2. If you are using an operating system such as OSX/Linux with the NTFS-3G driver for writing/accessing an NTFS volume, it does not update the $logfile or $usnjrnl for its activities. It will update the $logfile to show that the file system was cleanly mounted though.
   3. Because the $USNJrnl keeps less data per change than the $logfile it's likely that the $usnjrnl will contain more historical data than the $logfile

What can you do to make this even better?
   1. If you analyze machines in your environment you can alter the sizes of the $usnjrnl and the $logfile so that they retain much more data.
   2. You can get a copy of our forthcoming NTFS-TRIFORCE parser which will take in data from all three sources to get a complete view of the file system.

More details you say? I'll do that in the next blog post (promise) it's taken too long to just write this one.