Merge Duplicate folders in SharePoint with PowerShell

A reader of this blog reached out to ask whether I knew a way to merge duplicate folders in SharePoint Online. Users had already worked in the duplicate folders, so we not only needed to merge the folders but also keep the most recently modified version of each file.

All duplicate files or folders had the same pattern, (1), in their file name. So this made it really easy to find the files and folders and merge them with the original location.

In case you are wondering what happened: a user deleted files locally without stopping the sync first. Because of the large number of files that were deleted (over 750,000), they asked Microsoft to restore them, which they did.

However, due to a communication error, the restore job was done in two parts, resulting in duplicate files. Microsoft was unable to assist with the cleanup, so I stepped in.

This is one of those situations where a third-party backup solution would really be a lifesaver. I have written about this before in Do you need a backup solution for Microsoft 365. As you can see in this case, the files can be restored, but having a backup solution would make restoring so much easier.

Finding the duplicate items

Merging the duplicate folders and files will take a couple of steps. The duplicate folders are not only at the root level of the document library but can also be 3 levels deep in a subfolder. So we need to work recursively through all the folders, looking for items with (1) in the name.

I have broken the script down into a couple of steps, each translated into its own function:

  • Finding the duplicate files and folders
  • Create the target path (the original location)
  • Move the folder content
  • Compare file dates
  • Move a single item

Connect to the SharePoint site

Before we can do anything, we need to connect to the SharePoint site. We can do this simply with Connect-PnPOnline. I am using the -UseWebLogin switch so we can sign in with our normal account, including MFA.

At the end of the article I will show the complete script.

# SharePoint url (no trailing slash, the separator is added when the site URL is built)
$sharepointUrl = 'https://lazyadmin.sharepoint.com'

# Site
$site = 'sites/lab01'

# Library name
$libraryName = 'Duplicates'

# Login 
$url = $sharepointUrl+ '/' + $site
Connect-PnPOnline -Url $url -UseWebLogin

Finding the duplicate files

Because we need to process the subfolders as well, the trick here is to go through the folders recursively. We use the Get-PnPFolderItem cmdlet here, but call it recursively. The function is based on a script from Josh Einstein that you can find here.

# Recursively calls Get-PnpFolderItem for a given Document Library
# Based on: https://gist.github.com/josheinstein/3ace0c9f8e25d07583ceb57d13f71b2e

Function Get-PnpFolderItemRecursively($FolderSiteRelativeUrl) {
    
    # Get all items
    $items = @(Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelativeUrl)

    Foreach ($item in $items) {

        # Strip the Site URL of the item path because Get-PnpFolderItem wants it
        # to be relative to the site, not an absolute path.

        $itemPath = $item.ServerRelativeUrl -replace "^$(([Uri]$item.Context.Url).AbsolutePath)/",''
 
        # Check if the item is a folder
        If ($item -is [Microsoft.SharePoint.Client.Folder]) 
        {
            
            # Check if the folder name contains (1) on the end
            # If - if the folder name contains a (1) on the end, then it's a duplicate folder that we need to move or merge
            # Else - if the folder doesn't contain (1), then we open the folder and search through the next level

            if ($item.name  -like "*(1)") 
            {
         
                # Duplicate folder found
                Write-Host " - Duplicate folder found: " $itemPath -ForegroundColor Yellow
            
                # Move the content of the folder to the original location
                Move-FolderItemsRecursively($itemPath)
            }
            else
            {
                # It doesn't end with (1), but it's a folder, so search the next level by recursing into this function.
                Get-PnpFolderItemRecursively $itemPath
            }
        }
        else
        {
            # Item is a file
            # Check if the item's name contains (1); if so, move the file.
            # A file name ends with its extension, so match (1) anywhere in the name.

            if ($item.name  -like "*(1)*") 
            {
                # Create-TargetPath expects -itemPath and -item
                $targetPath = Create-TargetPath -itemPath $itemPath -item $item

                Write-Host $targetPath;

                Move-CustomItem -SiteRelativeUrl $itemPath -targetPath $targetPath -item $item
            }
            # Else skip to next
        }
    }
}

So for every item we check whether it’s a folder. If it is, we check whether the folder name ends with (1). If so, we move the folder content; if not, we “open” the folder and go through its content.

If it’s a file rather than a folder, we again check the name. If it contains (1), we move the file; otherwise we do nothing.
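
The name check can be tried out on its own, without a SharePoint connection. The sketch below is only an illustration and not part of the original script; the helper name Test-DuplicateName is made up. It is stricter than the -like wildcard check used in the function: it accepts (1) only at the very end of a name (folders) or directly in front of the file extension (files).

```powershell
# Hypothetical helper (not part of the original script): returns $true when a
# file or folder name carries the "(1)" duplicate marker.
function Test-DuplicateName {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )

    # "(1)" at the end of the name, optionally followed by a single file extension
    return $Name -match '\(1\)(\.[^.]+)?$'
}

Test-DuplicateName -Name 'Projects (1)'           # True
Test-DuplicateName -Name 'Report (1).docx'        # True
Test-DuplicateName -Name 'Report.docx'            # False
Test-DuplicateName -Name 'Budget (1) final.xlsx'  # False, the marker is not at the end
```

Whether you want the strict or the loose match depends on how the duplicates in your library are named, so check a few examples first.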

Creating the target path

If we find a duplicate item, we need to build the original path from the item’s path. We also need to check whether the original folder exists and, if not, re-create it.

Function Create-TargetPath {
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $itemPath,

         [parameter (Mandatory=$true)]
         $item,

         [parameter (Mandatory=$false)]
         $relativePath
     )

    process
	{
        # Build new path
        $path = $itemPath.replace($item.name,'') 
        $targetPath = "/" + $site + "/" + $path + $item.name

        if ($whatIf -ne $true)
        {
            # Check if target folder exists, create if necessary
            Write-host ' - Check if target folder exists' $path.replace('(1)', '') -BackgroundColor DarkMagenta;
            $result = Resolve-PnPFolder -SiteRelativePath $path.replace('(1)', '') -ErrorAction SilentlyContinue
        }
        else{
            Write-host ' - Create target folder if it does not exist' $path.replace('(1)', '') -BackgroundColor DarkMagenta;
        }

        Write-Output $targetPath.replace('(1)', '')
    }
}

What you will see in each function is that I am using a variable $whatIf. I have declared this variable at the beginning of the script. This allows me to test the script without actually moving or creating any files or folders.

Most PnP functions do support the WhatIf switch, but for example, the Resolve-PnPFolder doesn’t support it. So this way I can simply test it by writing to the console what the script would do.
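
As a rough sketch of that setup, the settings at the top of the script could look something like this. The variable names $whatIf, $force, $moveLimit and $global:moveLimitCounter are the ones the functions in this article read; the values shown and the Invoke-OrPreview helper are assumptions of mine for illustration, not taken from the original script.

```powershell
# Assumed settings block; the values are examples only.
$whatIf    = $true    # $true = only report what would happen
$force     = $false   # $false = let the PnP cmdlets ask for confirmation
$moveLimit = 10       # stop after this many items
$global:moveLimitCounter = 0

# Hypothetical helper: run an action, or only describe it in dry-run mode.
function Invoke-OrPreview {
    param(
        [Parameter(Mandatory = $true)] [string]$Description,
        [Parameter(Mandatory = $true)] [scriptblock]$Action,
        [bool]$DryRun = $true
    )

    if ($DryRun) {
        Write-Host "WhatIf: $Description"
        return $false   # nothing was executed
    }

    & $Action | Out-Null
    return $true        # the action was executed
}

# Dry run: prints the description and does not run the script block
$executed = Invoke-OrPreview -Description 'Create folder Duplicates/Projects' -Action { 'created' } -DryRun $whatIf
```

A wrapper like this keeps the dry-run decision in one place, which helps for cmdlets that have no WhatIf switch of their own.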

So the Create-TargetPath function rebuilds the original path, checks whether the folder exists and, if not, recreates it with the Resolve-PnPFolder cmdlet.
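
To see what the path rewrite does without touching SharePoint, here is a small stand-alone sketch. Get-OriginalPath is a hypothetical helper, not part of the original script. Unlike the .replace('(1)', '') calls in Create-TargetPath, it also removes any whitespace in front of the marker, on the assumption that duplicates are named like "Projects (1)".

```powershell
# Hypothetical helper: derive the original path from a duplicate path by
# removing every "(1)" marker together with any whitespace in front of it.
function Get-OriginalPath {
    param(
        [Parameter(Mandatory = $true)]
        [string]$DuplicatePath
    )

    # "Projects (1)" -> "Projects", "Report (1).docx" -> "Report.docx"
    return $DuplicatePath -replace '\s*\(1\)', ''
}

Get-OriginalPath -DuplicatePath 'Duplicates/Projects (1)/Report.docx'
# Duplicates/Projects/Report.docx

Get-OriginalPath -DuplicatePath 'Duplicates/Projects/Report (1).docx'
# Duplicates/Projects/Report.docx
```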

Move folder and subfolders with PowerShell

When we have found a duplicate folder, we want to move the folder and all the content (including subfolders) to the original location. For each subfolder we need to create a target path as well.

Function Move-FolderItemsRecursively($FolderSiteRelativeUrl) {

    # Get all items in this sub folder
    $items = @(Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelativeUrl)

    foreach ($item in $items) {

        # Strip the Site URL off the item path, because Get-PnpFolderItem wants it
        # to be relative to the site, not an absolute path.
        
        $itemPath = $item.ServerRelativeUrl -replace "^$(([Uri]$item.Context.Url).AbsolutePath)/",''

        # If this is a directory, recurse into this function.
        # Otherwise, build target path and move file

        if ($item -is [Microsoft.SharePoint.Client.Folder]) 
        {
            Move-FolderItemsRecursively $itemPath
        }
        else 
        {
            $targetPath = Create-TargetPath -itemPath $itemPath -item $item
            
            Move-CustomItem -SiteRelativeUrl $itemPath -targetPath $targetPath -item $item
        }
    }
}

Comparing files dates in SharePoint Online and moving the files

So now we come to the part that it’s all about: moving the actual files. Before we can move a file, we need to check whether the file exists in the original location. If it does, we need to compare the file dates, because we want to keep the last modified file in this case.

# Move file to original folder
Function Move-CustomItem  {
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $siteRelativeUrl,

         [parameter (Mandatory=$true)]
         $targetPath,

         [parameter (Mandatory=$true)]
         $item
     )

    process
	{
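        $moveFile = Compare-FileDates -sourceFilePath $siteRelativeUrl -targetFilePath $targetPath;
        $global:moveLimitCounter++

        if ($moveFile -eq $true) 
        {

            # Use -ge: the counter also increases for files that are not moved,
            # so it can step past the limit without ever being equal to it.
            if ($global:moveLimitCounter -ge $moveLimit)
            {
                Write-Warning 'Move limit reached'
                exit;
            }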

            if ($whatIf -ne $true)
            {
				# Move the file
				Write-host '   - Move item to' $targetPath -BackgroundColor DarkYellow;
				Move-PnPFile -SiteRelativeUrl $siteRelativeUrl -TargetUrl $targetPath -OverwriteIfAlreadyExists -Force:$force
				Write-Host "`r`n"
				
            }
            else
            {
                Write-host '   - Move file from' $siteRelativeUrl -BackgroundColor DarkCyan
				Write-host '     to' $targetPath -BackgroundColor DarkCyan
				Write-Host "`r`n"
            }
        }
    }    
}

I added a counter here that counts the number of files that have been processed. When you test a script like this, you may want to move only a couple of files first and check the results before you continue.
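
The idea behind the limit can be shown in isolation. The snippet below is only an illustration with made-up file names; it uses break instead of exit so it can be pasted into a console without closing the session, and it counts only the files that are actually handled.

```powershell
# Illustration only: stop a batch after a fixed number of moves.
$moveLimit   = 3
$moveCounter = 0
$moved       = @()

foreach ($file in 'a.docx', 'b.docx', 'c.docx', 'd.docx', 'e.docx') {
    if ($moveCounter -ge $moveLimit) {
        Write-Warning 'Move limit reached'
        break
    }

    # The real script would move the file here
    $moved += $file
    $moveCounter++
}

$moved -join ', '   # a.docx, b.docx, c.docx
```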

# Check if the file already exists in the target location
# If the file exists, we need to compare the dates to keep the latest files

Function Compare-FileDates () 
{
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $targetFilePath,

         [parameter (Mandatory=$true)]
         $sourceFilePath
     )

    $targetFileExists = Get-PnPFile -Url $targetFilePath -ErrorAction SilentlyContinue
    
    If($targetFileExists)
    {
        $sourceFile = Get-PnPFile -Url $sourceFilePath -AsListItem
        $targetFile = Get-PnPFile -Url $targetFilePath -AsListItem

        $sourceFileDate = Get-date $sourceFile['Modified']
        $targetFileDate = Get-date $targetFile['Modified']

        write-host ' - Comparing files dates: duplicate file: '$sourceFileDate 'original file: '$targetFileDate

        # Check if the source file is newer than the target file
        If ($sourceFile['Modified'] -gt $targetFile['Modified']) 
        {
            write-host '    - Duplicate file is newer, move the file' -BackgroundColor DarkGreen
            write-output $true
        }
        else
        {
			# Remove file
			if ($whatIf -ne $true)
            {
				Write-host '    - Target file is newer. Removing duplicate file' -BackgroundColor DarkRed
				Write-Host "`r`n"
				Remove-PnPFile -SiteRelativeUrl $sourceFilePath -Recycle -Force:$force
			}
			else
			{
				Write-Host 'Remove file' $sourceFilePath  -ForegroundColor Red
				Write-Host "`r`n"
			}
            write-output $false
        }
    }
    else
    {
        # Target file doesn't exist
        Write-host ' - Target file does not exist' -BackgroundColor DarkGreen
        Write-Output $true
    }
}

So we compare the dates, and if the original file is newer, we remove the duplicate file. In the Remove-PnPFile call, I added the -Recycle switch so the file is moved to the recycle bin. You can remove this switch if you want to delete the files permanently.
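
The decision itself comes down to three cases, which you can check without any SharePoint calls. Get-MergeAction below is a hypothetical helper that I use only to illustrate the rule; it is not part of the original script, and the dates are made up.

```powershell
# Hypothetical helper: decide what to do with a duplicate file based on dates.
function Get-MergeAction {
    param(
        [Parameter(Mandatory = $true)]
        [datetime]$DuplicateModified,

        # Leave empty ($null) when the original file does not exist
        [Nullable[datetime]]$OriginalModified
    )

    if ($null -eq $OriginalModified) { return 'Move' }               # nothing to overwrite
    if ($DuplicateModified -gt $OriginalModified) { return 'Move' }  # duplicate is newer
    return 'Remove'                                                  # original is newer or the same age
}

$original  = [datetime]'2021-03-01T09:00:00'
$duplicate = [datetime]'2021-03-02T14:30:00'

Get-MergeAction -DuplicateModified $duplicate -OriginalModified $original   # Move
Get-MergeAction -DuplicateModified $original -OriginalModified $duplicate   # Remove
Get-MergeAction -DuplicateModified $duplicate                               # Move
```

Note that when both files have exactly the same modified date, the duplicate is removed, which matches the -gt comparison in Compare-FileDates.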

Wrapping up

Good to know is that you may end up with some empty folders in your library. I have used a separate script that you can run to clean up the empty folders.

You can find the complete script here at my GitHub.

Always test this kind of script on a test SharePoint site with some test files. Make sure you set $whatIf to $true and $force to $false when you start.

If you have any questions, just drop a comment below.
