Merge Duplicate folders in SharePoint with PowerShell

A reader of this blog reached out to ask if I knew a way to merge duplicate folders in SharePoint Online. Users had already worked in the duplicate folders, so we not only needed to merge the folders but also keep the most recently modified version of each file.

All duplicate files or folders had the same pattern, (1), in their file name. So this made it really easy to find the files and folders and merge them with the original location.

In case you are wondering what happened: a user deleted files locally without stopping the sync first. Because of the large number of files that were deleted (over 750,000), they asked Microsoft to restore them, which they did.

However, due to a communication error, the restore job was done in two parts, resulting in duplicate files. Microsoft was unable to assist with the cleanup, so I stepped in.

This is one of those situations where a third-party backup solution would really be a lifesaver. I have written about this before in Do you need a backup solution for Microsoft 365. As you can see in this case, the files can be restored, but having a backup solution would make restoring so much easier.

Finding the duplicate items

Merging the duplicate folders and files will take a couple of steps. The duplicate folders are not only at the root level of the document library but can also be 3 levels deep in a subfolder. So we need to work recursively through all the folders, looking for items with (1) in the name.

I have broken the script down into a couple of steps, each translated into its own function:

  • Finding the duplicate files and folders
  • Create the target path (the original location)
  • Move the folder content
  • Compare files dates
  • Move a single item

Connect to the SharePoint site

Before we can do anything, we need to connect to the SharePoint site. We can do this with Connect-PnPOnline. I am using the -UseWebLogin switch here so we can sign in with our normal account, including MFA.

At the end of the article I will show the complete script.

# SharePoint url
$sharepointUrl = 'https://lazyadmin.sharepoint.com'

# Site
$site = 'sites/lab01'

# Library name
$libraryName = 'Duplicates'

# Login 
$url = $sharepointUrl+ '/' + $site
Connect-PnPOnline -Url $url -UseWebLogin
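
A doubled or missing slash in the combined URL is an easy mistake when the two parts come from variables. As a minimal sketch, the join can be made independent of how the parts are written. Join-SiteUrl is a hypothetical helper and not part of the original script:

```powershell
# Hypothetical helper (not part of the original script): joins the tenant URL
# and the site path with exactly one slash between them.
function Join-SiteUrl {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,
        [Parameter(Mandatory = $true)][string]$SitePath
    )

    # Remove a trailing slash from the base and a leading slash from the path
    $BaseUrl.TrimEnd('/') + '/' + $SitePath.TrimStart('/')
}

$url = Join-SiteUrl -BaseUrl 'https://lazyadmin.sharepoint.com/' -SitePath '/sites/lab01'
$url   # https://lazyadmin.sharepoint.com/sites/lab01
```

The result can be passed to Connect-PnPOnline -Url as before. One reader reported that -UseWebLogin did not work with MFA for them and that -Interactive did, so the switch you need may depend on your PnP.PowerShell version.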

Finding the duplicate files

Because we need to process the subfolders as well, the trick here is to go through the folders recursively. We use the Get-PnPFolderItem cmdlet here, but call it recursively. The function is based on a script from Josh Einstein that you can find here.

# Recursively calls Get-PnpFolderItem for a given Document Library
# Based on: https://gist.github.com/josheinstein/3ace0c9f8e25d07583ceb57d13f71b2e

Function Get-PnpFolderItemRecursively($FolderSiteRelativeUrl) {
    
    # Get all items
    $items = @(Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelativeUrl)

    Foreach ($item in $items) {

        # Strip the Site URL of the item path because Get-PnpFolderItem wants it
        # to be relative to the site, not an absolute path.

        $itemPath = $item.ServerRelativeUrl -replace "^$(([Uri]$item.Context.Url).AbsolutePath)/",''
 
        # Check if the item is a folder
        If ($item -is [Microsoft.SharePoint.Client.Folder]) 
        {
            
            # Check if the folder name contains (1) on the end
            # If - if the folder name contains a (1) on the end, then it's a duplicate folder that we need to move or merge
            # Else - if the folder doesn't contain (1), then we open the folder and search through the next level

            if ($item.name  -like "*(1)") 
            {
         
                # Duplicate folder found
                Write-Host " - Duplicate folder found: " $itemPath -ForegroundColor Yellow
            
                # Move the content of the folder to the original location
                Move-FolderItemsRecursively $itemPath
            }
            else
            {
                # It doesn't contain (1), but it's a folder, so search through the next level by recursing into this function.
                Get-PnpFolderItemRecursively $itemPath
            }
        }
        else
        {
            # Item is a file
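            # Check if the item's name contains (1); if so, move the file.
            # File names have an extension after the (1), so we need a wildcard on both sides.

            if ($item.Name -like "*(1)*") 
            {
                # Create-TargetPath expects the item itself, the same way it is called in Move-FolderItemsRecursively
                $targetPath = Create-TargetPath -itemPath $itemPath -item $item

                Write-Host $targetPath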

                Move-CustomItem -SiteRelativeUrl $itemPath -targetPath $targetPath -item $item
            }
            # Else skip to next
        }
    }
}

So for every item we check if it’s a folder. When it’s a folder, we check whether the folder name ends with (1). If so, we move the folder content. If not, we “open” the folder and go through its content.

If it’s not a folder but a file, then again we check the name. If it contains (1), we move the file; otherwise we do nothing.
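
The name check itself is plain string matching, so it can be tried without a SharePoint connection. Below is a minimal sketch, assuming the marker always sits at the end of the name, just before the file extension. Test-DuplicateName is a hypothetical helper and is not used in the script above:

```powershell
# Hypothetical helper: returns $true when a name carries the "(1)" duplicate
# marker at the end of the name, ignoring the file extension for files.
function Test-DuplicateName {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [switch]$IsFolder
    )

    if ($IsFolder) {
        # Folder names have no extension; a dot in the name is part of the name
        $baseName = $Name
    }
    else {
        $baseName = [System.IO.Path]::GetFileNameWithoutExtension($Name)
    }

    $baseName.TrimEnd().EndsWith('(1)')
}

Test-DuplicateName -Name 'Projects (1)' -IsFolder   # True
Test-DuplicateName -Name 'Report (1).docx'          # True
Test-DuplicateName -Name 'Report.docx'              # False
```

The separate -IsFolder switch is there because a folder such as "v1.2 (1)" would otherwise lose everything after the dot when the extension is stripped.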

Creating the target path

If we find a duplicate item, we need to build the original path from the item’s path. We also need to check if the original folder exists and, if not, recreate it.

Function Create-TargetPath {
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $itemPath,

         [parameter (Mandatory=$true)]
         $item,

         [parameter (Mandatory=$false)]
         $relativePath
     )

    process
    {
        # Build new path
        $path = $itemPath.replace($item.name,'') 
        $targetPath = "/" + $site + "/" + $path + $item.name

        if ($whatIf -ne $true)
        {
            # Check if target folder exists, create if necessary
            Write-host ' - Check if target folder exists' $path.replace('(1)', '') -BackgroundColor DarkMagenta;
            $result = Resolve-PnPFolder -SiteRelativePath $path.replace('(1)', '') -ErrorAction SilentlyContinue
        }
        else{
            Write-host ' - Create target folder if it does not exist' $path.replace('(1)', '') -BackgroundColor DarkMagenta;
        }

        Write-Output $targetPath.replace('(1)', '')
    }
}

You will see in each function that I am using a variable $whatIf. I have declared this variable at the beginning of the script. It allows me to test the script without actually moving or creating any files or folders.

Most PnP functions support the WhatIf switch, but Resolve-PnPFolder, for example, doesn’t. This way I can simply test the script by writing to the console what it would do.

So the Create-TargetPath function rebuilds the original path, checks if the folder exists, and recreates it if necessary with the Resolve-PnPFolder cmdlet.
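
Create-TargetPath strips the marker with a plain .replace('(1)', ''), which leaves any space in front of the marker in place ("Folder (1)" becomes "Folder " with a trailing space). As a sketch of a stricter variant, assuming the duplicates are named with the marker at the end of a folder name or just before the file extension, a regular expression can remove the marker together with the preceding whitespace. Get-OriginalPath is a hypothetical helper, not part of the original script:

```powershell
# Hypothetical helper: removes the "(1)" marker and the whitespace before it
# from every segment of a path, which gives the original location.
function Get-OriginalPath {
    param([Parameter(Mandatory = $true)][string]$Path)

    # \s*\(1\) matches the marker plus leading whitespace.
    # The lookahead only accepts a marker at the end of a path segment,
    # right before the file extension, or at the end of the string.
    $Path -replace '\s*\(1\)(?=/|\.[^./]*$|$)', ''
}

Get-OriginalPath 'Duplicates/Projects (1)/Plan (1).docx'   # Duplicates/Projects/Plan.docx
Get-OriginalPath 'Duplicates/Budget (1)'                   # Duplicates/Budget
```

A name such as "Area (1) notes.docx" is left untouched by this pattern, because the (1) is not at the end of the name. Whether that is what you want depends on how the duplicates in your library are named.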

Move folder and subfolders with PowerShell

When we have found a duplicate folder, we want to move the folder and all the content (including subfolders) to the original location. For each subfolder we need to create a target path as well.

Function Move-FolderItemsRecursively($FolderSiteRelativeUrl) {

    # Get all items in this sub folder
    $items = @(Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelativeUrl)

    foreach ($item in $items) {

        # Strip the Site URL off the item path, because Get-PnpFolderItem wants it
        # to be relative to the site, not an absolute path.
        
        $itemPath = $item.ServerRelativeUrl -replace "^$(([Uri]$item.Context.Url).AbsolutePath)/",''

        # If this is a directory, recurse into this function.
        # Otherwise, build target path and move file

        if ($item -is [Microsoft.SharePoint.Client.Folder]) 
        {
            Move-FolderItemsRecursively $itemPath
        }
        else 
        {
            $targetPath = Create-TargetPath -itemPath $itemPath -item $item
            
            Move-CustomItem -SiteRelativeUrl $itemPath -targetPath $targetPath -item $item
        }
    }
}
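
Both recursive functions strip the site part from ServerRelativeUrl with a regular expression built from the site URL. The sketch below isolates that step so it can be tested without a connection. ConvertTo-SiteRelativePath is a hypothetical helper, and the [regex]::Escape call is an addition that the original one-liner does not have:

```powershell
# Hypothetical helper: turns a server-relative URL into a site-relative path.
function ConvertTo-SiteRelativePath {
    param(
        [Parameter(Mandatory = $true)][string]$ServerRelativeUrl,
        [Parameter(Mandatory = $true)][string]$SiteUrl
    )

    # AbsolutePath of https://tenant.sharepoint.com/sites/lab01 is /sites/lab01
    $sitePath = ([Uri]$SiteUrl).AbsolutePath.TrimEnd('/')

    # Escape the path so that characters such as '.' are matched literally
    $ServerRelativeUrl -replace ('^' + [regex]::Escape($sitePath) + '/'), ''
}

ConvertTo-SiteRelativePath -ServerRelativeUrl '/sites/lab01/Duplicates/Projects (1)' -SiteUrl 'https://lazyadmin.sharepoint.com/sites/lab01'
# Duplicates/Projects (1)
```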

Comparing files dates in SharePoint Online and moving the files

So we now come to the part it’s all about: moving the actual files. Before we can move a file, we need to check if the file exists in the original location. If it does, we need to compare the file dates, because we want to keep the most recently modified file in this case.

# Move file to original folder
Function Move-CustomItem  {
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $siteRelativeUrl,

         [parameter (Mandatory=$true)]
         $targetPath,

         [parameter (Mandatory=$true)]
         $item
     )

    process
    {
        $moveFile = Compare-FileDates -sourceFilePath $siteRelativeUrl -targetFilePath $targetPath
        $global:moveLimitCounter++

        if ($moveFile -eq $true) 
        {
            if ($moveLimitCounter -eq $moveLimit)
            {
                Write-Warning 'Move limit reached'
                exit
            }

            if ($whatIf -ne $true)
            {
                # Move the file
                Write-Host '   - Move item to' $targetPath -BackgroundColor DarkYellow
                Move-PnPFile -SiteRelativeUrl $siteRelativeUrl -TargetUrl $targetPath -OverwriteIfAlreadyExists -Force:$force
                Write-Host "`r`n"
            }
            else
            {
                Write-Host '   - Move file from' $siteRelativeUrl -BackgroundColor DarkCyan
                Write-Host '     to' $targetPath -BackgroundColor DarkCyan
                Write-Host "`r`n"
            }
        }
    }
}

I added a counter here that counts the files that are processed. When you are going to test a script like this, you may only want to move a couple of files first and then check the results before you continue.
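
The idea behind the limit can be shown in isolation. This is a minimal sketch with made-up file names. It uses break to leave the loop, where the function above uses exit to stop the whole script, and it only counts the items that are actually handled:

```powershell
# Assumed settings; the names mirror the variables used in Move-CustomItem
$moveLimit = 3
$moveLimitCounter = 0
$moved = @()

foreach ($file in 'a.docx', 'b.docx', 'c.docx', 'd.docx', 'e.docx') {

    if ($moveLimitCounter -ge $moveLimit) {
        Write-Warning 'Move limit reached'
        break   # leave the loop; the remaining files are not touched
    }

    $moveLimitCounter++
    $moved += $file   # stand-in for the real Move-PnPFile call
}

$moved   # a.docx, b.docx, c.docx
```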

# Check if the file already exists in the target location
# If the file exists, we need to compare the dates to keep the latest files

Function Compare-FileDates
{
    [CmdletBinding()]
    param(
         [parameter (Mandatory=$true)]
         $targetFilePath,

         [parameter (Mandatory=$true)]
         $sourceFilePath
     )

    $targetFileExists = Get-PnPFile -Url $targetFilePath -ErrorAction SilentlyContinue
    
    If($targetFileExists)
    {
        $sourceFile = Get-PnPFile -Url $sourceFilePath -AsListItem
        $targetFile = Get-PnPFile -Url $targetFilePath -AsListItem

        $sourceFileDate = Get-date $sourceFile['Modified']
        $targetFileDate = Get-date $targetFile['Modified']

        Write-Host ' - Comparing file dates: duplicate file:' $sourceFileDate 'original file:' $targetFileDate

        # Check if the source file is newer than the target file
        If ($sourceFile['Modified'] -gt $targetFile['Modified']) 
        {
            write-host '    - Duplicate file is newer, move the file' -BackgroundColor DarkGreen
            write-output $true
        }
        else
        {
            # Remove file
            if ($whatIf -ne $true)
            {
                Write-Host '    - Target file is newer. Removing duplicate file' -BackgroundColor DarkRed
                Write-Host "`r`n"
                Remove-PnPFile -SiteRelativeUrl $sourceFilePath -Recycle -Force:$force
            }
            else
            {
                Write-Host 'Remove file' $sourceFilePath -ForegroundColor Red
                Write-Host "`r`n"
            }
            Write-Output $false
        }
    }
    else
    {
        # Target file doesn't exist
        Write-host ' - Target file does not exist' -BackgroundColor DarkGreen
        Write-Output $true
    }
}

So we compare the dates, and if the original file is newer, then we remove the duplicate file. In the Remove-PnPFile call, I added the switch -Recycle so the file will be moved to the recycle bin. You can remove this switch if you want to permanently delete the files.
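
The decision itself depends only on two dates, so it can be checked without touching SharePoint. A minimal sketch of the same rule, where Get-MergeAction is a hypothetical helper and $TargetModified is $null when no file exists at the original location:

```powershell
# Hypothetical helper: decides what to do with a duplicate file based on the
# Modified dates of the duplicate (source) and the original (target).
function Get-MergeAction {
    param(
        [Parameter(Mandatory = $true)][datetime]$SourceModified,
        [Nullable[datetime]]$TargetModified
    )

    if ($null -eq $TargetModified) { return 'Move' }             # nothing to overwrite
    if ($SourceModified -gt $TargetModified) { return 'Move' }   # duplicate is newer
    return 'Remove'                                              # original is newer or the same age
}

Get-MergeAction -SourceModified ([datetime]'2021-03-02') -TargetModified ([datetime]'2021-03-01')   # Move
Get-MergeAction -SourceModified ([datetime]'2021-03-01') -TargetModified ([datetime]'2021-03-02')   # Remove
Get-MergeAction -SourceModified ([datetime]'2021-03-01')                                            # Move
```

Note that files with identical dates end up in the 'Remove' branch, which matches the -gt comparison in Compare-FileDates.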

Wrapping up

Note that you may end up with some empty folders in your library. I have used a separate script that you can run to clean up the empty folders.

You can find the complete script here at my GitHub.

Always test this kind of script on a test SharePoint site with some test files. Make sure you set $whatIf to $true and $force to $false when you start.
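
As a sketch, the settings at the top of the script could look like this for a first dry run. The variable names match the ones used in the functions above; the limit value and the commented-out starting call are assumptions:

```powershell
# Dry run: only write to the console what the script would do
$whatIf = $true

# Passed to -Force on Move-PnPFile and Remove-PnPFile
$force = $false

# Stop after this many processed files (the value 10 is an assumption)
$moveLimit = 10
$global:moveLimitCounter = 0

# Assumed entry point: start at the root of the document library
# Get-PnpFolderItemRecursively $libraryName
```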

If you have any questions, just drop a comment below.


8 thoughts on “Merge Duplicate folders in SharePoint with PowerShell”

  1. Hi All,

    I was able to resolve the issues.
    On the Authentication/Login section of the script there is an issue with the $url

    # Login
    $url = $siteUrl + ‘/’ + $site
    Connect-PnPOnline -Url $url -UseWebLogin

    the $url does not log in correctly. I manually entered the site URL here exactly as follows:
    “https://lazyadmin.sharepoint.com/”

    This resolved my 403 error.

  2. Hi Ruud,

    I am Hoping you can assist. I am having an error below. The only alterations I have made to the raw script from Github is changing the $siteUrl, $site and $libraryName

    The error I get is below.
    Get-PnPFolderItem : The remote server returned an error: (403) Forbidden.
    At line:52 char:16
    + … $items = @(Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelat …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : WriteError: (:) [Get-PnPFolderItem], WebException
    + FullyQualifiedErrorId : EXCEPTION,PnP.PowerShell.Commands.Files.GetFolderItem

    What I can confirm is that I did a standard login with PNP Online and was able to run the following command without a 403 error:
    Get-PnPFolderItem -FolderSiteRelativeUrl “SitePages”

  3. This looks great but am I too much of a noob to get it to work? I am trying to set it up in powershell ise and I cant authenticate.
    Connect-PnPOnline : GetContextAsync() called without an ACS token generator. Use GetACSAppOnlyContext() or
    GetAccessTokenContext() instead or specify in AuthenticationManager constructor the authentication parameters

    It doesnt seem to like connect-pnponline but I can connect in powershell using -interactive just fine.

    Any tips on solving the authentication issue?

      • Thanks for the reply Rudy, I did get it to work. -UseWebLogin would not work with MFA, -interactive does. I believe the ACS token error was because I had too many forward slashes when I checked the variables $url and $sharepointurl resulting in an invalid URL.

        I also didnt realize that the variable $libraryname needed to be a the real target sharepoint library name so that held me up for a bit.

        Now, how do I get rid of all these empty *(1) folders?

        Cheers,

  4. Hello Rudy,
    This is a great script! Thank you for posting it and the breakdown. 🙂

    I am having some problems with it though;
    Having uncommented “Write-Host ‘Processing folder:’ $itemPath” (line 65), I am getting a list of the top level Folders, but after each duplicate folder it says:
    You cannot call a method on a null-valued expression.
    At C:\Users\Me\Documents\RecursiveDedupII.ps1:97 char:17
    + … $targetPath = Create-TargetPath -itemPath $itemPath -targ …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

    Move-CustomItem : Cannot bind argument to parameter ‘targetPath’ because it is null.
    At C:\Users\Me\Documents\RecursiveDedupII.ps1:101 char:72
    + … CustomItem -SiteRelativeUrl $itemPath -targetPath $targetPath -item $ …
    + ~~~~~~~~~~~
    + CategoryInfo : InvalidData: (:) [Move-CustomItem], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Move-CustomItem

    $url & $libraryName come back correctly, but every other variable within “Functions” that I query comes back blank.

    If possible, aid in any form would be greatly appreciated.

  5. Hello Ruud,

    Any advice on modifying this script to avoid the dreaded “List view threshold” error? I see Get-PnPFolderItem doesn’t have a -RowLimit switch unfortunately 🙁 I have a huge Document Library that was recently compromised and almost a million files were deleted. Upon restoring in batches (using your work, Jose, and George’s scripts… thank you so much by the way), I see quite a few duplicate folders with (1) were created. I was hoping to use this script to merge the folders, but as usual the 5000 limit keeps getting hit.

    Any advice or help would be welcome! Thank you 🙂

    Get-PnPFolderItem : The attempted operation is prohibited because it exceeds the list view threshold.

    • That is strange. I have used this script for a case with 700,000 files and folders and didn’t have any problems running it. Are you running it as the tenant admin?
