{"id":3397,"date":"2025-12-26T15:46:11","date_gmt":"2025-12-26T12:46:11","guid":{"rendered":"https:\/\/www.dchost.com\/blog\/setting-up-robots-txt-and-sitemap-xml-correctly-for-seo-and-hosting\/"},"modified":"2025-12-26T15:46:11","modified_gmt":"2025-12-26T12:46:11","slug":"setting-up-robots-txt-and-sitemap-xml-correctly-for-seo-and-hosting","status":"publish","type":"post","link":"https:\/\/www.dchost.com\/blog\/en\/setting-up-robots-txt-and-sitemap-xml-correctly-for-seo-and-hosting\/","title":{"rendered":"Setting Up robots.txt and sitemap.xml Correctly for SEO and Hosting"},"content":{"rendered":"<div class=\"dchost-blog-content-wrapper\"><div id=\"toc_container\" class=\"toc_transparent no_bullets\"><p class=\"toc_title\">\u0130&ccedil;indekiler<\/p><ul class=\"toc_list\"><li><a href=\"#Why_robotstxt_and_sitemapxml_Matter_for_Your_Site\"><span class=\"toc_number toc_depth_1\">1<\/span> Why robots.txt and sitemap.xml Matter for Your Site<\/a><\/li><li><a href=\"#robotstxt_and_sitemapxml_in_Plain_Language\"><span class=\"toc_number toc_depth_1\">2<\/span> robots.txt and sitemap.xml in Plain Language<\/a><ul><li><a href=\"#What_is_robotstxt\"><span class=\"toc_number toc_depth_2\">2.1<\/span> What is robots.txt?<\/a><\/li><li><a href=\"#What_is_sitemapxml\"><span class=\"toc_number toc_depth_2\">2.2<\/span> What is sitemap.xml?<\/a><\/li><\/ul><\/li><li><a href=\"#Step_1_Decide_Your_Crawl_Strategy_Before_Writing_a_Single_Rule\"><span class=\"toc_number toc_depth_1\">3<\/span> Step 1 \u2013 Decide Your Crawl Strategy Before Writing a Single Rule<\/a><ul><li><a href=\"#Pages_and_sections_you_usually_want_crawled\"><span class=\"toc_number toc_depth_2\">3.1<\/span> Pages and sections you usually want crawled<\/a><\/li><li><a href=\"#URLs_that_are_often_safe_or_recommended_to_block\"><span class=\"toc_number toc_depth_2\">3.2<\/span> URLs that are often safe (or recommended) to block<\/a><\/li><li><a href=\"#Critical_warning_robotstxt_vs_HTTP_authentication\"><span class=\"toc_number toc_depth_2\">3.3<\/span> Critical warning: robots.txt vs HTTP authentication<\/a><\/li><\/ul><\/li><li><a href=\"#Step_2_Building_a_Clean_robotstxt_With_Real_Examples\"><span class=\"toc_number toc_depth_1\">4<\/span> Step 2 \u2013 Building a Clean robots.txt (With Real Examples)<\/a><ul><li><a href=\"#Basic_structure\"><span class=\"toc_number toc_depth_2\">4.1<\/span> Basic structure<\/a><\/li><li><a href=\"#Typical_robotstxt_for_a_CMS_site_eg_WordPress\"><span class=\"toc_number toc_depth_2\">4.2<\/span> Typical robots.txt for a CMS site (e.g. 
WordPress)<\/a><\/li><li><a href=\"#Blocking_parameters_or_specific_bots\"><span class=\"toc_number toc_depth_2\">4.3<\/span> Blocking parameters or specific bots<\/a><\/li><li><a href=\"#What_you_should_almost_never_do\"><span class=\"toc_number toc_depth_2\">4.4<\/span> What you should almost never do<\/a><\/li><\/ul><\/li><li><a href=\"#Step_3_Creating_sitemapxml_the_Right_Way\"><span class=\"toc_number toc_depth_1\">5<\/span> Step 3 \u2013 Creating sitemap.xml the Right Way<\/a><ul><li><a href=\"#Basic_XML_structure\"><span class=\"toc_number toc_depth_2\">5.1<\/span> Basic XML structure<\/a><\/li><li><a href=\"#Sitemap_indexes_for_large_or_complex_sites\"><span class=\"toc_number toc_depth_2\">5.2<\/span> Sitemap indexes for large or complex sites<\/a><\/li><li><a href=\"#Generating_sitemaps_on_different_platforms\"><span class=\"toc_number toc_depth_2\">5.3<\/span> Generating sitemaps on different platforms<\/a><ul><li><a href=\"#WordPress\"><span class=\"toc_number toc_depth_3\">5.3.1<\/span> WordPress<\/a><\/li><li><a href=\"#Custom_PHP_Laravel_Symfony_or_static_sites\"><span class=\"toc_number toc_depth_3\">5.3.2<\/span> Custom PHP, Laravel, Symfony or static sites<\/a><\/li><\/ul><\/li><li><a href=\"#Common_sitemap_mistakes\"><span class=\"toc_number toc_depth_2\">5.4<\/span> Common sitemap mistakes<\/a><\/li><\/ul><\/li><li><a href=\"#Step_4_Where_and_How_to_Place_robotstxt_and_sitemapxml_on_Your_Hosting\"><span class=\"toc_number toc_depth_1\">6<\/span> Step 4 \u2013 Where and How to Place robots.txt and sitemap.xml on Your Hosting<\/a><ul><li><a href=\"#On_shared_hosting_with_cPanel\"><span class=\"toc_number toc_depth_2\">6.1<\/span> On shared hosting with cPanel<\/a><\/li><li><a href=\"#On_DirectAdmin_or_Plesk\"><span class=\"toc_number toc_depth_2\">6.2<\/span> On DirectAdmin or Plesk<\/a><\/li><li><a href=\"#On_a_VPS_or_dedicated_server_with_Apache\"><span class=\"toc_number toc_depth_2\">6.3<\/span> On a VPS or dedicated server with Apache<\/a><\/li><li><a href=\"#On_a_VPS_or_dedicated_server_with_Nginx\"><span class=\"toc_number toc_depth_2\">6.4<\/span> On a VPS or dedicated server with Nginx<\/a><\/li><li><a href=\"#Multiple_domains_and_addon_domains_on_the_same_hosting_account\"><span class=\"toc_number toc_depth_2\">6.5<\/span> Multiple domains and addon domains on the same hosting account<\/a><\/li><\/ul><\/li><li><a href=\"#Step_5_Advanced_Scenarios_Multilingual_Subdomains_and_Staging\"><span class=\"toc_number toc_depth_1\">7<\/span> Step 5 \u2013 Advanced Scenarios: Multilingual, Subdomains and Staging<\/a><ul><li><a href=\"#Subdomain_vs_subdirectory_and_its_impact_on_robotssitemaps\"><span class=\"toc_number toc_depth_2\">7.1<\/span> Subdomain vs subdirectory and its impact on robots\/sitemaps<\/a><\/li><li><a href=\"#Staging_sites_and_test_environments\"><span class=\"toc_number toc_depth_2\">7.2<\/span> Staging sites and test environments<\/a><\/li><li><a href=\"#Handling_multiple_sitemaps_across_domains\"><span class=\"toc_number toc_depth_2\">7.3<\/span> Handling multiple sitemaps across domains<\/a><\/li><\/ul><\/li><li><a href=\"#Step_6_Testing_Monitoring_and_Avoiding_Silent_SEO_Disasters\"><span class=\"toc_number toc_depth_1\">8<\/span> Step 6 \u2013 Testing, Monitoring and Avoiding Silent SEO Disasters<\/a><ul><li><a href=\"#Use_Google_Search_Console_and_Bing_Webmaster_Tools\"><span class=\"toc_number toc_depth_2\">8.1<\/span> Use Google Search Console and Bing Webmaster Tools<\/a><\/li><li><a href=\"#Test_robotstxt_with_crawler_tools\"><span 
class=\"toc_number toc_depth_2\">8.2<\/span> Test robots.txt with crawler tools<\/a><\/li><li><a href=\"#Check_your_server_logs_for_crawler_behavior\"><span class=\"toc_number toc_depth_2\">8.3<\/span> Check your server logs for crawler behavior<\/a><\/li><li><a href=\"#Common_silent_issues_to_monitor\"><span class=\"toc_number toc_depth_2\">8.4<\/span> Common silent issues to monitor<\/a><\/li><\/ul><\/li><li><a href=\"#Hosting_and_Infrastructure_Considerations\"><span class=\"toc_number toc_depth_1\">9<\/span> Hosting and Infrastructure Considerations<\/a><ul><li><a href=\"#Performance_sitemaps_on_busy_stores_and_portals\"><span class=\"toc_number toc_depth_2\">9.1<\/span> Performance: sitemaps on busy stores and portals<\/a><\/li><li><a href=\"#SSL_redirects_and_canonical_domains\"><span class=\"toc_number toc_depth_2\">9.2<\/span> SSL, redirects and canonical domains<\/a><\/li><\/ul><\/li><li><a href=\"#Putting_It_All_Together_A_Practical_Checklist\"><span class=\"toc_number toc_depth_1\">10<\/span> Putting It All Together: A Practical Checklist<\/a><\/li><li><a href=\"#Conclusion_Small_Files_Big_SEO_and_Hosting_Impact\"><span class=\"toc_number toc_depth_1\">11<\/span> Conclusion: Small Files, Big SEO and Hosting Impact<\/a><\/li><\/ul><\/div>\n<h2><span id=\"Why_robotstxt_and_sitemapxml_Matter_for_Your_Site\">Why robots.txt and sitemap.xml Matter for Your Site<\/span><\/h2>\n<p>On almost every new project we see at dchost.com, two tiny files quietly decide how well the website will be crawled and indexed: <strong>robots.txt<\/strong> and <strong>sitemap.xml<\/strong>. They are small, but they sit right at the intersection of SEO, hosting configuration and long\u2011term maintainability. A clean robots.txt prevents search engines from wasting time on junk URLs and private areas. A well\u2011structured sitemap.xml helps new and updated pages get discovered faster. Misconfigure them, and you can accidentally block your whole site from search, slow down indexing, or cause duplicate\u2011content headaches across domains and subdomains.<\/p>\n<p>In this guide, we will walk through <strong>exactly<\/strong> how to set up robots.txt and sitemap.xml on shared hosting, cPanel\/DirectAdmin, and <a href=\"https:\/\/www.dchost.com\/vps\">VPS<\/a> servers with Apache or Nginx. We will keep the focus practical: what to allow, what to block, where to upload, how to test, and how to adapt your setup for multilingual sites, separate blogs\/stores or staging environments. If you are planning a new launch, combine this guide with our <a href=\"https:\/\/www.dchost.com\/blog\/en\/yeni-web-sitesi-yayina-alirken-hosting-tarafinda-seo-ve-performans-kontrol-listesi\/\">new website launch checklist for hosting\u2011side SEO and performance<\/a> so your site starts life technically solid.<\/p>\n<h2><span id=\"robotstxt_and_sitemapxml_in_Plain_Language\">robots.txt and sitemap.xml in Plain Language<\/span><\/h2>\n<h3><span id=\"What_is_robotstxt\">What is robots.txt?<\/span><\/h3>\n<p><strong>robots.txt<\/strong> is a simple text file placed at the <strong>root<\/strong> of your domain, for example:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">https:\/\/example.com\/robots.txt<\/code><\/pre>\n<p>Search engine crawlers (&#8220;bots&#8221;) request this file before they start exploring your pages. 
Inside robots.txt you give rules like:<\/p>\n<ul>\n<li>Which folders or URL patterns should <strong>not<\/strong> be crawled<\/li>\n<li>Optional crawl\u2011delay for some bots<\/li>\n<li>Where your sitemap.xml file lives<\/li>\n<\/ul>\n<p>It is important to understand that robots.txt is <strong>not a security feature<\/strong>. It only tells <em>well\u2011behaved<\/em> crawlers what you prefer. Never rely on robots.txt to hide sensitive data; protect those with authentication and proper permissions on your hosting or VPS. If you want a refresher on hosting fundamentals, our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/web-hosting-nedir-domain-dns-sunucu-ve-ssl-nasil-birlikte-calisir\/\">what is web hosting and how domain, DNS, server and SSL work together<\/a> is a good background read.<\/p>\n<h3><span id=\"What_is_sitemapxml\">What is sitemap.xml?<\/span><\/h3>\n<p><strong>sitemap.xml<\/strong> is an XML file (or a set of files) that lists important URLs for your site, typically including:<\/p>\n<ul>\n<li>Each page or post URL<\/li>\n<li>When it was last modified<\/li>\n<li>Optional <code>&lt;priority&gt;<\/code> and <code>&lt;changefreq&gt;<\/code> hints<\/li>\n<\/ul>\n<p>Typical location:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">https:\/\/example.com\/sitemap.xml<\/code><\/pre>\n<p>Search engines use sitemaps to:<\/p>\n<ul>\n<li>Discover new content faster<\/li>\n<li>Find pages not easily reachable via menus or internal links<\/li>\n<li>Understand how your site is structured (especially in large or multilingual setups)<\/li>\n<\/ul>\n<p>Sitemaps do <strong>not guarantee<\/strong> indexing, but they significantly improve discoverability and crawling efficiency when combined with a sensible robots.txt.<\/p>\n<h2><span id=\"Step_1_Decide_Your_Crawl_Strategy_Before_Writing_a_Single_Rule\">Step 1 \u2013 Decide Your Crawl Strategy Before Writing a Single Rule<\/span><\/h2>\n<p>Before opening a text editor, decide what you <strong>want<\/strong> bots to see versus what they should ignore. This is where SEO, information architecture and hosting structure meet.<\/p>\n<h3><span id=\"Pages_and_sections_you_usually_want_crawled\">Pages and sections you usually want crawled<\/span><\/h3>\n<ul>\n<li>Core public pages: home, category pages, product pages, blog posts, landing pages<\/li>\n<li>Pagination that adds value (e.g. 
\/blog\/page\/2\/) if your SEO strategy relies on it<\/li>\n<li>Language or regional versions, depending on your <a href=\"https:\/\/www.dchost.com\/blog\/en\/subdomain-mi-alt-dizin-mi-blog-magaza-ve-dil-surumleri-icin-seo-ve-hosting-karsilastirmasi\/\">subdomain vs subdirectory choice for SEO and hosting<\/a><\/li>\n<\/ul>\n<h3><span id=\"URLs_that_are_often_safe_or_recommended_to_block\">URLs that are often safe (or recommended) to block<\/span><\/h3>\n<ul>\n<li>Admin panels: <code>\/wp-admin\/<\/code>, <code>\/administrator\/<\/code>, <code>\/cp\/<\/code>, custom admin paths<\/li>\n<li>Internal search or filter URLs with many parameters: <code>?sort=<\/code>, <code>?filter=<\/code>, <code>?session=<\/code>, etc.<\/li>\n<li>Cart and checkout steps (SEO usually focuses on product\/category pages instead)<\/li>\n<li>Tracking or A\/B testing URLs like <code>?utm_source=<\/code> or <code>?variant=<\/code> (usually handled via canonical tags, but sometimes blocked for specific bots)<\/li>\n<li>Staging or test subdirectories like <code>\/staging\/<\/code>, <code>\/beta\/<\/code>, <code>\/old-site\/<\/code><\/li>\n<\/ul>\n<h3><span id=\"Critical_warning_robotstxt_vs_HTTP_authentication\">Critical warning: robots.txt vs HTTP authentication<\/span><\/h3>\n<p>If you run a <strong>staging<\/strong> site on the same hosting account, never rely only on robots.txt to keep it out of search. Use:<\/p>\n<ul>\n<li>HTTP authentication (.htpasswd on Apache, basic auth on Nginx), and\/or<\/li>\n<li>IP whitelisting or VPN access<\/li>\n<\/ul>\n<p>Robots.txt prevents polite bots from crawling; it does <strong>not<\/strong> prevent visitors or leaky links from exposing your test environment.<\/p>\n<h2><span id=\"Step_2_Building_a_Clean_robotstxt_With_Real_Examples\">Step 2 \u2013 Building a Clean robots.txt (With Real Examples)<\/span><\/h2>\n<h3><span id=\"Basic_structure\">Basic structure<\/span><\/h3>\n<p>The syntax is simple but very strict about spelling and placement. A minimal robots.txt that allows everything and references a sitemap looks like this:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">User-agent: *\nDisallow:\n\nSitemap: https:\/\/example.com\/sitemap.xml\n<\/code><\/pre>\n<p>Key points:<\/p>\n<ul>\n<li><code>User-agent<\/code> defines which crawler the rules apply to (e.g. Googlebot, Bingbot). <code>*<\/code> means &#8220;all bots&#8221;.<\/li>\n<li><code>Disallow<\/code> followed by nothing means &#8220;nothing is disallowed&#8221; \u2192 full access.<\/li>\n<li><code>Sitemap<\/code> can appear anywhere in the file and can be listed multiple times (for multiple sitemaps).<\/li>\n<\/ul>\n<h3><span id=\"Typical_robotstxt_for_a_CMS_site_eg_WordPress\">Typical robots.txt for a CMS site (e.g. 
WordPress)<\/span><\/h3>\n<p>Here is a practical example we often see on shared hosting or VPS setups for a WordPress site:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">User-agent: *\nDisallow: \/wp-admin\/\nAllow: \/wp-admin\/admin-ajax.php\n\nDisallow: \/?s=\nDisallow: \/search\/\n\nSitemap: https:\/\/example.com\/sitemap_index.xml\n<\/code><\/pre>\n<p>What this does:<\/p>\n<ul>\n<li>Blocks most of the admin area from crawlers<\/li>\n<li>Allows <code>admin-ajax.php<\/code> so some themes\/plugins can load content correctly<\/li>\n<li>Discourages crawling internal search result pages<\/li>\n<li>Points bots to the main sitemap index generated by an SEO plugin<\/li>\n<\/ul>\n<h3><span id=\"Blocking_parameters_or_specific_bots\">Blocking parameters or specific bots<\/span><\/h3>\n<p>If you see certain bots hammering your server or wasting crawl budget on low\u2011value URLs, you can add more targeted rules.<\/p>\n<p>Block a folder for all bots:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">User-agent: *\nDisallow: \/tmp\/\nDisallow: \/cache\/\n<\/code><\/pre>\n<p>Apply rules only to a specific bot:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">User-agent: BadBot\nDisallow: \/\n<\/code><\/pre>\n<p>Block specific parameter patterns, and optionally slow a bot down with <code>Crawl-delay<\/code> (Google ignores this directive, but Bing and some others respect it):<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">User-agent: *\nDisallow: \/*?session=\nDisallow: \/*&amp;sort=\n\nUser-agent: Bingbot\nCrawl-delay: 5\n<\/code><\/pre>\n<h3><span id=\"What_you_should_almost_never_do\">What you should almost never do<\/span><\/h3>\n<ul>\n<li><strong>Never<\/strong> write <code>Disallow: \/<\/code> for all user\u2011agents on a live site unless you intentionally want zero crawling.<\/li>\n<li>Do not block CSS\/JS that are needed for rendering; modern SEO evaluates how the page looks to users. 
If you later work on <a href=\"https:\/\/www.dchost.com\/blog\/en\/web-sitenizin-hizini-dogru-olcmek-gtmetrix-pagespeed-insights-ve-webpagetest-rehberi\/\">testing your website speed and Core Web Vitals correctly<\/a>, blocked assets will give you misleading results.<\/li>\n<li>Do not use robots.txt to &#8220;hide&#8221; passwords, database dumps, backups or logs; those files should not be web\u2011accessible at all.<\/li>\n<\/ul>\n<h2><span id=\"Step_3_Creating_sitemapxml_the_Right_Way\">Step 3 \u2013 Creating sitemap.xml the Right Way<\/span><\/h2>\n<h3><span id=\"Basic_XML_structure\">Basic XML structure<\/span><\/h3>\n<p>A simple sitemap with two URLs looks like this:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;\n&lt;urlset xmlns=&quot;http:\/\/www.sitemaps.org\/schemas\/sitemap\/0.9&quot;&gt;\n  &lt;url&gt;\n    &lt;loc&gt;https:\/\/example.com\/&lt;\/loc&gt;\n    &lt;lastmod&gt;2025-01-01&lt;\/lastmod&gt;\n    &lt;changefreq&gt;daily&lt;\/changefreq&gt;\n    &lt;priority&gt;1.0&lt;\/priority&gt;\n  &lt;\/url&gt;\n  &lt;url&gt;\n    &lt;loc&gt;https:\/\/example.com\/about\/&lt;\/loc&gt;\n    &lt;lastmod&gt;2025-01-05&lt;\/lastmod&gt;\n    &lt;changefreq&gt;monthly&lt;\/changefreq&gt;\n    &lt;priority&gt;0.5&lt;\/priority&gt;\n  &lt;\/url&gt;\n&lt;\/urlset&gt;\n<\/code><\/pre>\n<p>Mandatory tags:<\/p>\n<ul>\n<li><code>&lt;urlset&gt;<\/code> root element with proper namespace<\/li>\n<li><code>&lt;url&gt;<\/code> container for each URL<\/li>\n<li><code>&lt;loc&gt;<\/code> canonical URL<\/li>\n<\/ul>\n<p><code>&lt;lastmod&gt;<\/code>, <code>&lt;changefreq&gt;<\/code> and <code>&lt;priority&gt;<\/code> are optional hints. They should be <strong>realistic<\/strong>, not over\u2011optimistic (do not mark everything daily\/1.0).<\/p>\n<h3><span id=\"Sitemap_indexes_for_large_or_complex_sites\">Sitemap indexes for large or complex sites<\/span><\/h3>\n<p>If your site has more than ~50,000 URLs or you want to split content by type (posts, products, categories, languages), you can use a <strong>sitemap index<\/strong> file that points to multiple sitemaps:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;\n&lt;sitemapindex xmlns=&quot;http:\/\/www.sitemaps.org\/schemas\/sitemap\/0.9&quot;&gt;\n  &lt;sitemap&gt;\n    &lt;loc&gt;https:\/\/example.com\/sitemaps\/sitemap-pages.xml&lt;\/loc&gt;\n  &lt;\/sitemap&gt;\n  &lt;sitemap&gt;\n    &lt;loc&gt;https:\/\/example.com\/sitemaps\/sitemap-posts.xml&lt;\/loc&gt;\n  &lt;\/sitemap&gt;\n  &lt;sitemap&gt;\n    &lt;loc&gt;https:\/\/example.com\/sitemaps\/sitemap-products.xml&lt;\/loc&gt;\n  &lt;\/sitemap&gt;\n&lt;\/sitemapindex&gt;\n<\/code><\/pre>\n<p>In robots.txt you then reference only the index:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">Sitemap: https:\/\/example.com\/sitemap_index.xml\n<\/code><\/pre>\n<h3><span id=\"Generating_sitemaps_on_different_platforms\">Generating sitemaps on different platforms<\/span><\/h3>\n<h4><span id=\"WordPress\">WordPress<\/span><\/h4>\n<p>Most WordPress setups now use automatically generated sitemaps:<\/p>\n<ul>\n<li>Core WordPress sitemap (since 5.5): usually at <code>\/wp-sitemap.xml<\/code><\/li>\n<li>SEO plugins (Yoast, Rank Math, etc.) 
often use <code>\/sitemap_index.xml<\/code> and provide fine\u2011grained control<\/li>\n<\/ul>\n<p>On shared hosting or VPS with WordPress, we recommend:<\/p>\n<ul>\n<li>Use one reliable sitemap source (core or plugin); avoid multiple competing sitemaps.<\/li>\n<li>Exclude low\u2011value taxonomies (e.g. tags with little content) from the sitemap via plugin settings.<\/li>\n<li>Keep your database lean; an overloaded <code>wp_options<\/code> table can slow sitemap generation. Our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/wordpress-veritabani-optimizasyonu-wp_options-ve-autoload-sismesini-temizleme-rehberi\/\">WordPress database optimization and cleaning wp_options\/autoload bloat<\/a> is very helpful if sitemaps feel slow.<\/li>\n<\/ul>\n<h4><span id=\"Custom_PHP_Laravel_Symfony_or_static_sites\">Custom PHP, Laravel, Symfony or static sites<\/span><\/h4>\n<p>For custom apps hosted on VPS or shared hosting, you have two main options:<\/p>\n<ol>\n<li><strong>Static sitemap.xml<\/strong> generated periodically by a script or build pipeline.<\/li>\n<li><strong>Dynamic sitemap endpoint<\/strong> that queries your database and outputs XML on the fly.<\/li>\n<\/ol>\n<p>Static sitemaps are simpler and lighter on resources; you can regenerate them via a cron job when content changes. Dynamic sitemaps can stay perfectly up\u2011to\u2011date but require careful caching and database indexing (especially on large catalogs or marketplaces; see our article on <a href=\"https:\/\/www.dchost.com\/blog\/en\/buyuk-katalog-ve-marketplace-siteleri-icin-arama-altyapisi-vps-kaynak-planlama-ve-hosting-secimi\/\">search infrastructure and hosting choices for large catalog sites<\/a> for broader architecture tips).<\/p>\n<h3><span id=\"Common_sitemap_mistakes\">Common sitemap mistakes<\/span><\/h3>\n<ul>\n<li>Including URLs that return 404, 410 or redirect (3xx). Every entry should ideally be a 200 OK canonical URL.<\/li>\n<li>Listing both HTTP and HTTPS versions after a full HTTPS migration; use only one canonical scheme. If you are planning such a migration, read our <a href=\"https:\/\/www.dchost.com\/blog\/en\/httpden-httpse-gecis-rehberi-301-yonlendirme-hsts-ve-seoyu-korumak\/\">full HTTP\u2192HTTPS migration guide with 301 redirects and HSTS<\/a>.<\/li>\n<li>Exposing staging or private sections through the sitemap even though they are blocked in robots.txt.<\/li>\n<li>Letting a single sitemap grow beyond 50,000 URLs or 50MB (uncompressed) without splitting it into multiple sitemaps.<\/li>\n<\/ul>\n<h2><span id=\"Step_4_Where_and_How_to_Place_robotstxt_and_sitemapxml_on_Your_Hosting\">Step 4 \u2013 Where and How to Place robots.txt and sitemap.xml on Your Hosting<\/span><\/h2>\n<h3><span id=\"On_shared_hosting_with_cPanel\">On shared hosting with cPanel<\/span><\/h3>\n<p>On most cPanel servers (including dchost.com shared hosting plans), the public root of your main domain is <code>public_html\/<\/code>. Addon domains and subdomains usually have their own document root folders.<\/p>\n<ol>\n<li>Log in to cPanel.<\/li>\n<li>Open &#8220;File Manager&#8221;.<\/li>\n<li>Navigate to the document root of your domain (e.g. <code>\/home\/username\/public_html\/<\/code>).<\/li>\n<li>Create a new file named <code>robots.txt<\/code> at the root level (same folder as <code>index.php<\/code> or <code>index.html<\/code>).<\/li>\n<li>Paste your rules and save.<\/li>\n<li>Ensure your sitemap (static file or CMS\u2011generated) is accessible, e.g. 
<code>\/sitemap.xml<\/code> or <code>\/sitemap_index.xml<\/code>.<\/li>\n<\/ol>\n<p>Then check in your browser:<\/p>\n<ul>\n<li><code>https:\/\/yourdomain.com\/robots.txt<\/code><\/li>\n<li><code>https:\/\/yourdomain.com\/sitemap.xml<\/code> (or your actual sitemap index URL)<\/li>\n<\/ul>\n<h3><span id=\"On_DirectAdmin_or_Plesk\">On DirectAdmin or Plesk<\/span><\/h3>\n<p>The logic is similar: find the document root (for example <code>domains\/yourdomain.com\/public_html\/<\/code> on DirectAdmin), then create\/edit <code>robots.txt<\/code> there. The file must always live at the <strong>top\u2011level path<\/strong> for each hostname you want to control.<\/p>\n<h3><span id=\"On_a_VPS_or_dedicated_server_with_Apache\">On a VPS or <a href=\"https:\/\/www.dchost.com\/dedicated-server\">dedicated server<\/a> with Apache<\/span><\/h3>\n<p>If you host your site on a VPS or dedicated server from dchost.com using Apache, robots.txt is still just a text file in your DocumentRoot. For a typical VirtualHost:<\/p>\n<pre class=\"language-apache line-numbers\"><code class=\"language-apache\">&lt;VirtualHost *:80&gt;\n    ServerName example.com\n    DocumentRoot \/var\/www\/example.com\/public\n    ...\n&lt;\/VirtualHost&gt;\n<\/code><\/pre>\n<p>Place <code>robots.txt<\/code> and your static <code>sitemap.xml<\/code> in <code>\/var\/www\/example.com\/public\/<\/code>. Apache will serve them automatically unless you have rewrite rules that interfere. If you use complex rewrite rules (e.g. Laravel, Symfony, headless frontends), add explicit exceptions:<\/p>\n<pre class=\"language-apache line-numbers\"><code class=\"language-apache\">RewriteEngine On\n\nRewriteRule ^robots\\.txt$ - [L]\nRewriteRule ^sitemap(_index)?\\.xml$ - [L]\n\n# Your existing front-controller rule here\nRewriteCond %{REQUEST_FILENAME} !-f\nRewriteCond %{REQUEST_FILENAME} !-d\nRewriteRule ^ index.php [L]\n<\/code><\/pre>\n<h3><span id=\"On_a_VPS_or_dedicated_server_with_Nginx\">On a VPS or dedicated server with Nginx<\/span><\/h3>\n<p>With Nginx, you typically define a <code>server<\/code> block per domain:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">server {\n    server_name example.com;\n    root \/var\/www\/example.com\/public;\n\n    location = \/robots.txt { }\n    location = \/sitemap.xml { }\n\n    location \/ {\n        try_files $uri $uri\/ \/index.php?$query_string;\n    }\n\n    # PHP-FPM, SSL, etc.\n}\n<\/code><\/pre>\n<p>The <code>location = \/robots.txt { }<\/code> line tells Nginx to serve the static file directly from the root. If you have a dynamic sitemap endpoint (for example <code>\/sitemap.xml<\/code> served by PHP), make sure the location passes the request to PHP\u2011FPM instead of just looking for a static file.<\/p>\n<h3><span id=\"Multiple_domains_and_addon_domains_on_the_same_hosting_account\">Multiple domains and addon domains on the same hosting account<\/span><\/h3>\n<p>Each <strong>hostname<\/strong> is treated separately by crawlers. So:<\/p>\n<ul>\n<li><code>example.com<\/code> has its own <code>https:\/\/example.com\/robots.txt<\/code><\/li>\n<li><code>blog.example.com<\/code> has its own <code>https:\/\/blog.example.com\/robots.txt<\/code><\/li>\n<\/ul>\n<p>If you run many sites on a single account (common for agencies and resellers), keep a small checklist for each new domain: DNS, SSL, robots.txt, sitemap.xml. 
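<\/p>\n<p>You can automate the robots.txt and sitemap part of that checklist with a tiny loop. The following is only a sketch, assuming a Unix shell with <code>curl<\/code> and a hypothetical list of your hostnames:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\">#!\/bin\/sh\n# Hypothetical hostnames; replace with the domains on your account\nfor host in example.com blog.example.com; do\n  for path in \/robots.txt \/sitemap.xml; do\n    # Print the HTTP status code for each URL (you want 200)\n    code=$(curl -s -o \/dev\/null -w &quot;%{http_code}&quot; &quot;https:\/\/$host$path&quot;)\n    echo &quot;$host$path -&gt; $code&quot;\n  done\ndone\n<\/code><\/pre>\n<p>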
Our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/paylasimli-ve-reseller-hostingde-coklu-web-sitesi-yonetimi\/\">managing multiple websites on shared and reseller hosting<\/a> has more operational tips that fit nicely with this.<\/p>\n<h2><span id=\"Step_5_Advanced_Scenarios_Multilingual_Subdomains_and_Staging\">Step 5 \u2013 Advanced Scenarios: Multilingual, Subdomains and Staging<\/span><\/h2>\n<h3><span id=\"Subdomain_vs_subdirectory_and_its_impact_on_robotssitemaps\">Subdomain vs subdirectory and its impact on robots\/sitemaps<\/span><\/h3>\n<p>Your domain architecture (blog\/store\/languages) directly affects how you design robots.txt and sitemap.xml. If you are still deciding, read our detailed comparison of <a href=\"https:\/\/www.dchost.com\/blog\/en\/subdomain-mi-alt-dizin-mi-blog-magaza-ve-dil-surumleri-icin-seo-ve-hosting-karsilastirmasi\/\">subdomain vs subdirectory for SEO and hosting<\/a>.<\/p>\n<ul>\n<li><strong>Languages in subdirectories<\/strong> (e.g. <code>\/en\/<\/code>, <code>\/de\/<\/code>): one robots.txt and one (or multiple) sitemaps on the main domain. Sitemaps can separate languages but are all referenced from the same sitemap index.<\/li>\n<li><strong>Languages on subdomains<\/strong> (e.g. <code>en.example.com<\/code>, <code>de.example.com<\/code>): each subdomain gets its own robots.txt and sitemap set.<\/li>\n<\/ul>\n<p>For international SEO, make sure your sitemap structure matches your hreflang strategy, and that no language versions are accidentally disallowed.<\/p>\n<h3><span id=\"Staging_sites_and_test_environments\">Staging sites and test environments<\/span><\/h3>\n<p>We regularly see staging environments accidentally indexed by search engines because:<\/p>\n<ul>\n<li>The staging robots.txt was copied from production<\/li>\n<li>The staging site used a different subdomain but shared the same content and links<\/li>\n<\/ul>\n<p>On staging, the safest combo is:<\/p>\n<ul>\n<li>HTTP auth (username\/password) or IP restriction at the web server level<\/li>\n<li><code>Disallow: \/<\/code> in <code>robots.txt<\/code><\/li>\n<li>No sitemaps exposed publicly<\/li>\n<\/ul>\n<p>When you clone staging to production, always double\u2011check that you <strong>remove<\/strong> the <code>Disallow: \/<\/code> rule and point sitemaps to the correct domain before going live.<\/p>\n<h3><span id=\"Handling_multiple_sitemaps_across_domains\">Handling multiple sitemaps across domains<\/span><\/h3>\n<p>By default, a sitemap should only list URLs from its own host. Cross\u2011domain sitemaps (one sitemap that lists URLs from multiple domains) are supported in some cases, but they require verification of each host in Google Search Console and careful configuration. 
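<\/p>\n<p>For reference, the sitemaps.org protocol allows a cross submit when the robots.txt of the host whose URLs are listed points at the sitemap, wherever that sitemap is stored. A minimal illustration with hypothetical hostnames:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># robots.txt served at https:\/\/store.example.net\/robots.txt\nUser-agent: *\nDisallow:\n\n# The sitemap lives on another host but describes store.example.net\nSitemap: https:\/\/example.com\/sitemaps\/sitemap-store.xml\n<\/code><\/pre>\n<p>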
For most small and medium sites, it is simpler and cleaner to keep sitemaps per domain\/subdomain.<\/p>\n<h2><span id=\"Step_6_Testing_Monitoring_and_Avoiding_Silent_SEO_Disasters\">Step 6 \u2013 Testing, Monitoring and Avoiding Silent SEO Disasters<\/span><\/h2>\n<h3><span id=\"Use_Google_Search_Console_and_Bing_Webmaster_Tools\">Use Google Search Console and Bing Webmaster Tools<\/span><\/h3>\n<p>After setting up robots.txt and sitemap.xml:<\/p>\n<ol>\n<li>Verify your domain in Google Search Console and Bing Webmaster Tools.<\/li>\n<li>Submit your sitemap URL(s) in each tool.<\/li>\n<li>Use the &#8220;URL Inspection&#8221; (GSC) and &#8220;Fetch as Bingbot&#8221;\u2011style tools to test specific pages.<\/li>\n<li>Check for indexing coverage issues, blocked resources and unexpected noindex directives.<\/li>\n<\/ol>\n<h3><span id=\"Test_robotstxt_with_crawler_tools\">Test robots.txt with crawler tools<\/span><\/h3>\n<p>Many SEO tools can simulate how robots.txt rules apply to specific URLs. Even without those tools, you can:<\/p>\n<ul>\n<li>Keep a simple <strong>allow\/deny matrix<\/strong> in a spreadsheet for critical paths<\/li>\n<li>Spot\u2011check with browser: request <code>\/robots.txt<\/code> and confirm the live version matches your latest changes<\/li>\n<\/ul>\n<h3><span id=\"Check_your_server_logs_for_crawler_behavior\">Check your server logs for crawler behavior<\/span><\/h3>\n<p>Server logs still provide the most precise view of how bots interact with your site. On VPS or dedicated servers, access logs will reveal:<\/p>\n<ul>\n<li>Which bots hit you most frequently<\/li>\n<li>Which paths they crawl most<\/li>\n<li>Whether they obey your robots.txt rules<\/li>\n<\/ul>\n<p>If you are not yet comfortable reading logs, our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/hosting-sunucu-loglarini-okumayi-ogrenin-apache-ve-nginx-ile-4xx-5xx-hatalarini-teshis-rehberi\/\">how to read web server logs to diagnose 4xx\u20135xx errors on Apache and Nginx<\/a> is a great starting point. The same techniques help you understand crawler patterns and detect abnormal activity.<\/p>\n<h3><span id=\"Common_silent_issues_to_monitor\">Common silent issues to monitor<\/span><\/h3>\n<ul>\n<li>Copied robots.txt from another project that still blocks important paths (e.g. an old <code>Disallow: \/shop\/<\/code> rule kept by mistake)<\/li>\n<li>Sitemaps listing URLs that now redirect or return errors after a redesign<\/li>\n<li>Changes in site structure (new subdirectories, new hostname) without updating sitemap &amp; robots.txt references<\/li>\n<li>Moving from <code>www<\/code> to non\u2011www (or vice versa) without aligning sitemaps and canonical URLs<\/li>\n<\/ul>\n<h2><span id=\"Hosting_and_Infrastructure_Considerations\">Hosting and Infrastructure Considerations<\/span><\/h2>\n<p>Because we live on the hosting side every day at dchost.com, we also see the infrastructure\u2011level details that impact robots.txt and sitemap behavior.<\/p>\n<h3><span id=\"Performance_sitemaps_on_busy_stores_and_portals\">Performance: sitemaps on busy stores and portals<\/span><\/h3>\n<p>On high\u2011traffic e\u2011commerce or content sites, sitemap generation can become a noticeable load if it runs dynamically on every request. 
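<\/p>\n<p>One common mitigation is to pre-generate the sitemap on a schedule instead of building it on every request. Below is a hedged sketch, assuming a hypothetical <code>generate-sitemap.php<\/code> script in your application and the document root used in the Apache\/Nginx examples above; adapt names and paths to your setup:<\/p>\n<pre class=\"language-bash line-numbers\"><code class=\"language-bash\"># generate-sitemap.php is a placeholder for your own generator script.\n# Build the sitemap into a temp file in the same directory, then rename\n# it into place so crawlers never download a half-written file.\nphp \/var\/www\/example.com\/bin\/generate-sitemap.php &gt; \/var\/www\/example.com\/public\/sitemap.xml.new &amp;&amp; mv \/var\/www\/example.com\/public\/sitemap.xml.new \/var\/www\/example.com\/public\/sitemap.xml\n\n# Example crontab entry: regenerate every night at 03:15\n# 15 3 * * * \/path\/to\/regenerate-sitemap.sh\n<\/code><\/pre>\n<p>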
Good practices:<\/p>\n<ul>\n<li>Cache sitemap output in a file or object cache (Redis, for example) and refresh only when needed.<\/li>\n<li>Split gigantic sitemaps into logical pieces: products, categories, blog, static pages.<\/li>\n<li>Consider offloading heavy reports and logs; combine with a solid <a href=\"https:\/\/www.dchost.com\/blog\/en\/yedekleme-stratejisi-nasil-planlanir-blog-e-ticaret-ve-saas-siteleri-icin-rpo-rto-rehberi\/\">backup and retention strategy<\/a> so your main hosting remains lean.<\/li>\n<\/ul>\n<h3><span id=\"SSL_redirects_and_canonical_domains\">SSL, redirects and canonical domains<\/span><\/h3>\n<p>Your canonical domain (with or without <code>www<\/code>, HTTP vs HTTPS) should be consistent across:<\/p>\n<ul>\n<li>Robots.txt<\/li>\n<li>Sitemap URLs<\/li>\n<li>Canonical tags<\/li>\n<li>Redirect rules<\/li>\n<\/ul>\n<p>For example, if you use HTTPS and no <code>www<\/code>:<\/p>\n<ul>\n<li>Robots.txt should say <code>Sitemap: https:\/\/example.com\/sitemap.xml<\/code><\/li>\n<li>All sitemap URLs should start with <code>https:\/\/example.com\/...<\/code><\/li>\n<li>HTTP and <code>www.example.com<\/code> should 301 redirect to <code>https:\/\/example.com<\/code><\/li>\n<\/ul>\n<p>Misaligned configurations can cause duplicates, wasted crawl budget and diluted link equity.<\/p>\n<h2><span id=\"Putting_It_All_Together_A_Practical_Checklist\">Putting It All Together: A Practical Checklist<\/span><\/h2>\n<p>When we help customers at dchost.com set up a new site or migrate to a VPS\/dedicated server, we usually run through this short checklist:<\/p>\n<ol>\n<li><strong>Decide structure<\/strong>: domain, subdomain vs subdirectory, language strategy.<\/li>\n<li><strong>Confirm canonical URL<\/strong>: HTTP\/HTTPS, www vs non\u2011www, redirect policy.<\/li>\n<li><strong>Generate sitemap(s)<\/strong>: via CMS plugin, build script or custom code.<\/li>\n<li><strong>Upload robots.txt<\/strong> to the document root of each domain\/subdomain.<\/li>\n<li><strong>Reference your sitemap<\/strong> inside robots.txt with full HTTPS URLs.<\/li>\n<li><strong>Test live URLs<\/strong>: manually open <code>\/robots.txt<\/code> and <code>\/sitemap.xml<\/code> in the browser.<\/li>\n<li><strong>Submit sitemaps<\/strong> in Google Search Console and Bing Webmaster Tools.<\/li>\n<li><strong>Monitor logs and coverage<\/strong> for the first few weeks; adjust rules only when you see real patterns.<\/li>\n<\/ol>\n<h2><span id=\"Conclusion_Small_Files_Big_SEO_and_Hosting_Impact\">Conclusion: Small Files, Big SEO and Hosting Impact<\/span><\/h2>\n<p>robots.txt and sitemap.xml are easy to overlook, especially when you are busy with design, content, payments and integrations. But from our experience at dchost.com, these two files often make the difference between clean, efficient crawling and months of confusing SEO issues. The good news is that once you set them up thoughtfully\u2014and match them with your domain architecture, redirects and hosting configuration\u2014they rarely need more than light maintenance.<\/p>\n<p>If you are launching a new site or planning a migration to a VPS, dedicated server or colocation, we are happy to help you align robots.txt, sitemaps and hosting\u2011side SEO basics from day one. 
Combine this guide with our <a href=\"https:\/\/www.dchost.com\/blog\/en\/yeni-web-sitesi-yayina-alirken-hosting-tarafinda-seo-ve-performans-kontrol-listesi\/\">new website launch checklist<\/a> and our article on <a href=\"https:\/\/www.dchost.com\/blog\/en\/isletmeniz-icin-seo-uyumlu-alan-adi-secimi\/\">choosing an SEO\u2011friendly domain name<\/a>, and you will have a solid technical foundation before your first visitor arrives. And if you are unsure how to adapt these examples to your specific shared hosting, VPS or dedicated setup at dchost.com, our support team can review your configuration and suggest a clean, safe robots.txt and sitemap.xml layout tailored to your project.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>\u0130&ccedil;indekiler1 Why robots.txt and sitemap.xml Matter for Your Site2 robots.txt and sitemap.xml in Plain Language2.1 What is robots.txt?2.2 What is sitemap.xml?3 Step 1 \u2013 Decide Your Crawl Strategy Before Writing a Single Rule3.1 Pages and sections you usually want crawled3.2 URLs that are often safe (or recommended) to block3.3 Critical warning: robots.txt vs HTTP authentication4 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3398,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-3397","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-teknoloji"],"_links":{"self":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/3397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/comments?post=3397"}],"version-history":[{"count":0,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/3397\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media\/3398"}],"wp:attachment":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media?parent=3397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/categories?post=3397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/tags?post=3397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}