Strings in Rust
Working with strings in Rust is quite different from JavaScript. While JavaScript has a single string type with built-in Unicode support, Rust has multiple string types with explicit UTF-8 encoding. This guide will help you understand the differences and effectively work with strings in Rust.
String Types in Rust
Rust has two main string types:
String
: A growable, heap-allocated UTF-8 encoded string (similar to JavaScript’sString
)&str
: A string slice, which is a reference to a UTF-8 encoded string (often stored elsewhere)
Think of String
as owning its data (like a vec![u8]
with UTF-8 constraints), and &str
as borrowing string data.
Creating Strings
Let’s compare how we create strings in JavaScript versus Rust:
// JavaScript string creationlet greeting = "Hello, world!";let name = 'Alice';let message = `Hello, ${name}!`;
// Creating from other valueslet numAsString = String(42);let boolAsString = String(true);
In Rust:
// String literals (these are &str)let greeting = "Hello, world!";
// Creating a Stringlet owned_greeting = String::from("Hello, world!");let also_owned = "Hello, world!".to_string();let empty = String::new();
// String interpolation equivalentlet name = "Alice";let message = format!("Hello, {}!", name);
// Creating from other typeslet num_as_string = 42.to_string();let bool_as_string = true.to_string();
Key differences:
- Rust distinguishes between string literals (
&str
) and owned strings (String
) - No template literal syntax in Rust; use
format!
macro instead - Single quotes in Rust are for
char
types, not strings - To convert values to strings, use the
.to_string()
method orformat!
macro
String vs &str: When to Use Each
Here’s a simple rule of thumb:
- Use
&str
for function parameters when you only need to read the string - Use
String
when you need to own or modify the string
// Takes a string slice - function doesn't need ownershipfn print_info(info: &str) { println!("Info: {}", info);}
// Returns an owned String - function creates a new stringfn generate_greeting(name: &str) -> String { format!("Hello, {}!", name)}
fn main() { let name = "Alice";
// Both String and &str can be passed where &str is needed print_info(name);
let owned_name = String::from("Bob"); print_info(&owned_name); // &String coerces to &str
// Generate and own a new String let greeting = generate_greeting(name); println!("{}", greeting);}
String Concatenation and Building
JavaScript makes string concatenation easy:
// JavaScript concatenationlet greeting = "Hello, ";let name = "world";let message = greeting + name + "!";
// Or with +=let fullMessage = "Hello";fullMessage += ", ";fullMessage += "world!";
Rust offers several approaches:
// Using + operator (consumes the first string)let greeting = String::from("Hello, ");let name = "world";let message = greeting + name + "!";// Note: greeting is moved/consumed here and can't be used again
// Using format! macro (doesn't consume any strings)let greeting = String::from("Hello, ");let name = "world";let message = format!("{}{}{}", greeting, name, "!");// greeting and name are still valid here
// Using += via push_str() (modifies the original)let mut full_message = String::from("Hello");full_message.push_str(", ");full_message.push_str("world!");
// Adding a single character with push()let mut hi = String::from("Hi");hi.push('!');
Key differences:
- The
+
operator in Rust consumes the first string - The second argument for
+
must be a string slice (&str
), not an ownedString
- For multiple concatenations,
format!
is often clearer and more efficient - To mutate a string, you need to declare it as
mut
String Length and Size
JavaScript’s length
property gives the number of UTF-16 code units:
let text = "Hello";console.log(text.length); // 5
let emoji = "👋";console.log(emoji.length); // 2 (because emoji uses multiple UTF-16 code units)
Rust provides different methods for different concepts of “length”:
let text = "Hello";println!("{}", text.len()); // 5 bytes
let emoji = "👋";println!("{}", emoji.len()); // 4 bytesprintln!("{}", emoji.chars().count()); // 1 Unicode scalar value
Key differences:
- Rust’s
.len()
gives the size in bytes, not characters - Use
.chars().count()
to count Unicode scalar values - Rust strings are always UTF-8 encoded
Accessing Characters
JavaScript allows indexing strings to get characters:
let greeting = "Hello";let firstChar = greeting[0]; // "H"
// Caution with Unicode!let emoji = "👋";console.log(emoji[0]); // Gives a surrogate half, not the actual character
// Better for Unicode:let firstEmoji = Array.from(emoji)[0]; // "👋"
Rust doesn’t allow direct indexing of strings, for good reason:
let greeting = "Hello";// let first_char = greeting[0]; // This doesn't compile in Rust!
// Instead, use one of these approaches:let first_byte = greeting.as_bytes()[0]; // 72 (ASCII for 'H')
// For the first character:if let Some(first_char) = greeting.chars().next() { println!("First char: {}", first_char); // "H"}
// To get a character at a specific position (inefficient for large indices):let third_char = greeting.chars().nth(2).unwrap(); // 'l'
Key differences:
- Rust doesn’t allow direct string indexing because UTF-8 characters can span multiple bytes
- Access individual bytes with
.as_bytes()[i]
- Access Unicode characters with
.chars()
iterator
Slicing Strings
JavaScript uses the slice
method:
let greeting = "Hello, world!";let hello = greeting.slice(0, 5); // "Hello"
Rust uses range syntax for slices, but requires care with UTF-8:
let greeting = "Hello, world!";let hello = &greeting[0..5]; // "Hello"
// Be careful! This must be at valid UTF-8 character boundaries// let will_panic = &"👋 hello"[0..2]; // PANICS: index 2 is not a character boundary
Key differences:
- Rust slices must be at valid UTF-8 character boundaries
- Indexing a non-boundary results in a runtime panic, not a silent error
- For safety with Unicode, consider using character-based methods
String Iteration
JavaScript iteration methods:
let greeting = "Hello";
// By code units (caution with Unicode!)for (let i = 0; i < greeting.length; i++) { console.log(greeting[i]);}
// By Unicode code points (better for international text)for (const char of greeting) { console.log(char);}
Rust iteration methods:
let greeting = "Hello";
// By bytesfor b in greeting.bytes() { println!("{}", b); // Prints byte values (72, 101, 108, 108, 111)}
// By Unicode scalar values (chars)for c in greeting.chars() { println!("{}", c); // Prints characters (H, e, l, l, o)}
// With indices (byte positions, not character positions)for (i, c) in greeting.char_indices() { println!("Character '{}' at byte position {}", c, i);}
String Conversion and Casting
JavaScript automatic conversions:
let num = 42;let boolVal = true;let str1 = "Value: " + num; // Converts num to stringlet str2 = "Is true: " + boolVal; // Converts bool to string
Rust explicit conversions:
let num = 42;let bool_val = true;
// Convert to String (owned)let str1 = format!("Value: {}", num);let str2 = format!("Is true: {}", bool_val);
// Convert from String to other typeslet num_str = "42";let parsed_num: i32 = num_str.parse().expect("Not a number!");
// More robust error handlingmatch "42a".parse::<i32>() { Ok(n) => println!("Successfully parsed: {}", n), Err(e) => println!("Failed to parse: {}", e),}
UTF-8 and Unicode Handling
JavaScript treats strings as sequences of UTF-16 code units:
let face = "😊";console.log(face.length); // 2 (UTF-16 surrogate pair)
Rust treats strings as UTF-8 encoded bytes:
let face = "😊";println!("Bytes: {}", face.len()); // 4 bytesprintln!("Characters: {}", face.chars().count()); // 1 character
// Common operations with Unicodelet text = "héllo";println!("Uppercase: {}", text.to_uppercase());println!("Lowercase: {}", text.to_lowercase());
// Normalizing Unicode (requires the unicode-normalization crate)// use unicode_normalization::UnicodeNormalization;// let normalized = text.nfc().collect::<String>();
String Searching and Replacement
JavaScript offers various search methods:
let sentence = "The quick brown fox";console.log(sentence.includes("quick")); // trueconsole.log(sentence.startsWith("The")); // trueconsole.log(sentence.endsWith("fox")); // trueconsole.log(sentence.indexOf("brown")); // 10
// Replacementlet replaced = sentence.replace("quick", "swift");
Rust has similar methods:
let sentence = "The quick brown fox";println!("{}", sentence.contains("quick")); // trueprintln!("{}", sentence.starts_with("The")); // trueprintln!("{}", sentence.ends_with("fox")); // trueprintln!("{}", sentence.find("brown").unwrap_or(0)); // 10
// Replacement (creates a new string)let replaced = sentence.replace("quick", "swift");
Splitting and Joining Strings
JavaScript:
// Splitlet comma_str = "apple,banana,orange";let fruits = comma_str.split(","); // ["apple", "banana", "orange"]
// Joinlet joined = fruits.join("-"); // "apple-banana-orange"
Rust:
// Splitlet comma_str = "apple,banana,orange";let fruits: Vec<&str> = comma_str.split(',').collect();println!("{:?}", fruits); // ["apple", "banana", "orange"]
// Joinlet joined = fruits.join("-"); // "apple-banana-orange"
String Ownership and Manipulation
JavaScript strings are immutable, but this is handled behind the scenes:
let greeting = "Hello";greeting = greeting + ", world!"; // Creates a new string behind the scenes
Rust makes ownership explicit:
let greeting = String::from("Hello");// This consumes greeting and creates a new Stringlet new_greeting = greeting + ", world!";// greeting is no longer valid here!
// To keep the original, we could clone it:let greeting = String::from("Hello");let new_greeting = greeting.clone() + ", world!";// Both greeting and new_greeting are valid
Strings and Functions
JavaScript string methods create new strings:
let original = "Hello, world!";let upper = original.toUpperCase();let trimmed = original.trim();// original is unchanged
Rust string methods also create new strings:
let original = "Hello, world!";let upper = original.to_uppercase();let trimmed = original.trim();// original is unchanged
However, Rust’s ownership system makes function interactions different:
fn process(s: String) { // Takes ownership of the string println!("Processing: {}", s);} // s is dropped here
fn borrow_process(s: &str) { // Borrows the string println!("Processing: {}", s);} // just the reference is dropped, not the string
fn main() { let owned = String::from("hello");
// process(owned); // This would consume owned // println!("{}", owned); // Error! owned was moved
// Instead, use references: borrow_process(&owned); // Borrow, don't take ownership println!("{}", owned); // Still valid}
String Performance Considerations
For performance-critical code, be aware of these considerations:
// Pre-allocate capacity for efficiencylet mut s = String::with_capacity(100);for i in 0..100 { s.push_str(&i.to_string());}
// Avoid excessive cloninglet base = String::from("Hello");// Bad: clones on every iterationfor i in 0..1000 { let message = base.clone() + &i.to_string(); // Use message}
// Better: reuse a single Stringlet base = "Hello";let mut message = String::with_capacity(100);for i in 0..1000 { message.clear(); message.push_str(base); message.push_str(&i.to_string()); // Use message}
Advanced String Patterns
String Interning
JavaScript strings are automatically interned by the engine:
let a = "hello";let b = "hello";console.log(a === b); // true, they reference the same memory
In Rust, string literals are interned, but String
instances are not:
let a = "hello"; // &strlet b = "hello"; // &strprintln!("{}", std::ptr::eq(a, b)); // true, they reference the same memory
let owned_a = String::from("hello");let owned_b = String::from("hello");println!("{}", std::ptr::eq(&owned_a, &owned_b)); // false, different allocations
For applications needing string interning, consider crates like string-cache
.
Raw Strings
JavaScript:
let regex = "\\d+"; // Need to escape backslasheslet path = "C:\\Program Files\\App"; // Need to escape backslashes
Rust:
let regex = r"\d+"; // No need to escape backslasheslet path = r"C:\Program Files\App"; // No need to escape backslashes
// For strings with quotes:let json = r#"{"name": "John", "age": 30}"#;
// For strings that contain #"let complex = r##"A string with #"quotes"# inside"##;
Common String Operations Cheat Sheet
Operation | JavaScript | Rust |
---|---|---|
Create string | let s = "hello" | let s = "hello" (for &str ) let s = String::from("hello") (for String ) |
Concatenate | s1 + s2 | s1 + &s2 (consumes s1) format!("{}{}", s1, s2) |
Get length | s.length | s.len() (bytes) s.chars().count() (characters) |
Uppercase/Lowercase | s.toUpperCase() | s.to_uppercase() |
Trim whitespace | s.trim() | s.trim() |
Contains substring | s.includes("sub") | s.contains("sub") |
Replace | s.replace("old", "new") | s.replace("old", "new") |
Split | s.split(",") | s.split(",") |
Join | array.join(",") | vec.join(",") |
Summary
Strings in Rust are more complex than in JavaScript, but this complexity provides greater control and safety:
- Rust distinguishes between
String
(owned) and&str
(borrowed) types - Rust strings are always valid UTF-8, with explicit handling for Unicode
- Indexing and slicing have safety checks to prevent invalid UTF-8
- The ownership system affects how you pass strings to functions
- Explicit conversion methods replace JavaScript’s implicit conversions
These differences require some adjustment for JavaScript developers, but they prevent entire classes of string-related bugs and make international text handling more reliable.
Next Steps
Now that you’ve learned about Rust’s string types, let’s continue our exploration of Rust collections with Hash Maps. Hash maps in Rust are similar to JavaScript objects and Map collections, and we’ll explore their unique features in the next section.