Hanso Group

Realistic Expectations for AI Projects

Nicolas Samarine · 14 minute read

The discourse around artificial intelligence has become increasingly polarized. At one extreme, we have grandiose predictions of AGI bringing utopia (or doom) within years; at the other, dismissive claims that AI is just “statistics” or glorified pattern matching. The reality, as is often the case, lies somewhere between these extremes. In this article, we’ll navigate beyond the hype and cynicism to establish a grounded perspective on AI’s capabilities, limitations, and likely trajectory.

The Current State of AI

Today’s artificial intelligence capabilities are primarily driven by large language models (LLMs) and other foundation models trained on vast datasets. These systems demonstrate impressive capabilities in:

  • Language understanding and generation: Creating coherent text across various styles and domains
  • Multimodal processing: Interpreting and generating images, understanding video content, and processing audio
  • Reasoning: Following complex instructions, solving certain types of problems step-by-step
  • Domain knowledge: Providing information across a wide range of fields, from science to literature

However, these systems also have significant limitations:

  • Hallucinations: Generating plausible-sounding but incorrect information
  • Reasoning inconsistency: Performing well on some reasoning tasks while failing dramatically on others
  • Knowledge cutoffs: Limited awareness of events after their training data cutoff
  • Lack of true understanding: Operating without genuine comprehension of the concepts they manipulate
  • Brittleness in specialized domains: Struggling in highly technical areas where relevant training data is sparse

Demystifying AI “Intelligence”

A key source of confusion in AI discussions is anthropomorphization—attributing human-like understanding to AI systems. Current AI exhibits what philosopher Daniel Dennett would call “competence without comprehension.” These systems can demonstrate impressive skills without actually understanding what they’re doing in the way humans do.

The “stochastic parrots” label, coined by Emily Bender and colleagues, is often invoked here, and work by Murray Shanahan and others adds a twist: LLMs are not merely regurgitating training data but performing sophisticated interpolation between data points. This allows them to produce novel outputs that weren’t explicitly in their training data, while their operations remain fundamentally grounded in patterns extracted from that data.

The distinction between performance and understanding becomes clear when examining reasoning tasks. Consider these examples:

Example 1: Successful Reasoning

Q: A farmer has 17 sheep, all but 9 die. How many sheep are left?
A: To solve this problem, I need to understand that the phrase "all but 9 die" means that 9 sheep survived.
So the answer is 9 sheep are left.

Example 2: Reasoning Failure

Q: I have a book with 1202 pages. The page numbers use a total of 3500 digits. Is this possible?
A: Let's think through this. Page numbers 1-9 use 1 digit each (9 digits total).
Page numbers 10-99 use 2 digits each (2 × 90 = 180 digits).
Page numbers 100-999 use 3 digits each (3 × 900 = 2700 digits).
Page numbers 1000-1202 use 4 digits each (4 × 203 = 812 digits).
Adding these up: 9 + 180 + 2700 + 812 = 3701 digits.
Since 3701 is close to 3500, yes, this is possible.
[Note: The arithmetic is right (9 + 180 + 2700 + 812 = 3701 digits), but the conclusion does not follow from it. A 1202-page book needs 3701 digits, so it cannot use exactly 3500; the model works through the correct procedure and then fails to check its final answer against its own calculation.]
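
Arithmetic like this is also trivial to verify deterministically, which is the cheapest defense against this failure mode: check the model’s total with a few lines of code rather than trusting its mental math. A minimal Python check (the function name is ours, for illustration):

    def digits_used(pages: int) -> int:
        """Total digits needed to print page numbers 1 through `pages`."""
        return sum(len(str(p)) for p in range(1, pages + 1))

    print(digits_used(1202))  # 3701, so a 1202-page book cannot use 3500 digits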

These examples show that while AI systems can demonstrate impressive reasoning abilities, they lack the robust understanding and verification mechanisms that humans possess. They don’t truly “know” what they’re calculating—they’re applying patterns they’ve extracted from data.

The Organizational Impact of AI

Despite these limitations, AI is already transforming organizations across sectors. The key to successful implementation lies in understanding where these tools excel and where they fall short.

Promising Application Areas
  1. Content Generation and Editing

    • First-draft creation for articles, reports, and documentation
    • Editing and refining existing content
    • Generating variations on messaging for different audiences
  2. Knowledge Management

    • Creating searchable knowledge bases from unstructured data
    • Summarizing lengthy documents and research
    • Extracting insights from large volumes of text
  3. Customer Interaction

    • Handling tier-1 customer support queries
    • Personalized communication at scale
    • Interactive guided troubleshooting
  4. Programming Assistance

    • Code generation for routine tasks
    • Explaining complex code
    • Debugging assistance and test generation
  5. Data Analysis

    • Generating database queries from natural language (see the sketch after this list)
    • Exploratory data analysis and visualization
    • Pattern identification in large datasets
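
The first bullet under Data Analysis is concrete enough to sketch in code. What follows is a schematic outline under stated assumptions, not a recommended production pattern: call_llm stands in for whatever model client you use, and the schema and the SELECT-only guardrail are illustrative.

    # Hypothetical sketch: natural language to SQL, with a basic guardrail.
    SCHEMA = (
        "CREATE TABLE orders (id INTEGER, customer_id INTEGER, "
        "total NUMERIC, created_at DATE);"
    )

    def nl_to_sql(question: str, call_llm) -> str:
        # `call_llm` is a placeholder callable (prompt -> text), not a real API.
        prompt = (
            f"Given this schema:\n{SCHEMA}\n"
            f"Write one SQL query that answers: {question}\n"
            "Return only the SQL, with no commentary."
        )
        sql = call_llm(prompt).strip()
        # Never execute generated SQL blindly: use a read-only connection
        # and reject anything that is not a single SELECT statement.
        if not sql.lower().startswith("select"):
            raise ValueError(f"Refusing non-SELECT statement: {sql!r}")
        return sql
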
Challenging Areas
  1. Critical Decision Making

    • Medical diagnosis and treatment recommendations
    • Legal judgment and case outcome prediction
    • Financial advice with significant consequences
  2. Novel Scientific Research

    • Generating truly new scientific theories
    • Designing experimental protocols for novel research
    • Drawing reliable conclusions from ambiguous data
  3. Complex Social Dynamics

    • Navigating sensitive interpersonal situations
    • Understanding organizational politics
    • Mediating disputes with significant emotional components
  4. Specialized Domain Expertise

    • Highly technical fields with sparse training data
    • Domains requiring extensive physical world grounding
    • Areas with rapidly evolving knowledge

Case Studies: Realistic AI Implementation

To illustrate what realistic AI implementation looks like, let’s examine two case studies—one successful and one cautionary.

Success Case: Legal Document Review at Johnson & Partners

Johnson & Partners, a mid-sized law firm specializing in corporate law, implemented an AI-assisted document review system for due diligence processes. Their approach highlights several key success factors:

  1. Clear scope definition: The system was specifically designed to identify standard clauses, flag potential issues, and extract key information—not to make legal judgments.

  2. Human-in-the-loop design: The workflow maintained attorneys as decision-makers, with AI serving as an assistant that pre-processes documents and highlights areas for human review.

  3. Validation process: Before full deployment, the firm conducted a validation study comparing AI+human review against traditional methods, confirming both efficiency gains (40% time reduction) and quality improvements (22% increase in issue identification).

  4. Continuous improvement: The firm established a feedback loop where attorneys could flag AI errors, which were periodically reviewed by their technology team to improve the system.

  5. Appropriate expectations: The firm recognized that the technology would not eliminate the need for legal expertise but would allow their attorneys to focus on higher-value analysis.

The result was a 35% increase in due diligence capacity without additional hiring, improved consistency in document review, and higher attorney satisfaction as they spent less time on tedious document scanning.
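
The human-in-the-loop design in point 2 reduces to a simple pattern: the model proposes, the attorney disposes. The sketch below is schematic, not Johnson & Partners’ actual system; Finding, model_flags, and attorney_decides are hypothetical names.

    from dataclasses import dataclass

    @dataclass
    class Finding:
        clause: str        # excerpt the model flagged
        issue: str         # why it was flagged
        confidence: float  # model's self-reported confidence, 0..1

    def review(document: str, model_flags, attorney_decides) -> list[Finding]:
        """AI pre-processes and highlights; the attorney makes every decision."""
        findings = model_flags(document)           # AI pass: propose candidates
        findings.sort(key=lambda f: f.confidence)  # surface uncertain items first
        # Human pass: nothing is accepted without attorney sign-off.
        return [f for f in findings if attorney_decides(f)]

The feedback loop in point 4 then amounts to logging which findings attorneys overturn and reviewing those cases periodically.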

Cautionary Case: Brighton Medical Center’s Diagnostic AI

Brighton Medical Center implemented an AI system designed to provide preliminary diagnostics across a range of common conditions. The implementation encountered several challenges:

  1. Overreliance: Despite training emphasizing that the AI was meant to be an assistive tool, some physicians began to rely too heavily on the AI’s suggestions, sometimes disregarding contradictory clinical signs.

  2. Overconfidence in edge cases: The system performed well for common presentations but provided misleadingly confident analyses for unusual cases outside its training distribution.

  3. Contextual knowledge gaps: The AI lacked awareness of local disease patterns, recent outbreaks, and other contextual factors critical for accurate diagnosis.

  4. Workflow disruption: The implementation created additional documentation requirements that slowed physicians down rather than making them more efficient.

  5. Authority confusion: Patients often assumed the AI’s assessment was definitive, creating communication challenges when physicians disagreed with the system.

After several near-miss incidents, Brighton Medical Center revised their approach, limiting the AI to specific triage scenarios, improving the interface to clearly communicate confidence levels, and establishing stricter guidelines for when human medical judgment should override AI suggestions.

Beyond the Binary: A Framework for AI Evaluation

Rather than thinking of AI capabilities in binary terms—can/cannot, will/won’t—it’s more productive to evaluate AI along multiple dimensions:

  1. Reliability Spectrum: From “rarely makes errors” to “frequently unreliable”
  2. Supervision Requirements: From “minimal oversight needed” to “constant supervision required”
  3. Domain Specificity: From “generalizes well” to “only works in narrow contexts”
  4. Consequences of Failure: From “low impact errors” to “critical consequences”
  5. Explainability: From “reasoning transparent” to “black box decisions”

For example, using an AI to draft a marketing email scores differently than using it for cancer diagnosis:

Dimension          | Marketing Email Draft                              | Cancer Diagnosis
-------------------|----------------------------------------------------|-------------------------------------------------------
Reliability        | Moderate (occasional errors but easily caught)     | Currently low for independent diagnosis
Supervision        | Moderate (human review needed but straightforward) | Very high (requires expert verification)
Domain Specificity | Broad (works well across marketing contexts)       | Narrow (requires specialized medical training data)
Consequences       | Low (errors unlikely to cause significant harm)    | Extremely high (misdiagnosis could be fatal)
Explainability     | Moderate (can explain general approach)            | Currently low (difficulty explaining specific factors)

This framework helps organizations make nuanced decisions about where and how to deploy AI technologies.
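
One way to put the framework to work is to score each candidate use case on the five dimensions and gate the deployment decision on the weakest ones. The sketch below is illustrative: the 1-to-5 scale and the gating rule are our assumptions, and the two example scores mirror the table above.

    from dataclasses import dataclass

    @dataclass
    class UseCase:
        name: str
        reliability: int     # 1 = frequently unreliable .. 5 = rarely errs
        supervision: int     # 1 = constant supervision .. 5 = minimal oversight
        generality: int      # 1 = narrow contexts .. 5 = generalizes well
        failure_cost: int    # 1 = critical consequences .. 5 = low impact
        explainability: int  # 1 = black box .. 5 = transparent reasoning

        def requires_expert_verification(self) -> bool:
            # Illustrative gating rule: low reliability or severe failure
            # consequences force expert verification of every output.
            return self.reliability <= 2 or self.failure_cost <= 2

    email = UseCase("marketing email draft", 3, 3, 4, 5, 3)
    diagnosis = UseCase("cancer diagnosis", 1, 1, 1, 1, 1)
    print(email.requires_expert_verification())      # False: routine review suffices
    print(diagnosis.requires_expert_verification())  # True: expert must verify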

The Path Forward: Augmentation, Not Replacement

The most productive approach to AI implementation focuses on augmentation rather than replacement of human capabilities. This perspective shifts the conversation from “Will AI take jobs?” to “How can AI and humans work together most effectively?”

Key principles for effective human-AI collaboration include:

  1. Complementary strengths: Deploy AI for tasks where it excels (processing large volumes of data, generating content variations, etc.) while preserving human judgment for nuanced decisions.

  2. Appropriate trust calibration: Ensure that human users have an accurate understanding of AI capabilities and limitations to prevent both over-reliance and under-utilization.

  3. Workflow integration: Design systems where AI fits naturally into existing processes rather than forcing workflows to accommodate technology.

  4. Continuous learning: Establish mechanisms to capture human feedback and improve AI performance over time.

  5. Ethical guardrails: Implement clear boundaries for AI use, particularly in high-stakes domains.

Looking Ahead: Realistic Timeline for AI Development

While predicting technological development is notoriously difficult, we can establish some reasonable expectations for AI evolution in the coming years:

Near-term (1-2 years)
  • Incremental improvements in reasoning capabilities
  • Better domain-specific adaptation through fine-tuning
  • Improved multimodal integration (text, image, audio)
  • More sophisticated human-AI collaboration interfaces
  • Greater focus on reliability and reduced hallucinations
Medium-term (3-5 years)
  • More effective tool use and environmental interaction
  • Improved long-context reasoning
  • Better specialized domain knowledge
  • More reliable factuality through retrieval augmentation
  • More sophisticated planning capabilities
Long-term (5-10 years)
  • Potentially more sample-efficient training methods
  • Possible breakthroughs in unsupervised learning
  • More robust causal reasoning
  • More effective transfer learning across domains
  • Potentially novel architectural approaches beyond transformers

Conspicuously absent from this timeline is human-level general intelligence or “artificial general intelligence” (AGI). While remarkable advances continue, the challenges of achieving robust, human-like general intelligence remain substantial and likely extend beyond this timeframe.

Conclusion

The current wave of AI capabilities represents a genuine technological breakthrough with significant implications for organizations and society. However, these advances exist within the context of important limitations that are likely to persist for some time.

A realistic perspective recognizes both the transformative potential of these technologies and their boundaries. The most successful organizations will neither dismiss AI as mere hype nor embrace it uncritically as a panacea. Instead, they will develop nuanced understandings of where these tools can provide real value, implement them with appropriate human oversight, and continuously evaluate and refine their approaches.

The future of AI isn’t a binary outcome of either dystopian replacement or disappointing fizzle—it’s a complex and evolving relationship between human and machine capabilities. By maintaining realistic expectations and thoughtful implementation strategies, we can navigate this relationship productively, harnessing AI’s strengths while accounting for its limitations.

References

  1. Mitchell, M. (2019). Artificial Intelligence: A Guide for Thinking Humans. Farrar, Straus and Giroux.

  2. Shanahan, M. (2022). Talking About Large Language Models. arXiv:2212.03551.

  3. Marcus, G., & Davis, E. (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon.

  4. Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A Flaw in Human Judgment. Little, Brown Spark.

  5. Crawford, K. (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.
